add image analysis w/ tensorflow #318
Conversation
This comment has been minimized.
codecov-io commented Apr 25, 2019
Codecov Report

@@           Coverage Diff           @@
##           master     #318   +/-   ##
=======================================
  Coverage   75.95%   75.95%
=======================================
  Files          41       41
  Lines        1148     1148
  Branches      200      200
=======================================
  Hits          872      872
  Misses        209      209
  Partials       67       67
=======================================

Continue to review full report at Codecov.
This comment has been minimized.
@h324yang thanks for getting this started. Can you update your PR to use the PR template? That'll help us flesh out the documentation we'll need to run the examples, and then write it all up here. Also, I'm not seeing any tests. Can you provide some? @lintool do you want #241 open still? Does this supersede it?
This comment has been minimized.
...and is this a part of everything that should be included, or just helpers for the work you did on the paper?
This comment has been minimized.
Distributed image analysis via the integration of AUT and Tensorflow

What does this Pull Request do?

How should this be tested?
Step 1: Run detection
Step 2: Extract Images

Additional Notes:

Python Dependency
My python environment is as listed here. Though it's not the minimal requirement, to set up quickly you can directly download it. Note that you should ensure that the driver and workers use the same python version; you might set it as in the sketch below.

Spark Mode
The default mode is standalone mode, so the master and slaves need to be set up first.
The Spark parameters are set by using the script's command-line arguments.

Design Details

Interested parties
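A minimal sketch of one way to keep the driver and workers on the same interpreter, assuming a downloaded environment at a placeholder path; PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are standard PySpark environment variables, but this exact setup is not taken from the PR:

```python
import os
import sys

# Must be set before the SparkContext / SparkSession is created.
# PYSPARK_PYTHON is the interpreter the executors use; PYSPARK_DRIVER_PYTHON
# is the interpreter for the driver. Point both at the downloaded environment.
os.environ["PYSPARK_PYTHON"] = "/path/to/downloaded_env/bin/python"  # placeholder path
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```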
This comment has been minimized.
@h324yang can you remove the binaries from the PR, and provide code comments plus instructions in the PR testing comment on where to locate them, download them, and place them?
ruebot requested changes May 31, 2019
parser.add_argument('--web_archive', help='input directory for web archive data', default='/tuna1/scratch/nruest/geocites/warcs')
parser.add_argument('--aut_jar', help='aut compiled jar package', default='aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar')
parser.add_argument('--aut_py', help='path to python package', default='aut/src/main/python')
parser.add_argument('--spark', help='path to python package', default='spark-2.3.2-bin-hadoop2.7/bin')
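For orientation, the argument definitions quoted across the review hunks in this thread assemble into a get_args() roughly like the sketch below; the names, help strings, and defaults are the ones visible in the diff, while the import and the final parse_args() call are assumed boilerplate:

```python
import argparse

def get_args():
    # Arguments for detect.py as they appear in the reviewed hunks.
    parser = argparse.ArgumentParser(description='PySpark for Web Archive Image Retrieval')
    parser.add_argument('--web_archive', help='input directory for web archive data',
                        default='/tuna1/scratch/nruest/geocites/warcs')
    parser.add_argument('--aut_jar', help='aut compiled jar package',
                        default='aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar')
    parser.add_argument('--aut_py', help='path to python package',
                        default='aut/src/main/python')
    parser.add_argument('--spark', help='path to python package',
                        default='spark-2.3.2-bin-hadoop2.7/bin')
    parser.add_argument('--master', help='master IP address',
                        default='spark://127.0.1.1:7077')
    parser.add_argument('--img_model', help='model for image processing, use ssd',
                        default='ssd')
    return parser.parse_args()
```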
This comment has been minimized.
parser = argparse.ArgumentParser(description='PySpark for Web Archive Image Retrieval')
parser.add_argument('--web_archive', help='input directory for web archive data', default='/tuna1/scratch/nruest/geocites/warcs')
parser.add_argument('--aut_jar', help='aut compiled jar package', default='aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar')
parser.add_argument('--aut_py', help='path to python package', default='aut/src/main/python')
This comment has been minimized.
ruebot (Member) May 31, 2019
Is this supposed to be the Python binary? Or something else? Needs a better help description.
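If the answer is that --aut_py points at AUT's Python bindings and --spark at a local Spark distribution, the help strings could be disambiguated along these lines (a hedged sketch of a possible fix, not code from this PR):

```python
import argparse

parser = argparse.ArgumentParser(description='PySpark for Web Archive Image Retrieval')
# Distinct help strings make the two path arguments unambiguous:
# --aut_py is AUT's Python package, --spark is the Spark distribution's bin/ directory.
parser.add_argument('--aut_py', default='aut/src/main/python',
                    help='path to the AUT Python package')
parser.add_argument('--spark', default='spark-2.3.2-bin-hadoop2.7/bin',
                    help='path to the bin/ directory of a local Spark distribution')
```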
def get_args():
parser = argparse.ArgumentParser(description='PySpark for Web Archive Image Retrieval')
parser.add_argument('--web_archive', help='input directory for web archive data', default='/tuna1/scratch/nruest/geocites/warcs')
parser.add_argument('--aut_jar', help='aut compiled jar package', default='aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar')
This comment has been minimized.
def get_args():
parser = argparse.ArgumentParser(description='PySpark for Web Archive Image Retrieval')
parser.add_argument('--web_archive', help='input directory for web archive data', default='/tuna1/scratch/nruest/geocites/warcs')
This comment has been minimized.
parser.add_argument('--aut_py', help='path to python package', default='aut/src/main/python')
parser.add_argument('--spark', help='path to python package', default='spark-2.3.2-bin-hadoop2.7/bin')
parser.add_argument('--master', help='master IP address', default='spark://127.0.1.1:7077')
parser.add_argument('--img_model', help='model for image processing, use ssd', default='ssd')
This comment has been minimized.
ruebot (Member) May 31, 2019
model for image processing, use ssd
If this is the only option, why is there an argument?
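One conventional way to keep the flag while making the single supported value explicit is argparse's choices; a hedged sketch of a possible change, not part of this PR:

```python
import argparse

parser = argparse.ArgumentParser(description='PySpark for Web Archive Image Retrieval')
# choices makes the single supported value explicit and leaves room to add
# more models later without changing the calling convention.
parser.add_argument('--img_model', choices=['ssd'], default='ssd',
                    help='object detection model to use (currently only ssd is supported)')
```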
extractor = SSDExtractor(args.res_dir, args.output_dir)
extractor.extract_and_save(class_ids="all", threshold=args.threshold)
This comment has been minimized.
def get_args():
parser = argparse.ArgumentParser(description='Extracting images from model output')
This comment has been minimized.
def get_args():
parser = argparse.ArgumentParser(description='Extracting images from model output')
parser.add_argument('--res_dir', help='result (model output) dir')
This comment has been minimized.
def get_args():
parser = argparse.ArgumentParser(description='Extracting images from model output')
parser.add_argument('--res_dir', help='result (model output) dir')
parser.add_argument('--output_dir', help='extracted image file output dir')
This comment has been minimized.
parser = argparse.ArgumentParser(description='Extracting images from model output')
parser.add_argument('--res_dir', help='result (model output) dir')
parser.add_argument('--output_dir', help='extracted image file output dir')
parser.add_argument('--threshold', type=float, help='threshold of detection confidence scores')
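Assembled from the hunks above, a minimal runnable sketch of extract_images.py might look like this; SSDExtractor and the extract_and_save(class_ids="all", threshold=...) call are taken from the reviewed code, while the import path and the default threshold are assumptions for illustration:

```python
import argparse

# Assumed import location; the PR defines SSDExtractor elsewhere in its tree.
from ssd_extractor import SSDExtractor


def get_args():
    parser = argparse.ArgumentParser(description='Extracting images from model output')
    parser.add_argument('--res_dir', help='result (model output) dir')
    parser.add_argument('--output_dir', help='extracted image file output dir')
    parser.add_argument('--threshold', type=float, default=0.5,
                        help='threshold of detection confidence scores')
    return parser.parse_args()


if __name__ == '__main__':
    args = get_args()
    # Save every detected class whose confidence exceeds the threshold,
    # mirroring the call shown in the reviewed hunk.
    extractor = SSDExtractor(args.res_dir, args.output_dir)
    extractor.extract_and_save(class_ids="all", threshold=args.threshold)
```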
h324yang commented Apr 25, 2019

JCDL2019 demo: using AUT and an SSD model w/ Tensorflow to do object detection analysis on web archives. The default mode is standalone mode, so the master and slaves need to be set up first. Run detect.py to get and store the object probabilities and the image byte strings, then run extract_images.py to get image files from the result of the detection step.
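For anyone reproducing the demo, the two steps above presumably reduce to invocations along the lines of `python detect.py --web_archive /path/to/warcs --master spark://HOST:7077 --img_model ssd` followed by `python extract_images.py --res_dir <detect output dir> --output_dir <image dir> --threshold 0.5`; the exact launch command and output locations are assumptions inferred from the argument lists reviewed above, not instructions from this PR.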