add image analysis w/ tensorflow #318
codecov-io commented Apr 25, 2019
Codecov Report
@@ Coverage Diff @@
## master #318 +/- ##
=======================================
Coverage 75.95% 75.95%
=======================================
Files 41 41
Lines 1148 1148
Branches 200 200
=======================================
Hits 872 872
Misses 209 209
Partials 67 67
@h324yang thanks for getting this started. Can you update your PR to use the PR template? That'll help us flesh out documentation that we'll need to run examples, and then write it all up here. Also, I'm not seeing any tests. Can you provide some?

@lintool do you want #241 open still? Does this supersede it?
...and is this a part of everything that should be included, or just helpers for the work you did on the paper?
Distributed image analysis via the integration of AUT and TensorFlow
What does this Pull Request do?
How should this be tested?
Step 1: Run detection
Step 2: Extract Images
Additional Notes:
Python Dependency
My python environment is as listed in here. Though it's not the minimal requirement, to quickly set up, you can directly download it and then
Note that you should ensure that driver and workers use the same python version. You might set as follows:
Spark Mode
The default mode is
The spark parameters are set by using
Design Details
Interested parties
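One common way to keep the driver and the workers on the same Python version is to set Spark's `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables before Spark starts. The following is a minimal sketch, assuming the same interpreter path is valid on every node. The helper name `pin_pyspark_python` is hypothetical and not part of this PR.

```python
import os
import sys


def pin_pyspark_python(python_path=None, env=None):
    """Point the Spark driver and workers at one Python interpreter.

    Hypothetical helper, not part of AUT. Call it before the SparkContext
    is created, because Spark reads these variables at startup.
    """
    env = os.environ if env is None else env
    python_path = python_path or sys.executable
    # Interpreter for the workers (and for the driver unless overridden).
    env["PYSPARK_PYTHON"] = python_path
    # Driver override, read by the pyspark / spark-submit launchers.
    env["PYSPARK_DRIVER_PYTHON"] = python_path
    return python_path
```

Calling `pin_pyspark_python("/path/to/python")` at the top of a driver script, with a path of your choosing, covers the case where the default interpreter on the workers differs from the one used to launch the job.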
@h324yang can you remove the binaries from the PR, and provide code comments plus instructions in the PR testing comment on where to locate them, download them, and place them?
ruebot requested changes May 31, 2019
src/main/python/tf/util/init.py Outdated
src/main/python/tf/extract_images.py Outdated
@h324yang I'm unable to get this to run.
I get:
Chatting with Leo in Slack; guess who did a
I was giving a path to Python,
First pass worked with some tweaks; changed

We should definitely figure out a way to pass the Spark conf settings, since a user will definitely need to tweak them depending on their setup. I don't think we should have the conf settings hard coded in

With

What do you think @h324yang @lintool @ianmilligan1?
All of the options sound good to me for various reasons! But I think at this stage, as a prototype function, we could probably just have people add some flags and roll with it, and down the line, perhaps as a separate issue, come up with a
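The "add some flags" idea could look roughly like the sketch below: a repeatable `--conf KEY=VALUE` flag, similar in spirit to `spark-submit`, that replaces hard-coded settings. The function names and the `--master` / `--conf` flags are assumptions for illustration, not the interface of this PR. `SparkConf.setAll` and `SparkConf.setMaster` are existing PySpark calls.

```python
import argparse


def parse_args(argv=None):
    """Collect repeatable --conf KEY=VALUE flags (hypothetical launcher flags)."""
    parser = argparse.ArgumentParser(description="Image analysis launcher sketch")
    parser.add_argument("--master", default=None, help="Spark master URL")
    parser.add_argument(
        "--conf", action="append", default=[], metavar="KEY=VALUE",
        help="Spark setting; may be given several times",
    )
    return parser.parse_args(argv)


def conf_pairs(raw_confs):
    """Turn ['k=v', ...] into [('k', 'v'), ...], rejecting malformed entries."""
    pairs = []
    for item in raw_confs:
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError("expected KEY=VALUE, got %r" % item)
        pairs.append((key.strip(), value.strip()))
    return pairs


def build_spark_conf(args):
    """Build a SparkConf from parsed flags; pyspark is imported lazily."""
    from pyspark import SparkConf  # only needed when Spark is actually started

    conf = SparkConf().setAll(conf_pairs(args.conf))
    if args.master:
        conf = conf.setMaster(args.master)
    return conf
```

With this shape, a user on a smaller machine could pass `--conf spark.executor.memory=4g` on the command line instead of editing the source.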
We might want to address this message from when we run the initial pass too:
Seems like an OOM error. The arguments I set in util/init.py were tuned for Tuna and ran well there. I got some errors, but I don't think OOM was a frequent one. Are you also running on Tuna? Maybe a lower value of "spark.sql.execution.arrow.maxRecordsPerBatch" could help, e.g., 1280 -> 640. (Indeed, tuning such settings bothered me a lot :-/)
@h324yang I ended up dropping it down to 320, and doing 10 WARCs instead of the 1000 and 100 from the previous attempts. It was a lot more stable with 10, and the initial job completed successfully.
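The halving used in this exchange (1280 -> 640 -> 320) can be written down as a small sketch. `spark.sql.execution.arrow.maxRecordsPerBatch` is the real Spark setting named above. The helper names and the default application name are hypothetical, and the right value still depends on image sizes and executor memory.

```python
ARROW_BATCH_KEY = "spark.sql.execution.arrow.maxRecordsPerBatch"


def batch_size_candidates(start=1280, floor=320):
    """Return batch sizes to try, halving from `start` down to `floor`."""
    if floor < 1 or start < floor:
        raise ValueError("need start >= floor >= 1")
    sizes = []
    size = start
    while size >= floor:
        sizes.append(size)
        size //= 2
    return sizes


def session_with_batch_size(size, app_name="aut-image-analysis"):
    """Create (or reuse) a SparkSession with the Arrow batch size set.

    Hypothetical helper; pyspark is imported lazily so the candidate
    list above can be inspected without a Spark installation.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config(ARROW_BATCH_KEY, str(size))
        .getOrCreate()
    )
```

Smaller batches lower the peak memory each Arrow batch needs, at the cost of more batches per partition.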
I updated to the TF 1.14.0 API, i.e.
h324yang added some commits Jun 29, 2019
@ruebot I've done all the requested changes except for
ruebot approved these changes Jul 3, 2019
ruebot requested changes Jul 3, 2019
Sorry! That slipped my mind, and I already removed it. We can download it and mv the
For example:
Then, we need the category mapping file
For example:
h324yang commented Apr 25, 2019
JCDL2019 demo
Using AUT and SSD model w/ TensorFlow to do object detection analysis on web archives.
standalone mode, so need to set up master and slaves first.
detect.py to get and store the object probabilities and the image byte strings.
extract_images.py to get image files from the result of step2
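The two-stage design described here (store object probabilities with image byte strings, then turn them into image files) can be illustrated with a small standalone sketch. The record layout, the function name, and the default threshold and file extension are all assumptions for illustration; the actual output format of `detect.py` is not shown in this thread.

```python
import os


def extract_images(records, out_dir, label, threshold=0.5, extension="jpg"):
    """Write images whose score for `label` meets `threshold`.

    `records` is an iterable of (image_id, scores, image_bytes) tuples,
    where `scores` maps a label to a probability. This layout is
    hypothetical and only stands in for the stored detection results.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for image_id, scores, image_bytes in records:
        if scores.get(label, 0.0) < threshold:
            continue  # skip images where the label is absent or too unlikely
        path = os.path.join(out_dir, "%s_%s.%s" % (label, image_id, extension))
        with open(path, "wb") as handle:
            handle.write(image_bytes)
        written.append(path)
    return written
```

Keeping detection and extraction separate means the expensive model pass runs once, while extraction can be repeated with different labels or thresholds.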