Python formatting, and gitignore additions. #326

Merged: ianmilligan1 merged 4 commits into master from tf-follow-on on Jul 18, 2019

ruebot commented Jul 8, 2019

What does this Pull Request do?

Follow-on to 7a61f0e

  • Run black and isort on Python files.
  • Move Spark config to example file.
  • Update gitignore

How should this be tested?

I tested locally, and it was good to go. @ianmilligan1 if you want to test on your end, grab a small WARC (990/8471 is perfect!), then:

  1. Make sure you have your Python environment set up:
     • conda install pyspark
     • conda install tensorflow
     • conda install pyarrow
  2. Export your Python setup (for example):
     • export PYSPARK_PYTHON=/home/ruestn/anaconda3/bin/python
     • export PYSPARK_DRIVER_PYTHON=/home/ruestn/anaconda3/bin/python
  3. Build the branch locally.
  4. Pull down the models:
     • cd /tmp && wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
     • tar -xzvf ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
     • mkdir -p /PATH/TO/aut/src/main/python/tf/model/graph/ssd_mobilenet_v1_fpn_640x640/
     • cp /tmp/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/frozen_inference_graph.pb /PATH/TO/aut/src/main/python/tf/model/graph/ssd_mobilenet_v1_fpn_640x640/
     • mkdir -p /PATH/TO/aut/src/main/python/tf/model/category/
     • cd /PATH/TO/aut/src/main/python/tf/model/category/
     • wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/mscoco_label_map.pbtxt
  5. Tweak Spark conf:
     • cp /PATH/TO/aut/src/main/python/tf/util/spark.conf.example /PATH/TO/aut/src/main/python/tf/util/spark.conf
       spark.sql.execution.arrow.enabled true
       spark.sql.execution.arrow.maxRecordsPerBatch 50
       spark.executor.memory 4G
       spark.cores.max 4
       spark.executor.cores 4
       spark.driver.memory 4G
       spark.task.cpus 2
  6. Start up Spark master/slave:
     • /PATH/TO/SPARK/sbin/start-master.sh
     • /PATH/TO/SPARK/sbin/start-slave.sh 127.0.1.1:7077
  7. Run the first step (for example):
     • python /PATH/TO/aut/src/main/python/tf/detect.py --web_archive "/home/nruest/tmp/auk/990/8471/warcs/*" --aut_jar /home/nruest/Projects/au/aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar --spark /home/nruest/bin/spark-2.4.1-bin-hadoop2.7/bin --master spark://127.0.1.1:7077 --img_model ssd --filter_size 50 50 --output_path /home/nruest/Projects/au/sample-data/aut-image-tf-testing-03
  8. Run the second step (for example):
     • python /PATH/TO/aut/src/main/python/tf/extract_images.py --res_dir /home/nruest/Projects/au/sample-data/aut-image-tf-testing-03 --output_dir /home/nruest/Projects/au/sample-data/aut-image-tf-testing-image-output-03 --threshold 0.85
  9. Check out the directory you dumped the images to!
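
For anyone sanity-checking their copy of spark.conf before starting the cluster, here is a minimal sketch of reading the whitespace-separated "key value" lines shown in the Spark conf step. This is not how detect.py loads the file (the PR does not show that); the function name parse_spark_conf and the EXAMPLE_CONF constant are made up for illustration, and only the "key<whitespace>value" form is handled.

```python
def parse_spark_conf(text):
    """Parse whitespace-separated Spark properties into a dict of strings.

    Hypothetical helper for illustration only; it is not part of aut.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # The key is everything before the first run of whitespace;
        # the value is whatever follows it.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


# The values from the testing steps in this PR.
EXAMPLE_CONF = """\
spark.sql.execution.arrow.enabled true
spark.sql.execution.arrow.maxRecordsPerBatch 50
spark.executor.memory 4G
spark.cores.max 4
spark.executor.cores 4
spark.driver.memory 4G
spark.task.cpus 2
"""

conf = parse_spark_conf(EXAMPLE_CONF)
for key in sorted(conf):
    print(key, "=", conf[key])
```

Values are kept as strings on purpose, since Spark itself interprets things like "4G" and "true".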
- Run black and isort on Python files.
- Move Spark config to example file.
- Update gitignore for 7a61f0e additions.
@ruebot ruebot requested a review from ianmilligan1 Jul 8, 2019
ruebot commented Jul 8, 2019

@ianmilligan1 I have all these steps saved locally, so we can use them for documentation when the time comes.

codecov-io commented Jul 17, 2019

Codecov Report

Merging #326 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #326   +/-   ##
=======================================
  Coverage   74.97%   74.97%           
=======================================
  Files          39       39           
  Lines        1123     1123           
  Branches      197      197           
=======================================
  Hits          842      842           
  Misses        215      215           
  Partials       66       66

ruebot added 2 commits Jul 17, 2019
ianmilligan1 left a comment

Woohoo. Very great stuff. Lots of politician faces in this sample web archive of Canadian political parties:

[Screenshot: Screen Shot 2019-07-18 at 1 56 48 PM]

Apologies for the delay on this. After a few dozen Slack messages and some wrangling, this was successfully built in a conda virtual environment (this guide was useful, for future reference).

For documentation purposes: on macOS, the default URL for the Spark master was formatted as spark://Ians-MacBook-Pro-3.local:7077; the 127.0.1.1:7077 address didn't work on my end. Also, the Python version that ultimately worked was 3.7.1.
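
For the docs, a small sketch of how the --master value could be derived per machine instead of hard-coding 127.0.1.1. It assumes the standalone master was started with default settings, in which case it typically advertises spark://<hostname>:7077; the helper name spark_master_url is hypothetical, and the URL printed in the master's log or web UI remains the authoritative one.

```python
import socket

# Default port for a Spark standalone master.
DEFAULT_MASTER_PORT = 7077


def spark_master_url(host=None, port=DEFAULT_MASTER_PORT):
    """Build a spark://host:port URL.

    Hypothetical helper for illustration; falls back to this machine's
    hostname when no host is given.
    """
    if host is None:
        host = socket.gethostname()
    return f"spark://{host}:{port}"


# Explicit hosts, matching the two cases reported in this PR.
print(spark_master_url("127.0.1.1"))
print(spark_master_url("Ians-MacBook-Pro-3.local"))

# Best guess for the local machine; confirm against the master's log.
print(spark_master_url())
```

The result would be passed as the --master argument to detect.py and as the argument to start-slave.sh.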

ruebot commented Jul 18, 2019

Oh, that's good to know about the mac side of things.

@ianmilligan1 ianmilligan1 merged commit bd5ef14 into master Jul 18, 2019
3 checks passed
codecov/patch Coverage not affected when comparing f35d54e...78ef407
codecov/project 74.97% remains the same compared to f35d54e
continuous-integration/travis-ci/pr The Travis CI build passed
@ianmilligan1 ianmilligan1 deleted the tf-follow-on branch Jul 18, 2019