Python formatting, and gitignore additions. #326

Open
wants to merge 1 commit into
base: master
from

@ruebot
Member

commented Jul 8, 2019

What does this Pull Request do?

Follow-on to 7a61f0e

  • Run black and isort on Python files.
  • Move Spark config to example file.
  • Update gitignore.

How should this be tested?

I tested locally, and it was good to go. @ianmilligan1 if you want to test on your end, grab a small WARC (990/8471 is perfect!), then:

  1. Make sure you have your Python environment set up:
  • conda install pyspark
  • conda install tensorflow
  • conda install pyarrow
  2. Export your Python setup (for example):
  • export PYSPARK_PYTHON=/home/ruestn/anaconda3/bin/python
  • export PYSPARK_DRIVER_PYTHON=/home/ruestn/anaconda3/bin/python
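
If the later steps are driven from a Python script instead of a shell, the two exports in this step can be mirrored with `os.environ`, as long as they are set before the Spark session is created. This is a minimal sketch, assuming the interpreter running the script is the one with pyspark, tensorflow and pyarrow installed; `point_pyspark_at` is a hypothetical helper name and is not part of aut.

```python
import os
import sys


def point_pyspark_at(python_path=sys.executable):
    """Point both PySpark interpreter variables at the same Python binary."""
    # Hypothetical helper: mirrors the two `export` lines in this step.
    os.environ["PYSPARK_PYTHON"] = python_path
    os.environ["PYSPARK_DRIVER_PYTHON"] = python_path
    return python_path
```

Calling `point_pyspark_at()` with no argument uses the current interpreter; pass an explicit path to use a different Python binary, such as a conda one.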
  3. Build the branch locally.

  4. Pull down the models:

  • cd /tmp && wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
  • tar -xzvf ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz
  • mkdir -p /PATH/TO/aut/src/main/python/tf/model/graph/ssd_mobilenet_v1_fpn_640x640/
  • cp /tmp/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/frozen_inference_graph.pb /PATH/TO/aut/src/main/python/tf/model/graph/ssd_mobilenet_v1_fpn_640x640/
  • mkdir -p /PATH/TO/aut/src/main/python/tf/model/category/
  • cd /PATH/TO/aut/src/main/python/tf/model/category/
  • wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/mscoco_label_map.pbtxt
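
The extract-and-copy part of this step can also be scripted. The sketch below is only an illustration: `install_frozen_graph` is a hypothetical helper, and it assumes the tarball stores the graph under `<model name>/frozen_inference_graph.pb`, which is what the `tar` and `cp` commands above imply. It does not download anything.

```python
import shutil
import tarfile
from pathlib import Path

# Model name taken from the download URL in this step.
MODEL = "ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03"
# Target directory relative to the aut checkout, as in the mkdir command.
GRAPH_SUBDIR = "src/main/python/tf/model/graph/ssd_mobilenet_v1_fpn_640x640"


def install_frozen_graph(tarball, aut_root):
    """Copy frozen_inference_graph.pb out of the tarball into aut's model tree."""
    graph_dir = Path(aut_root) / GRAPH_SUBDIR
    graph_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    target = graph_dir / "frozen_inference_graph.pb"
    with tarfile.open(tarball, "r:gz") as archive:
        # Assumed member path; raises KeyError if the archive layout differs.
        member = archive.extractfile(f"{MODEL}/frozen_inference_graph.pb")
        with open(target, "wb") as out:
            shutil.copyfileobj(member, out)
    return target
```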
  5. Tweak Spark conf:
  • cp /PATH/TO/aut/src/main/python/tf/util/spark.conf.example /PATH/TO/aut/src/main/python/tf/util/spark.conf
  • Then adjust the values in spark.conf for your machine (for example):
spark.sql.execution.arrow.enabled true
spark.sql.execution.arrow.maxRecordsPerBatch 50
spark.executor.memory 4G
spark.cores.max 4
spark.executor.cores 4
spark.driver.memory 4G
spark.task.cpus 2
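
Two of these settings interact: Spark schedules `spark.executor.cores` divided by `spark.task.cpus` tasks at a time on each executor. The sketch below parses the whitespace-separated `key value` format shown above and computes that number. `parse_spark_conf` and `tasks_per_executor` are hypothetical helpers written for illustration; they are not part of aut or Spark.

```python
EXAMPLE_CONF = """\
spark.sql.execution.arrow.enabled true
spark.sql.execution.arrow.maxRecordsPerBatch 50
spark.executor.memory 4G
spark.cores.max 4
spark.executor.cores 4
spark.driver.memory 4G
spark.task.cpus 2
"""


def parse_spark_conf(text):
    """Parse whitespace-separated `key value` lines into a dict of strings."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, value = line.split(None, 1)
        conf[key] = value.strip()
    return conf


def tasks_per_executor(conf):
    """Concurrent task slots per executor: executor cores // CPUs per task."""
    cores = int(conf["spark.executor.cores"])
    cpus_per_task = int(conf.get("spark.task.cpus", "1"))
    return cores // cpus_per_task
```

With the values above, `tasks_per_executor(parse_spark_conf(EXAMPLE_CONF))` is 2, so each executor works on two tasks at once.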
  6. Start up Spark master/slave:
  • /PATH/TO/SPARK/sbin/start-master.sh
  • /PATH/TO/SPARK/sbin/start-slave.sh 127.0.1.1:7077
  7. Run the first step (for example):
  • python /PATH/TO/aut/src/main/python/tf/detect.py --web_archive "/home/nruest/tmp/auk/990/8471/warcs/*" --aut_jar /home/nruest/Projects/au/aut/target/aut-0.17.1-SNAPSHOT-fatjar.jar --spark /home/nruest/bin/spark-2.4.1-bin-hadoop2.7/bin --master spark://127.0.1.1:7077 --img_model ssd --filter_size 50 50 --output_path /home/nruest/Projects/au/sample-data/aut-image-tf-testing-03
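
Since the command above is long and mostly paths, it can help to assemble it in code when scripting repeated runs. This is a rough sketch: the flag names are copied from the command above, while `detect_command` and `as_shell` are hypothetical helpers and every path passed in is a placeholder. It only builds the argument list; it does not run anything.

```python
import shlex


def detect_command(script, web_archive, aut_jar, spark_bin, master,
                   output_path, img_model="ssd", filter_size=(50, 50)):
    """Return the argv list for the detect.py invocation shown above."""
    return [
        "python", script,
        "--web_archive", web_archive,
        "--aut_jar", aut_jar,
        "--spark", spark_bin,
        "--master", master,
        "--img_model", img_model,
        # --filter_size takes two values, passed as separate arguments.
        "--filter_size", *map(str, filter_size),
        "--output_path", output_path,
    ]


def as_shell(argv):
    """Render argv as one shell line, quoting globs such as warcs/*."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The list form can be handed directly to `subprocess.run`, which avoids shell quoting problems with the WARC glob.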
  8. Run the second step (for example):
  • python /PATH/TO/aut/src/main/python/tf/extract_images.py --res_dir /home/nruest/Projects/au/sample-data/aut-image-tf-testing-03 --output_dir /home/nruest/Projects/au/sample-data/aut-image-tf-testing-image-output-03 --threshold 0.85
  9. Check out the directory you dumped the images to!
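
For a quick sanity check of that last step, something like the following can summarize what landed in the output directory. It is a sketch only: `summarize_output` is a hypothetical helper, and since the layout that extract_images.py writes is not described here, it simply counts files by extension.

```python
from collections import Counter
from pathlib import Path


def summarize_output(output_dir):
    """Count files under output_dir (recursively) by lowercase extension."""
    counts = Counter()
    for path in Path(output_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

An empty result after a run suggests that nothing was extracted, for example because no detection met the `--threshold` value.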
Python formatting, and gitignore additions.
- Run black and isort on Python files.
- Move Spark config to example file.
- Update gitignore for 7a61f0e additions.

@ruebot ruebot requested a review from ianmilligan1 Jul 8, 2019

@ruebot

Member Author

commented Jul 8, 2019

@ianmilligan1 I have all these steps saved locally, so we can use them for documentation when the time comes.
