Describe how to start PySpark console in Docker container #22
@sepastian looks like you need to update your local fork. It appears that you have your previous commit in there from the PR earlier today. It's super helpful if you do all this on a branch other than master too.
You mean using a feature branch? Which name? I edited the README on GitHub, not sure why it included the other PR again.
Overall, solid section. Need a few updates before I can merge. Thanks!
$ docker run -it --rm archivesunleashed/docker-aut \
/spark/bin/pyspark \
--py-files /aut/target/aut.zip \
--packages "io.archivesunleashed:aut:0.70" # Download Java/Scala packages from maven central
ruebot
May 29, 2020
Member
This needs to be updated for the master branch, since it builds on the master branch from `aut`. Or, this section should be moved to the 0.70.0 branch.
It is also possible to start an interactive PySpark console. This requires specifying Python bindings and Java/Scala packages, both of which are included in the Docker image under `/aut/target`.
```bash
$ docker run -it --rm archivesunleashed/docker-aut \
>>>
```
The example above loads version `0.70.1` of the Java/Scala packages. Your build may have packages in another version, to see what is available and select the right files, run the following.
ruebot
May 29, 2020
Member
There is no 0.70.1 release. That's a snapshot from master on the `aut` repo.
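Since the version passed to `--packages` is the point under review here, a minimal sketch of keeping it in one place may help. `aut_pyspark_cmd` is a hypothetical helper, not part of docker-aut; it only prints the command from this PR with the version substituted, and its default of `0.70` is an assumption taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-aut): print, but do not run,
# the PySpark console command from this PR for a given aut version.
aut_pyspark_cmd() {
  version="${1:-0.70}"  # assumed default: the version used in this PR
  printf '%s\n' "docker run -it --rm archivesunleashed/docker-aut /spark/bin/pyspark --py-files /aut/target/aut.zip --packages io.archivesunleashed:aut:${version}"
}

# Show the command for the version discussed above.
aut_pyspark_cmd 0.70
```

Printing rather than executing keeps the sketch safe to try without Docker; the printed line can be pasted into a terminal once the version is confirmed to exist.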
--packages "io.archivesunleashed:aut:0.70" # Download Java/Scala packages from maven central
```
See also https://github.com/archivesunleashed/aut#archives-unleashed-toolkit-with-pyspark.
ruebot
May 29, 2020
Member
I'd change this to:
For more information, see the Archives Unleashed Toolkit with PySpark section of the Toolkit README.
Looking good overall. This will be really great to have in here!
@sepastian I'll pull this down locally, clean it up, and get it merged in. Don't worry about doing updates.
Merged be7f5b1 into archivesunleashed:master
@sepastian all updated! Check out:
I'll pull in a version of this into the next release branch for docker-aut. Feel free to create an issue or PR with any more updates. These contributions are great, and super helpful!
sepastian commented May 29, 2020
No description provided.