Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upRemove RDD option in app; DataFrame only now. #450
Merged
+52
−419
Conversation
This comment has been minimized.
This comment has been minimized.
codecov
bot
commented
Apr 20, 2020
•
Codecov Report
@@ Coverage Diff @@
## master #450 +/- ##
==========================================
- Coverage 75.55% 74.55% -1.01%
==========================================
Files 40 40
Lines 1395 1285 -110
Branches 265 246 -19
==========================================
- Hits 1054 958 -96
+ Misses 218 211 -7
+ Partials 123 116 -7 |
ruebot
added a commit
to archivesunleashed/aut-docs
that referenced
this pull request
Apr 20, 2020
This comment has been minimized.
This comment has been minimized.
Documentation PR: archivesunleashed/aut-docs#56 |
Tested with both TravisCI and the provided code snippets - they all work very well. Great stuff! |
ianmilligan1
pushed a commit
to archivesunleashed/aut-docs
that referenced
this pull request
Apr 20, 2020
#56) * Documentation updates for archivesunleashed/aut#450
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
ruebot commentedApr 20, 2020
GitHub issue(s): #449
What does this Pull Request do?
Remove RDD option in app; DataFrame only now.
I'll get an associated documentation PR with this as well.
How should this be tested?
If you want to robust, the following:
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainGraphExtractorGEXF --output-format GEXF
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainGraphExtractorTEXT
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainFrequencyExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainFrequencyExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor ImageGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/ImageGraphExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor PlainTextExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/PlainTextExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor WebPagesExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/WebPagesExtractor
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor DomainFrequencyExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/DomainFrequencyExtractorSingle --partition 1
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor ImageGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/ImageGraphExtractorSingle --partition 1
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor PlainTextExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/PlainTextExtractorSingle --partition 1
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.60.1-SNAPSHOT-fatjar.jar --extractor WebPagesExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/449-test/WebPagesExtractorSingle --partition 1
Should produce the following: