DomainGraphExtractor produces different output in RDD vs DF #436
ruebot commented Apr 8, 2020
To Reproduce
Steps to reproduce the behavior (the second command differs from the first only by the --df flag):
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/DomainGraphText --output-format TEXT
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/DomainGraphText --df --output-format TEXT
cat the part files together for each output.
Expected behavior
The two concatenated files should be the same.
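The concatenate-and-compare step can be scripted. This is a minimal sketch, assuming each run writes Spark part-* files into its own output directory (the commands above both point at DomainGraphText, so one run's output would need to be moved or renamed first). It sorts each merged output before diffing, so it checks that both runs produced the same set of lines and ignores differences in row order or partitioning; drop the sort if byte-identical output is the goal. The script and function names (merged_sorted, compare_outputs) are hypothetical and not part of aut.

```shell
#!/bin/sh
# Hypothetical helper: compare the RDD and --df outputs of DomainGraphExtractor.
# Assumes each run wrote Spark part-* files into its own output directory.

# Concatenate every part-* file in a directory, then sort the lines so that
# differences in partitioning or row order alone are not reported.
merged_sorted() {
  cat "$1"/part-* | LC_ALL=C sort
}

# Return 0 if the merged, sorted outputs match; otherwise print a unified
# diff and return non-zero (diff's exit status).
compare_outputs() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
  merged_sorted "$1" > "$tmp_a"
  merged_sorted "$2" > "$tmp_b"
  diff -u "$tmp_a" "$tmp_b"
  rc=$?
  rm -f "$tmp_a" "$tmp_b"
  return "$rc"
}

# Usage: sh compare-domain-graph.sh RDD_OUTPUT_DIR DF_OUTPUT_DIR
if [ "$#" -eq 2 ]; then
  compare_outputs "$1" "$2"
  exit $?
fi
```

For example, `sh compare-domain-graph.sh DomainGraphRDD DomainGraphDF` (hypothetical directory names) exits with status 0 when the outputs agree and prints the differing lines otherwise.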
Environment information
Additional context
Blocks #435