Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add graphml output to CommandLineApp and DomainGraphExtractor. #438

Open
wants to merge 9 commits into
base: master
from

Conversation

@ruebot
Copy link
Member

ruebot commented Apr 9, 2020

GitHub issue(s): #435

What does this Pull Request do?

Add graphml output to CommandLineApp and DomainGraphExtractor.

  • Resolves #435
  • Adds GRAPHML option to CommandLineApp
  • Add DataFrame method to DomainGraphExtractor
  • Updates CommandLineApp test

How should this be tested?

TravisCI +

bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/graphml-rdd --output-format GRAPHML
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/graphml-df --df --output-format GRAPHML

Those both should have the same line count.

For example:

$ wc -l graphml-df/GRAPHML.xml graphml-rdd/GRAPHML.xml
  33911 graphml-df/GRAPHML.xml
  33911 graphml-rdd/GRAPHML.xml
  67822 total

Additional Notes:

I should add a test as well for WriteGraphmlDF similar to what we have for WriteGexf. So, don't merge this right away, but feel free to test it and make sure it works.

This might also take care of the remainder of #223 and supersede #397? 👋 @SinghGursimran

ruebot added 9 commits Feb 10, 2020
@ruebot ruebot requested review from lintool and ianmilligan1 Apr 9, 2020
@codecov

This comment has been minimized.

Copy link

codecov bot commented Apr 9, 2020

Codecov Report

Merging #438 into master will increase coverage by 0.13%.
The diff coverage is 86.66%.

@@            Coverage Diff             @@
##           master     #438      +/-   ##
==========================================
+ Coverage   78.04%   78.18%   +0.13%     
==========================================
  Files          43       43              
  Lines        1558     1586      +28     
  Branches      286      289       +3     
==========================================
+ Hits         1216     1240      +24     
- Misses        217      218       +1     
- Partials      125      128       +3     
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

1 participant
You can’t perform that action at this time.