Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add graphml output to CommandLineApp and DomainGraphExtractor. #438

Merged
merged 10 commits into from Apr 11, 2020

Conversation

@ruebot
Copy link
Member

ruebot commented Apr 9, 2020

GitHub issue(s): #435

What does this Pull Request do?

Add graphml output to CommandLineApp and DomainGraphExtractor.

  • Resolves #435
  • Adds GRAPHML option to CommandLineApp
  • Add DataFrame method to DomainGraphExtractor
  • Updates CommandLineApp test

How should this be tested?

TravisCI +

bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/graphml-rdd --output-format GRAPHML
bin/spark-submit --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor DomainGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/graphml-df --df --output-format GRAPHML

Those both should have the same line count.

For example:

$ wc -l graphml-df/GRAPHML.xml graphml-rdd/GRAPHML.xml
  33911 graphml-df/GRAPHML.xml
  33911 graphml-rdd/GRAPHML.xml
  67822 total

Additional Notes:

I should add a test as well for WriteGraphmlDF similar to what we have for WriteGexf. So, don't merge this right away, but feel free to test it and make sure it works.

This might also take care of the remainder of #223 and supersede #397? 👋 @SinghGursimran

ruebot added 9 commits Feb 10, 2020
@ruebot ruebot requested review from lintool and ianmilligan1 Apr 9, 2020
@codecov

This comment has been minimized.

Copy link

codecov bot commented Apr 9, 2020

Codecov Report

Merging #438 into master will increase coverage by 0.13%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master     #438      +/-   ##
==========================================
+ Coverage   78.04%   78.18%   +0.13%     
==========================================
  Files          43       43              
  Lines        1558     1586      +28     
  Branches      286      289       +3     
==========================================
+ Hits         1216     1240      +24     
- Misses        217      218       +1     
- Partials      125      128       +3     
Copy link
Member

ianmilligan1 left a comment

See 🤦 comment from me below (but the file outputs are great!).

Copy link
Member

ianmilligan1 left a comment

👍 Looks great on all fronts! (give me a thumbs up if you're happy to have this PR merged, @ruebot, looks good to me)

@ruebot

This comment has been minimized.

Copy link
Member Author

ruebot commented Apr 11, 2020

@ianmilligan1 sure, let's use the following as a commit message:

Add graphml output to CommandLineApp and DomainGraphExtractor.

  • Resolves #435
  • Adds GRAPHML option to CommandLineApp
  • Adds DataFrame method to DomainGraphExtractor
  • Updates CommandLineApp, and WriteGraphML tests
@ianmilligan1 ianmilligan1 merged commit 943cf92 into master Apr 11, 2020
3 checks passed
3 checks passed
codecov/patch 87.50% of diff hit (target 78.04%)
Details
codecov/project 78.18% (+0.13%) compared to 96899f4
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ianmilligan1 ianmilligan1 deleted the issue-435 branch Apr 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants
You can’t perform that action at this time.