Align RDD and DF output for DomainGraphExtractor. #437
+18 −12
codecov bot commented Apr 8, 2020
Codecov Report
@@            Coverage Diff             @@
##           master     #437      +/-   ##
==========================================
+ Coverage   77.99%   78.04%   +0.05%
==========================================
  Files          43       43
  Lines        1554     1558       +4
  Branches      286      286
==========================================
+ Hits         1212     1216       +4
  Misses        217      217
  Partials      125      125
Looks good - tried it out. One proviso - the output of this has node IDs like:
Whereas if we were to run a script like this one in aut-docs, we get:
The behaviour of |
@ianmilligan1 can you open up an issue for that? That's a good catch. Those should all be aligned.
@ruebot Will do tomorrow!
ruebot commented Apr 8, 2020
GitHub issue(s): #436
What does this Pull Request do?
How should this be tested?
TravisCI + Some version of this:
The output of these two files should have the same line count:
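The original commands are not reproduced on this page. As a rough stand-in, here is a minimal Python sketch of the line-count comparison, assuming each extractor run was saved to its own Spark output directory containing `part-*` files. The function names and directory layout are hypothetical, not part of aut.

```python
from pathlib import Path


def count_lines(output_dir):
    """Count lines across all part-* files in one output directory.

    Assumes Spark-style output, where results are split over files
    named part-00000, part-00001, and so on.
    """
    total = 0
    for part in sorted(Path(output_dir).glob("part-*")):
        with part.open("r", encoding="utf-8") as handle:
            total += sum(1 for _ in handle)
    return total


def same_line_count(rdd_dir, df_dir):
    """Return True when both output directories hold the same number of lines."""
    return count_lines(rdd_dir) == count_lines(df_dir)
```

If the DataFrame output was written with a header row, that row would need to be discounted before comparing.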
Additional Notes
This should unblock #435.
It's also worth noting this:
https://github.com/archivesunleashed/aut/blob/issue-436/src/main/scala/io/archivesunleashed/app/DomainGraphExtractor.scala#L42 vs https://github.com/archivesunleashed/aut/blob/issue-436/src/main/scala/io/archivesunleashed/package.scala#L184
They should be doing the same thing, but on the DataFrame side, we still get empty src or dest values. That's why https://github.com/archivesunleashed/aut/blob/issue-436/src/main/scala/io/archivesunleashed/app/DomainGraphExtractor.scala#L62-L63 is there.
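To illustrate the kind of guard described here, the following is a plain-Python sketch that drops rows whose src or dest is empty. It is only an assumption about what the linked lines do, written without Spark; the `(date, src, dest)` row shape and the function name are hypothetical.

```python
def drop_empty_edges(rows):
    """Keep only (date, src, dest) rows where src and dest are both non-empty."""
    return [
        (date, src, dest)
        for date, src, dest in rows
        if src != "" and dest != ""
    ]
```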