Make ArchiveRecord.getContentBytes consistent, Resolve #334 #335
+4
−4
ianmilligan1 requested a review from ruebot Jul 30, 2019
codecov-io commented Jul 30, 2019
Codecov Report
@@            Coverage Diff            @@
##           master    #335     +/-   ##
=========================================
- Coverage   75.97%     75%   -0.98%
=========================================
  Files          39      39
  Lines        1124    1124
  Branches      197     197
=========================================
- Hits          854     843     -11
- Misses        205     214      +9
- Partials       65      67      +2
ianmilligan1 commented Jul 30, 2019
GitHub issue(s): #334
What does this Pull Request do?
As noted in #334, @jrwiebe discovered that getContentBytes behaves inconsistently for ARC and WARC files. Currently, for ARC files we get just the contents of the record (using getBodyContent), while for WARC files we get everything (using getContent). The two should be consistent.

On the ticket itself, I discussed the various outputs, how they differ, and potential solutions. I think it is critical that we normalize the two approaches. Given the existence of RemoveHttpHeader, I think we should have both just use getContent. If people just want the body content, they can remove the header (which is what we've been doing with WARC files). I can imagine lots of diverse use cases for HTTP header information, so it's better to have it in there and then remove it.

How should this be tested?
Additional Notes:
We should probably foreground RemoveHttpHeader more in our documentation.

Interested parties
@ruebot @jrwiebe
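
For readers unfamiliar with the distinction discussed above, here is a minimal Python sketch of the idea: keep the full record content (HTTP header plus body) and strip the header only when the body alone is wanted. The function name remove_http_header and the sample record are hypothetical and for illustration only; this is not the toolkit's RemoveHttpHeader implementation, and it assumes the header is separated from the body by a blank line (CRLF CRLF), as in a standard HTTP response.

```python
# Illustrative sketch only: a hypothetical stand-in, not the toolkit's code.
HEADER_END = "\r\n\r\n"  # blank line that ends an HTTP header block


def remove_http_header(content: str) -> str:
    """Return content without a leading HTTP header block, if one is present."""
    # Content that does not start with an HTTP status line is returned as-is.
    if not content.startswith("HTTP/"):
        return content
    idx = content.find(HEADER_END)
    # No header terminator found: leave the content untouched rather than guess.
    if idx == -1:
        return content
    return content[idx + len(HEADER_END):]


# Hypothetical full record content, header included.
full_record = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
body_only = remove_http_header(full_record)
```

Under this approach a caller who needs header fields works from the full content, and a caller who only needs the body applies the stripping step afterwards.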