Update ArcRecordUtils for better error handling #258
ianmilligan1 added the bug label on Aug 12, 2018
ianmilligan1 assigned borislin on Aug 12, 2018
ruebot added this to In review in DataFrames and PySpark on Aug 13, 2018
ruebot removed this from In review in DataFrames and PySpark on Aug 13, 2018
ruebot added this to To Do in 1.0.0 Release of AUT on Aug 13, 2018
ruebot moved this from To Do to In Progress in 1.0.0 Release of AUT on Aug 13, 2018
ruebot added the duplicate and in progress labels on Aug 20, 2018
ruebot closed this in b8e57ec on Oct 4, 2018
1.0.0 Release of AUT automation moved this from In Progress to Done on Oct 4, 2018
ianmilligan1 commented Aug 12, 2018
Describe the bug
This grows out of #246, which is about AUT failing on broken ARC files with the error message "invalid distance too far back."
Back when AUT was Warcbase, we ran into the same or a very similar issue when working with WARC files. That error was fixed for WARC files in this commit.
However, we never updated ArcRecordUtils to introduce similar error handling for ARC files. We should update ArcRecordUtils to have the same error handling as WarcRecordUtils, including the "invalid distance too far back" issue.
Files involved
To Reproduce
See #246. We have the broken files on tuna.
Expected behavior
We would like to be able to at the very least skip the broken files as per the issue in #246.
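For illustration only, here is a rough sketch of what "skip instead of fail" could look like. It is not the code in ArcRecordUtils or WarcRecordUtils; the class and method names (SafeGzipRead, readOrSkip, gzip) are made up for this example, and it only uses java.util.zip from the JDK. The idea is that a corrupt gzip stream surfaces as an IOException (the "invalid distance too far back" message arrives as a ZipException, which is a subclass), so catching it at the read boundary lets the caller move on to the next record or file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of AUT: shows the skip-on-corruption pattern only.
final class SafeGzipRead {

    // Decompress a gzip payload; return Optional.empty() if the stream is corrupt.
    static Optional<byte[]> readOrSkip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return Optional.of(out.toByteArray());
        } catch (IOException e) {
            // ZipException and EOFException are both IOExceptions, so corrupt
            // or truncated input ends up here instead of killing the job.
            System.err.println("Skipping corrupt record: " + e.getMessage());
            return Optional.empty();
        }
    }

    // Small helper to build a valid gzip payload for trying the method out.
    static byte[] gzip(byte[] raw) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream()) {
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(raw);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would check the returned Optional and drop empty results, for example by filtering them out before building records. Whether the skip should happen per record or per file is a design decision for the actual fix and is not settled by this sketch.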
Additional context
The proposed solution was found in a comment on that issue, and to make things more straightforward I wanted to open up a new issue here.