Update ArcRecordUtils for better error handling #258

Closed
ianmilligan1 opened this Issue Aug 12, 2018 · 0 comments

ianmilligan1 commented Aug 12, 2018

Describe the bug
This grows out of #246, which is about AUT failing on broken ARC files with the error message "invalid distance too far back."

Back when AUT was Warcbase, we ran into a similar issue when working with WARC files. That error was fixed for WARC files in this commit.

However, we never updated ArcRecordUtils to introduce similar error handling for ARC files. We should give ArcRecordUtils the same error handling as WarcRecordUtils, including handling of the "invalid distance too far back" error.

Files involved

To Reproduce
See #246. We have the broken files on tuna.

Expected behavior
We would like to be able to, at the very least, skip the broken files, as described in #246.
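
To illustrate the "skip the broken files" behavior, here is a minimal Python sketch. It is not AUT code and does not reflect how ArcRecordUtils or WarcRecordUtils are written. The function name `read_records_skipping_corrupt` and the idea of passing records as a list of gzip-compressed byte strings are assumptions made only for this example. The point is the pattern: wrap decompression of each record in a try/except, record the failure, and carry on with the next record instead of aborting the whole job.

```python
import gzip
import zlib


def read_records_skipping_corrupt(blobs):
    """Decompress each gzip blob, skipping any that cannot be read.

    Hypothetical helper for illustration only; not part of AUT.
    Returns a tuple (records, skipped), where `skipped` holds
    (index, error message) pairs for the blobs that failed.
    """
    records = []
    skipped = []
    for index, blob in enumerate(blobs):
        try:
            records.append(gzip.decompress(blob))
        except (OSError, EOFError, zlib.error) as err:
            # Bad gzip headers raise OSError, truncated streams raise
            # EOFError, and corrupt deflate data raises zlib.error.
            # Note the failure and move on to the next blob.
            skipped.append((index, str(err)))
    return records, skipped


if __name__ == "__main__":
    good = gzip.compress(b"a valid record")
    truncated = good[:12]          # header plus a fragment of the payload
    not_gzip = b"this is not gzip data"
    records, skipped = read_records_skipping_corrupt([good, truncated, not_gzip])
    print(len(records), "record(s) read;", len(skipped), "skipped")
```

Collecting the skipped indexes, rather than silently dropping them, keeps it possible to report which inputs were broken.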

Additional context
The proposed solution came from a comment on that issue. To keep things straightforward, I wanted to open a new issue here.

@ianmilligan1 ianmilligan1 added the bug label Aug 12, 2018

@ruebot ruebot added this to In review in DataFrames and PySpark Aug 13, 2018

@ruebot ruebot removed this from In review in DataFrames and PySpark Aug 13, 2018

@ruebot ruebot added this to To Do in 1.0.0 Release of AUT Aug 13, 2018

@ruebot ruebot moved this from To Do to In Progress in 1.0.0 Release of AUT Aug 13, 2018

@ruebot ruebot closed this in b8e57ec Oct 4, 2018

1.0.0 Release of AUT automation moved this from In Progress to Done Oct 4, 2018
