textFiles does not filter properly #390
Closed
ruebot added a commit that referenced this issue on Dec 17, 2019:
- Add ExtractDateDF test
- Fix conditional logic of textFiles filter to resolve #390
- Add test for conditional logic fix for #390
- Remove cruft ExtractUrls, left over from Twitter analysis removal (see: https://github.com/lintool/warcbase/blob/cab311ed8b0bceb666865fa76dd3bc5a86402e13/warcbase-core/src/test/scala/org/warcbase/spark/matchbox/ExtractUrlsTest.scala)
- Tweak null/nothing on a few tests
ianmilligan1 added a commit that referenced this issue on Dec 17, 2019
ruebot commented on Dec 17, 2019
Describe the bug
The conditional logic in `textFiles` does not filter properly. You can test with an example WARC in `src/test/resources`.
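The issue doesn't spell out exactly what was wrong with the condition, so the following is only a hypothetical illustration, not the aut `textFiles` code. One common way a filter like this ends up passing everything is joining negated checks with `||` instead of `&&`. The sketch works on bare URL strings and checks URL suffixes only (no MIME types); the object and method names are made up for this example.

```scala
// Hypothetical, self-contained sketch; NOT the actual aut textFiles implementation.
object TextFilesFilterSketch {
  // Suffixes the issue says textFiles should exclude.
  val excludedExtensions: Seq[String] = Seq(".js", ".css", ".html", ".htm")

  // Illustrative buggy pattern: negated checks joined with ||.
  // A URL is dropped only if it fails EVERY check at once, i.e. it would have to
  // end with all of these suffixes simultaneously, so every URL is kept.
  def keepBuggy(url: String): Boolean = {
    val u = url.toLowerCase
    !u.endsWith("/robots.txt") ||
      !u.endsWith(".js") ||
      !u.endsWith(".css") ||
      !u.endsWith(".html") ||
      !u.endsWith(".htm")
  }

  // Corrected pattern: keep a URL only if it is not robots.txt AND
  // does not end with any excluded extension (case-insensitive).
  def keepFixed(url: String): Boolean = {
    val u = url.toLowerCase
    !u.endsWith("/robots.txt") && !excludedExtensions.exists(ext => u.endsWith(ext))
  }

  def main(args: Array[String]): Unit = {
    val urls = Seq(
      "http://example.com/robots.txt",
      "http://example.com/app.js",
      "http://example.com/notes.txt"
    )
    println(urls.filter(keepBuggy)) // prints all three URLs
    println(urls.filter(keepFixed)) // prints List(http://example.com/notes.txt)
  }
}
```

By De Morgan's law, "not robots.txt and not any excluded extension" is the same as "not (robots.txt or any excluded extension)", which is why `keepFixed` reads as a single `exists` check. Plain suffix matching would miss URLs with query strings such as `app.js?v=1`.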
To Reproduce
Expected behavior
We should be filtering out `robots.txt` files, along with all `js`, `css`, `html`, and `htm` files.

Environment information