Tree: 3ab17bf017
-
Add "Extract popular images" DataFrame implementation (#382).
- Add tests for ExtractPopularImagesDF
- Rename ExtractPopularImages to ExtractPopularImagesRDD
- Addresses #223
-
Add all() method and refactor DF UDFs (#383).
- Add `all()` DataFrame method
- Refactor fixity DataFrame UDFs
- Add ComputeImageSize UDF
- Add Python implementation of `all()`
- Addresses #223
-
Rename pages() to webpages(). (#384)
- Part of work on #233
-
Append UDF with RDD or DF. (#381)
- Addresses #223
-
Extend more Matchbook utilities to DataFrames (#380).
- Extend GetExtensionMime, ExtractImageLinks, ComputeMD5, and ComputeSHA1 to DataFrames
- Addresses #223
-
Finalize converting NER Classifier to WANE Format (#378).
- Fully resolves #297
- Overrides NER Classifier output to PERSON -> persons, LOCATION -> locations, ORGANIZATION -> organizations
-
Tweaks the style of the license badge to look consistent with the other badges.
-
#356 verify-javadocs fixed -- make sure you have OpenJDK-11 fully installed :facepalm:, and a bunch more pom cleanup.
-
ruebot committed Nov 10, 2019
-
ruebot committed Nov 9, 2019
- Some hacks to get a successful build
- Definitely need to loop back and clean up a whole lot!
- Addresses #356
-
Merge branch 'master' of github.com:archivesunleashed/aut into issue-356
ruebot committed Nov 8, 2019
-
Align NER output to WANE format; addresses #297 (#361)
- Update Stanford CoreNLP
- Format NER output in JSON
- Add getPayloadDigest to ArchiveRecord
- Add test for getPayloadDigest
- Add payload digest to NER output
- Remove extractFromScrapeText
- Remove extractFromScrapeText test
- TODO: PERSON -> persons, LOCATION -> locations, ORGANIZATION -> organizations (involves writing a new class or overriding NER output 🤢)
-
Various UDF implementation and cleanup for DF. (#370)
- Replace ExtractBaseDomain with ExtractDomain
- Closes #367
- Address bug in ArcTest; RemoveHTML -> RemoveHttpHeader
- Closes #369
- Wraps RemoveHttpHeader and RemoveHTML for use in data frames.
- Partially addresses #238
- Updates tests where necessary
- Punts on #368 UDF CaMeL cASe consistency issues
-
Update keepValidPages to include a filter on 200 OK. (#360)
- Add status code filter to keepValidPages
- Add MimeTypeTika to valid pages DF
- Update tests since we filter more and better now 😄
- Resolves #359