Tree: b9f7a76750
-
Update README and add LICENSE.txt
ruebot committed Apr 22, 2019
- Add badges to README
- Update Markdown
- Clean-up formatting
- Add LICENSE.txt in root
- Partially addresses internetarchive#233
-
Update changelog for 3.4.0-20190418
anjackson committed Apr 18, 2019
-
[maven-release-plugin] prepare for next development iteration
anjackson committed Apr 18, 2019
-
[maven-release-plugin] prepare release 3.4.0-20190418
anjackson committed Apr 18, 2019
-
Merge pull request internetarchive#238 from ruebot/issue-233
anjackson committed Apr 17, 2019
Add CHANGELOG; address internetarchive#233.
-
Merge pull request internetarchive#251 from nlevitt/trough-dedup
adam-miller committed Apr 15, 2019
fix some trough dedup bugs
-
Merge pull request internetarchive#249 from ruebot/remove-suffix-crawler-bean
nlevitt committed Apr 10, 2019
Remove suffix from warcWriter since it is no longer used.
-
Merge pull request internetarchive#253 from dvanduzer/master
nlevitt committed Apr 10, 2019
set of frontier management changes to support CrawlHQ module
-
nlevitt committed Apr 10, 2019
-
close the rethinkdb connection!
nlevitt committed Apr 10, 2019
-
Merge branch 'master' into trough-dedup
nlevitt committed Apr 10, 2019
* master:
  replace System.err.println with logger.info
  Revert "Upgrade httpclient to 4.5.7 and handle cookies more compliantly"
  Removing outdated test.
  Disable questionable test.
  Avoid deprecated flag.
  Supply an iterator, for internetarchive#245
  Updated POM to use latest version.
  Update README.md
  Handle missing closing paren in srcset descriptor
  Teach jericho extractor srcset
  Don't run srcset test against jericho, it doesn't handle it
  Handle commas more compliantly when parsing srcset
  Ensure we start parsing full lines, for internetarchive#239.
-
set of frontier management changes to support CrawlHQ module
David Van Duzer committed Apr 8, 2019
These changes come from a private fork of H3, originally made by Kenji Nagahashi, to create org.archive.crawler.frontier.PullingBdbFrontier, which we intend to merge into 'contrib' of the official version in the near future.
-
replace System.err.println with logger.info
nlevitt committed Apr 5, 2019
-
nlevitt committed Apr 1, 2019
especially this:
    - writeUrlCache.remove("segmentId");
    + writeUrlCache.remove(segmentId);
and some improvements and tweaks
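The one-line fix quoted in this commit message can be illustrated with a small sketch. It is written in Python rather than the project's Java, and it assumes `writeUrlCache` behaves like a map keyed by segment id; the function names and the example URL are hypothetical. Quoting the variable name turns it into a literal key, so the real entry is never evicted and no error is raised.

```python
from typing import Dict


def invalidate_buggy(cache: Dict[str, str], segment_id: str) -> None:
    # Bug: "segment_id" is a literal string, not the variable's value,
    # so this silently removes nothing for the actual segment.
    cache.pop("segment_id", None)


def invalidate_fixed(cache: Dict[str, str], segment_id: str) -> None:
    # Fix: use the variable so the entry for this segment is removed.
    cache.pop(segment_id, None)


if __name__ == "__main__":
    # Hypothetical cache contents, for illustration only.
    write_url_cache = {"seg-1": "http://example.invalid/write"}
    invalidate_buggy(write_url_cache, "seg-1")
    print("after buggy call:", write_url_cache)
    invalidate_fixed(write_url_cache, "seg-1")
    print("after fixed call:", write_url_cache)
```

Because removing a missing key is not an error in either language's map API, a mistake of this kind tends to surface only as stale cache entries.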
-
Remove suffix from warcWriter since it is no longer used.
ruebot committed Mar 29, 2019
-
Merge pull request internetarchive#248 from internetarchive/revert-246-upgrade-httpclient
ato committed Mar 28, 2019
Revert "Upgrade httpclient to 4.5.7 and handle cookies more compliantly"
-
Revert "Upgrade httpclient to 4.5.7 and handle cookies more compliantly"
ato committed Mar 28, 2019
-
Merge pull request internetarchive#246 from ukwa/upgrade-httpclient
nlevitt committed Mar 22, 2019
Upgrade httpclient to 4.5.7 and handle cookies more compliantly
-
Merge pull request internetarchive#242 from nlevitt/trough-dedup
jkafader committed Mar 21, 2019
Trough dedup
-
use constant from rethinkdb lib for default port
nlevitt committed Mar 21, 2019
-
Merge branch 'master' into upgrade-httpclient
anjackson committed Mar 21, 2019
-
anjackson committed Mar 21, 2019
-
anjackson committed Mar 20, 2019
-
anjackson committed Mar 20, 2019
-
Supply an iterator, for internetarchive#245
anjackson committed Mar 20, 2019
-
Updated POM to use latest version.
anjackson committed Mar 20, 2019
-
less alarming logging for normal situation
nlevitt committed Mar 20, 2019
-
Merge pull request internetarchive#243 from internetarchive/srcset
adam-miller committed Mar 19, 2019
Handle commas more compliantly when parsing srcset
-
make TroughCrawlLogFeed use TroughClient, and...
nlevitt committed Mar 19, 2019
... configure using rethinkdb url and segment id, instead of write url, which means it can work if the segment gets reassigned and so forth
***backward incompatible change***
-
Merge pull request internetarchive#244 from mikeizbicki/patch-1
ato committed Mar 19, 2019
Update README.md
-
Handle missing closing paren in srcset descriptor
ato committed Mar 16, 2019
-
Teach jericho extractor srcset
ato committed Mar 16, 2019
-
Don't run srcset test against jericho, it doesn't handle it
ato committed Mar 16, 2019
-
Handle commas more compliantly when parsing srcset
ato committed Mar 16, 2019
Commas are allowed if they're in the middle of the URL. Consequently:
    srcset="a,b,,c," => ["a,b,,c"]
    srcset="a, b,, c," => ["a", "b", "c"]
They occur particularly commonly in data: URLs before the base64 value.
Commas are also allowed in descriptors if they are enclosed by parens:
    srcset="a (b,c),d" => ["a", "d"]
Spec: https://html.spec.whatwg.org/multipage/images.html#parsing-a-srcset-attribute
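
The comma rules in this commit message can be sketched roughly as follows. This is an illustrative Python sketch based on the message and the linked spec, not the Java extractor code in this repository; the function name `parse_srcset_urls` is hypothetical, and it returns only the URL of each candidate while skipping descriptors.

```python
from typing import List

WHITESPACE = " \t\n\f\r"


def parse_srcset_urls(srcset: str) -> List[str]:
    """Return the candidate URLs found in a srcset attribute value."""
    urls: List[str] = []
    i, n = 0, len(srcset)
    while True:
        # Skip whitespace and commas that separate candidates.
        while i < n and (srcset[i] in WHITESPACE or srcset[i] == ","):
            i += 1
        if i >= n:
            return urls
        # The URL is the run of non-whitespace characters, so commas
        # in the middle of it are kept.
        start = i
        while i < n and srcset[i] not in WHITESPACE:
            i += 1
        url = srcset[start:i]
        if url.endswith(","):
            # Trailing commas end the candidate; it has no descriptors.
            urls.append(url.rstrip(","))
            continue
        urls.append(url)
        # Skip descriptors up to the next comma outside parentheses.
        in_parens = False
        while i < n:
            c = srcset[i]
            i += 1
            if in_parens:
                if c == ")":
                    in_parens = False
            elif c == "(":
                in_parens = True
            elif c == ",":
                break


if __name__ == "__main__":
    # The three examples quoted in the commit message.
    for value in ["a,b,,c,", "a, b,, c,", "a (b,c),d"]:
        print(repr(value), "=>", parse_srcset_urls(value))
```

An unterminated parenthesis simply consumes the rest of the input as descriptor text in this sketch, which is one plausible way to handle the missing closing paren case mentioned in a neighbouring commit.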