-
Add .getHttpStatus and .getArchiveFile to ArchiveRecordImpl class #198 & #164 (#292)
* Resolves #198
* Resolves #164
* Add getHttpStatus to ArchiveRecord class & trait
  - add .getHttpStatus to potential outputs
  - add tests for .getHttpStatus calls
  - improve ArchiveRecord testing overall.
* Add .getArchiveFile feature to ArchiveRecordImpl.
  - add getArchiveFile to trait
  - add getArchiveFile for ArchiveRecordImpl
  - add tests for getArchiveFile.
* Other code style fixes.
* Include updates to tests.
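As an illustration of what the two new accessors return, here is a minimal Scala sketch. `ArchiveRecordLike` and `SimpleRecord` are hypothetical stand-ins rather than the aut classes. The parsing rules are assumptions made for the example: the archive file is reported as a bare file name, the status is the three-digit code from the HTTP status line, and `"000"` is the fallback.

```scala
// Hypothetical sketch: the method names follow the commit message, but
// these classes and parsing rules are illustrative, not aut's code.
trait ArchiveRecordLike {
  def getArchiveFile: String
  def getHttpStatus: String
}

final case class SimpleRecord(archivePath: String, rawHttpResponse: String)
    extends ArchiveRecordLike {

  // Assumption: report only the file name of the WARC/ARC the record came from.
  def getArchiveFile: String = archivePath.split('/').lastOption.getOrElse("")

  // Assumption: take the code from a status line such as "HTTP/1.1 200 OK";
  // return "000" when no well-formed status line is present.
  def getHttpStatus: String = {
    val statusLine = rawHttpResponse.split("\r?\n", 2).headOption.getOrElse("")
    val tokens = statusLine.split(" ")
    if (tokens.length >= 2 && tokens(0).startsWith("HTTP/") && tokens(1).matches("\\d{3}")) tokens(1)
    else "000"
  }
}
```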
-
Change ID generation for graphs from URL hashes to .zipWithUniqueId() (#289)
* Resolves #243
* Create GEXF with proper ids instead of hashes to avoid collisions.
* Add WriteGEXF files.
* Add WriteGraph file and test.
* Add test for GraphML output.
* Add XML escaping for edges.
* Add test case for non-escaped edges.
* Add tests covering more potential cases of GraphML and GEXF files.
* Cover null cases in URLs.
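The motivation for dropping URL hashes is that two distinct URLs can hash to the same id. The sketch below assumes plain Scala collections in place of Spark RDDs. It gives each distinct URL its own id by position, then writes GEXF-style node and edge elements with XML escaping. In Spark, `RDD.zipWithUniqueId()` yields ids that are unique but not necessarily consecutive; `zipWithIndex` on a local `Seq` plays that role here. `GraphIds` and its methods are hypothetical names.

```scala
// Sketch only: plain Scala collections stand in for Spark RDDs, and
// GraphIds is a hypothetical name, not part of aut.
object GraphIds {
  // Escape the five XML special characters so URLs are safe inside attributes.
  def escapeXml(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '&'  => sb.append("&amp;")
      case '<'  => sb.append("&lt;")
      case '>'  => sb.append("&gt;")
      case '"'  => sb.append("&quot;")
      case '\'' => sb.append("&apos;")
      case c    => sb.append(c)
    }
    sb.toString
  }

  // Give every distinct URL its own id by position, so two URLs can never
  // collide the way two hashes can.
  def assignIds(edges: Seq[(String, String)]): Map[String, Long] =
    edges.flatMap { case (src, dst) => Seq(src, dst) }
      .distinct
      .zipWithIndex
      .map { case (url, idx) => url -> idx.toLong }
      .toMap

  // One <node> element per URL, ordered by id, with the URL escaped as the label.
  def gexfNodes(ids: Map[String, Long]): Seq[String] =
    ids.toSeq.sortBy(_._2).map { case (url, id) =>
      s"""<node id="$id" label="${escapeXml(url)}"/>"""
    }

  // One <edge> element per input pair, referring to nodes by numeric id.
  def gexfEdges(edges: Seq[(String, String)], ids: Map[String, Long]): Seq[String] =
    edges.zipWithIndex.map { case ((src, dst), i) =>
      s"""<edge id="$i" source="${ids(src)}" target="${ids(dst)}"/>"""
    }
}
```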
-
- Follow on to 72cb5e2 - https://nvd.nist.gov/vuln/detail/CVE-2018-7489
-
Update jackson-databind version; resolves #279. (#280)
- CVE-2017-752 - See also: https://nvd.nist.gov/vuln/detail/CVE-2017-7525
-
Update ExtractBoilerpipeText to remove headers as well. #253 (#256)
* ExtractBoilerpipeText now removes headers.
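Here is a rough sketch of header removal. It assumes the input is a raw HTTP response whose headers end at the first blank line, and drops everything up to that line before the text would be handed to Boilerpipe. `HeaderStrip.stripHttpHeaders` is a hypothetical helper, not the code from #256. The Boilerpipe call itself is omitted so the example stays self-contained.

```scala
// Hypothetical helper, not aut's implementation: returns the body of a raw
// HTTP response by dropping everything up to and including the first blank
// line (CRLF CRLF, or LF LF for sloppier servers). If no blank line is
// found, the input is returned unchanged.
object HeaderStrip {
  def stripHttpHeaders(raw: String): String = {
    val candidates = Seq(raw.indexOf("\r\n\r\n") -> 4, raw.indexOf("\n\n") -> 2)
      .filter { case (pos, _) => pos >= 0 }
    if (candidates.isEmpty) raw
    else {
      // Use whichever separator appears first in the response.
      val (pos, sepLength) = candidates.minBy(_._1)
      raw.substring(pos + sepLength)
    }
  }
}
```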
-
Address main scalastyle errors - #196 (#248)
* Deal with wildcard import lint issues.
* Fix some magic numbers & duplicate string runs.
* Lint fixes, mostly explicit import warnings.
* All other scalastyle issues require refactoring.
-
Add ExtractGraphX including algorithms for PageRank and Components. Issue 203 (#245)
* pom.xml change for GraphX
* Changes for GraphXSLS
* Changes for SLS graph
* Changes for GraphX
* Changes for converting WARC RDD to GraphX object
* Rename extractor to ExtractGraphX
* Various lint fixes (usually magic numbers)
* Remove illegal imports from scalastyle (we use wildcard imports a lot)
* Add WriteGraphXMLTest.
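GraphX provides `pageRank` and `connectedComponents` operators on its graphs. The dependency-free Scala sketch below only illustrates what those two algorithms compute on a small edge list. `MiniGraph` is a hypothetical name, and the damping factor of 0.85 and 20 iterations are assumed defaults. Ranks are normalized to sum to 1, which may not match GraphX's scaling. Components are labelled with the smallest vertex id they contain, treating edges as undirected.

```scala
// Illustrative sketch, not ExtractGraphX: plain Scala collections, small graphs only.
object MiniGraph {
  // Power-iteration PageRank over a directed edge list. Rank held by nodes
  // with no out-links (dangling nodes) is spread evenly over all nodes.
  def pageRank(edges: Seq[(Int, Int)], iterations: Int = 20, damping: Double = 0.85): Map[Int, Double] = {
    val nodes = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val n = nodes.size
    val outLinks: Map[Int, Seq[Int]] = edges.groupBy(_._1).map { case (s, es) => s -> es.map(_._2) }
    var ranks: Map[Int, Double] = nodes.map(_ -> 1.0 / n).toMap
    for (_ <- 1 to iterations) {
      val danglingMass = nodes.filterNot(outLinks.contains).map(ranks).sum
      // Each source splits its current rank equally among its out-links.
      val contribs = edges.map { case (s, d) => d -> ranks(s) / outLinks(s).size }
        .groupBy(_._1).map { case (d, cs) => d -> cs.map(_._2).sum }
      ranks = nodes.map { v =>
        v -> ((1 - damping) / n + damping * (contribs.getOrElse(v, 0.0) + danglingMass / n))
      }.toMap
    }
    ranks
  }

  // Label propagation: repeatedly give both endpoints of every edge the
  // smaller of their two labels until nothing changes. Each component ends
  // up labelled with its lowest vertex id.
  def connectedComponents(edges: Seq[(Int, Int)]): Map[Int, Int] = {
    val nodes = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    var labels: Map[Int, Int] = nodes.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      for ((s, d) <- edges) {
        val m = math.min(labels(s), labels(d))
        if (labels(s) != m || labels(d) != m) {
          labels = labels.updated(s, m).updated(d, m)
          changed = true
        }
      }
    }
    labels
  }
}
```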
-
Fix TravisCI build issues (#244)
* Make the TravisCI build less verbose since we're hitting the 4MB log limit.
* Pin site.plugin and project-info-reports.plugin so mvn site builds.
  - See:
    - https://stackoverflow.com/questions/51091539/maven-site-plugins-3-3-java-lang-classnotfoundexception-org-apache-maven-doxia
    - https://travis-ci.org/archivesunleashed/aut/jobs/408259462#L3201-L3202
-
Save images from dataframe to disk (#234)
* Save images from DataFrame to disk
* Fix spacing
* Move save images to inline
* Refactor to chain and fix concurrency issue
* Add save image test
* Move saveToDisk to df
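This is a minimal sketch of the save-to-disk step for a single row. It assumes the image column holds base64-encoded bytes (the encoding the image-details API in #226 produces) and that each row has a distinct file name. `ImageSaver.saveToDisk` is a hypothetical helper, not the DataFrame method added in #234.

```scala
import java.nio.file.{Files, Path}
import java.util.Base64

// Hypothetical helper: the name and signature are illustrative, not aut's API.
object ImageSaver {
  // Decode one base64-encoded image and write it to dir/fileName,
  // creating the directory if needed. Returns the path that was written.
  def saveToDisk(base64Bytes: String, dir: Path, fileName: String): Path = {
    Files.createDirectories(dir)
    val target = dir.resolve(fileName)
    Files.write(target, Base64.getDecoder.decode(base64Bytes))
  }
}
```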
-
Add Extract Image Details API (#226); addresses #220
* Add Extract Image Details API
* Change check for JPEG and fix spacing
* Add TIFF parser
* Use AutoDetectParser and read numeric fields
* Use ComputeImageSize
* Hex encode hash and base64 encode image bytes
* Fix test
* Change DataFrame column names
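To show the encodings named in the bullets, here is a sketch that does three things: it hex-encodes a digest of the image bytes, base64-encodes the bytes, and reads width and height with `javax.imageio.ImageIO`. The choice of MD5 is an assumption, since the commit only says the hash is hex-encoded. `ImageIO` stands in for the AutoDetectParser and ComputeImageSize steps listed in the commit, and `ImageDetails` is a hypothetical name.

```scala
import java.io.ByteArrayInputStream
import java.security.MessageDigest
import java.util.Base64
import javax.imageio.ImageIO

// Hypothetical sketch: ImageDetails is an illustrative name, MD5 is an assumed
// hash, and ImageIO replaces the parsers used in #226.
object ImageDetails {
  final case class Details(md5Hex: String, base64: String, width: Option[Int], height: Option[Int])

  // Lower-case hex encoding of the MD5 digest, two characters per byte.
  def hexMd5(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(bytes).map(b => "%02x".format(b & 0xff)).mkString

  def describe(bytes: Array[Byte]): Details = {
    // ImageIO.read returns null when no installed reader recognizes the bytes,
    // so width and height become None for non-images.
    val img = Option(ImageIO.read(new ByteArrayInputStream(bytes)))
    Details(hexMd5(bytes), Base64.getEncoder.encodeToString(bytes), img.map(_.getWidth), img.map(_.getHeight))
  }
}
```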