archivesunleashed/aut

Commits on Oct 19, 2018

CVE-2018-11771 update (#288)

ruebot authored and ianmilligan1 committed Oct 19, 2018

e6080a7

Commits on Oct 18, 2018

CVE-2017-17485 update; follow-on to #281. (#287)

ruebot authored and ianmilligan1 committed Oct 18, 2018

3776489

Commits on Oct 17, 2018

Update Apache Tika - security vulnerabilities; resolves #131. (#285)

ruebot authored and ianmilligan1 committed Oct 17, 2018

```
- CVE-2018-1338
- CVE-2018-11762
- CVE-2018-11761
- CVE-2016-6809
- CVE-2018-1339
- CVE-2018-11796
- CVE-2016-4434
- CVE-2018-1335
```
b9260be

Only trigger TravisCI on master. (#283)

ruebot authored and ianmilligan1 committed Oct 17, 2018

0d4f9bb

[skip travis] Update README (#284)

ruebot authored and ianmilligan1 committed Oct 17, 2018

f8ee21e

Fix bug and unit test for ExtractDomain; resolves #277 (#278)

borislin authored and ruebot committed Oct 17, 2018

2c66b09

Replace backslash with forward slash in URL; resolves #269 (#276)

borislin authored and ruebot committed Oct 17, 2018

```
* Fix backslash in URL
* Add backslash test in ExtractDomainTest
```
7c3a80d
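
The idea behind the backslash fix in #276 can be illustrated with a small sketch. This is Python written for illustration, not the project's Scala code, and `normalize_url` and `extract_domain` are hypothetical names. It assumes the goal is simply to turn backslashes into forward slashes before parsing, since a URL such as `http:\\www.example.com\page` has no `//` authority marker for a standard parser to find.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Replace every backslash with a forward slash."""
    return url.replace("\\", "/")


def extract_domain(url: str) -> str:
    """Return the host of a URL after normalizing backslashes, or "" if none."""
    host = urlsplit(normalize_url(url)).hostname
    return host or ""


# A URL written with backslashes still yields its host after normalization.
print(extract_domain("http:\\\\www.example.com\\images\\logo.png"))
```

Normalizing before parsing keeps the parsing step unchanged, so well-formed URLs are handled exactly as before.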

Missed something for #208. (#282)

ruebot authored and ianmilligan1 committed Oct 17, 2018

6b4ef94

Commits on Oct 16, 2018

CVE-2018-7489 fix. (#281)

ruebot authored and ianmilligan1 committed Oct 16, 2018

```
- Follow on to 72cb5e2
- https://nvd.nist.gov/vuln/detail/CVE-2018-7489
```
4fe05a5

Update jackson-databind version; resolves #279. (#280)

ruebot authored and ianmilligan1 committed Oct 16, 2018

```
- CVE-2017-7525
- See also: https://nvd.nist.gov/vuln/detail/CVE-2017-7525
```
72cb5e2

Commits on Oct 9, 2018

Clean-up pom.xml to remove plugin warnings; resolves #273. (#274)

ruebot authored and ianmilligan1 committed Oct 9, 2018

f19dc9a

Commits on Oct 4, 2018

[maven-release-plugin] prepare for next development iteration

ruebot committed Oct 4, 2018

Verified signature: ruebot (Nick Ruest), GPG key ID 417FAF1A0E1080CD

9796f50

[maven-release-plugin] prepare release aut-0.17.0

ruebot committed Oct 4, 2018

Verified signature: ruebot (Nick Ruest), GPG key ID 417FAF1A0E1080CD

694382c

Fix exception error when processing corrupted ARC files, and empty files. (#272)

borislin authored and ruebot committed Oct 4, 2018

* Fix exception when processing corrupted ARC files
* Filter out empty archive files in loadArchives()
* Fix archive files path pattern
* Resolves #246
* Resolves #271 
* Resolves #258

b8e57ec

Commits on Sep 10, 2018

Update Bug report template. (#268)

ruebot authored and ianmilligan1 committed Sep 10, 2018

c95a51d

Commits on Aug 11, 2018

ExtractBoilerpipeText to remove headers as well. #253 (#256)

greebie authored and ruebot committed Aug 11, 2018

```
* ExtractBoilerpipeText now removes headers.
```
84a4c09

Add additional tweet fields to TweetUtils; partially address #194. (#254)

ruebot authored and ianmilligan1 committed Aug 11, 2018

- Adds:
  - retweet_count
  - favorite_count
  - in_reply_to_status_id_str
  - in_reply_to_user_id_str
  - in_reply_to_screen_name
  - source
  - user.protected
  - user.profile_image_url
  - user.description
  - user.location
  - user.name
  - user.url
  - user.time_zone
- Updates some doc comments
- Updates tests

e4cf9a7

Commits on Aug 10, 2018

Add support for full_text in tweets; resolve #192. (#252)

ruebot authored and ianmilligan1 committed Aug 10, 2018

62628b4

Get rid of 'filesystem-root relative reference' warning. (#251)

ruebot authored and ianmilligan1 committed Aug 10, 2018

a5fa151

Commits on Aug 9, 2018

Remove stray characters from example commands. (#250)

ruebot authored and ianmilligan1 committed Aug 9, 2018

9efde66

Deal with final scalastyle assessments, and Convert nulls to Option(T). (#249)

greebie authored and ruebot committed Aug 9, 2018

```
* Fully resolves #196
* Resolves #212
```
004ce1f

Commits on Aug 1, 2018

Address main scalastyle errors - #196 (#248)

greebie authored and ruebot committed Aug 1, 2018

* Deal with wildcard import lint issues.
* Fix some magic numbers & duplicate string runs.
* Lint fixes, mostly explicit import warnings.
* All other scalastyle issues require refactoring.

77dbd51

Commits on Jul 29, 2018

Add ExtractGraphX including algorithms for PageRank and Components. Issue 203 (#245)

greebie authored and ianmilligan1 committed Jul 29, 2018

* pom.xml change for GraphX
* Changes for GraphXSLS
* Changes for SLS graph
* Changes for GraphX
* Changes for converting WARC RDD to GraphX object
* Rename extractor to ExtractGraphX
* Various lint fixes (usually Magic Numbers)
* Remove illegal imports from scala style (we use wildcard imports a lot)
* Add WriteGraphXMLTest.

afe9254

Commits on Jul 27, 2018

Fix TravisCI build issues (#244)

ruebot authored and ianmilligan1 committed Jul 27, 2018

* Make the TravisCI build less verbose since we're hitting the 4MB log limit.
* Pin site.plugin and project-info-reports.plugin so mvn site builds.
  - See:
    - https://stackoverflow.com/questions/51091539/maven-site-plugins-3-3-java-lang-classnotfoundexception-org-apache-maven-doxia
    - https://travis-ci.org/archivesunleashed/aut/jobs/408259462#L3201-L3202

290b6aa

Commits on May 28, 2018

Data frame implementation of extractors. Also added cmd arguments to resolve #235 (#236)

TitusAn authored and ruebot committed May 28, 2018

```
* initial implementation
* Data frame implementation of extractors.
* fix documentation.
```
c73a92b

Commits on May 25, 2018

Save images from dataframe to disk (#234)

JWZ2018 authored and lintool committed May 25, 2018

* Save images from dataframe to disk
* Fix spacing
* Move save images to inline
* Refactor to chain and fix concurrency issue
* Add save image test
* Move saveToDisk to df

c0a8b78

Commits on May 22, 2018

Add missing dependencies in; addresses #227. (#233)

ruebot authored and lintool committed May 22, 2018

496cd1b

Commits on May 21, 2018

ArchiveRecord + impl moved into same Scala file; code cleanup. (#230)

lintool authored and ruebot committed May 21, 2018

e57a99c

Add Extract Image Details API (#226); Addresses #220

JWZ2018 authored and ruebot committed May 21, 2018

* Add Extract Image Details API
* Change check for jpeg and fix spacing
* Add tiff parser
* Use AutoDetectParser and read Numeric fields
* Use ComputeImageSize
* Hex encode hash and base64 encode image bytes
* Fix test
* Change df column names

a9649aa
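
The "Hex encode hash and base64 encode image bytes" step listed above can be sketched as follows. This is a Python illustration, not the project's Scala code: `encode_image` is a hypothetical name, and the use of MD5 is an assumption, since the commit message does not say which hash is computed. The point is that both the digest and the raw bytes become plain text that fits in a data frame column.

```python
import base64
import hashlib


def encode_image(image_bytes: bytes) -> dict:
    """Return text-safe encodings of an image: a hex digest and a base64 payload."""
    return {
        # Assumption: MD5 is used here only as an example digest.
        "md5": hashlib.md5(image_bytes).hexdigest(),
        # Base64 turns arbitrary bytes into ASCII text.
        "bytes": base64.b64encode(image_bytes).decode("ascii"),
    }


record = encode_image(b"abc")
print(record["md5"], record["bytes"])

# The original bytes can be recovered from the base64 column.
assert base64.b64decode(record["bytes"]) == b"abc"
```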

Commits on May 16, 2018

Implement DomainFrequency, DomainGraph and PlainText extractor that can be run from command line (#225)

TitusAn authored and lintool committed May 16, 2018

* Resolves issue 195. Implement DomainFrequency, DomainGraph and PlainText extractor that can be run via command line in spark-submit, along with their tests

* Restructure CommandLineAppRunner to make it more robust. Add option to write GEXF output for DomainGraphExtractor (enable via --output-format GEXF). Add support for multiple input files. Other polish and cleanup.

2bdc740

Commits on May 15, 2018

Remove duplicate call of keepValidPages (#224)

JWZ2018 authored and ruebot committed May 15, 2018

6f9f9b4

Extract Image Links DF API + Test (#221)

JWZ2018 authored and ruebot committed May 15, 2018

* Extract Image Links DF API
* Add extract image links test
* Remove unnecessary comment from test
* Add doc comments
* Addresses #220

3f3c423

Commits on May 14, 2018

Update Apache Spark to 2.3.0; resolves #218 (#219)

ruebot authored and ianmilligan1 committed May 14, 2018

- Update tests to use workaround for SPARK-2243
- Comment out ExtractGraph test as per https://github.com/archivesunleashed/aut/pull/204/files#diff-4541b9834513985c360b64093fd45073
- Align Hadoop version with Apache Spark pom.xml https://github.com/apache/spark/blob/branch-2.3/pom.xml#L120

fc8f4bf

Resolve archivesunleashed/docker-aut#17 (#217)

ruebot authored and ianmilligan1 committed May 14, 2018

b8a8a97

Commits on May 2, 2018

Create issue templates (#216)

ruebot authored and ianmilligan1 committed May 2, 2018

```
* Create issue templates
```
ef6ea36