Tree: 4322eb920f
Add ExtractPopularImages, WriteGEXF, and WriteGraphML to Python. (#466)
- Resolves #409
- Add Python implementations of:
  - ExtractPopularImages
  - WriteGraphML
  - WriteGEXF
- Clean up formatting in app.py and UDFs
- Clean up doc comments on the Scala side
[maven-release-plugin] prepare release aut-0.70.0
ruebot committed May 4, 2020
[skip travis] README updates (#460)
ruebot committed May 4, 2020
- `$` should only be used if output is also shown (mdl)
- Add UserDoc badge and yank buried documentation section
- Additional formatting and typo fixes
Add RemovePrefixWWWDF to DomainFrequencyExtractor. (#457)
- Resolves #456
- Update test
[skip travis] Updating Java install instructions, resolves #445 (#455)
ianmilligan1 committed Apr 23, 2020
Add option to save to Parquet for app. (#454)
- Resolves #448
- Update test
- Add CSV headers to coalesced CSV output
- Update README
Update PlainTextExtractor to output a single column: text. (#453)
- Resolves #452
- PlainTextExtractor runs ExtractBoilerplate on `content`
- Update test
Add a number of additional app extractors. (#451)
- Resolves #447
- Add AudioInformationExtractor, ImageInformationExtractor, PDFInformationExtractor, PresentationProgramInformationExtractor, SpreadsheetInformationExtractor, TextFilesInformationExtractor, VideoInformationExtractor, WebGraphExtractor, and WordProcessorInformationExtractor
- Add tests for the new extractors
- Update CommandLineApp to use the new extractors
- Add domain and language columns to WebPagesExtractor
- Change "TEXT" to "csv"
- Lower-case "GEXF" and "GRAPHML"
Remove RDD option in app; DataFrame only now. (#450)
- Resolves #449
- Update and rename tests where applicable
- Update README to reflect the changes
[maven-release-plugin] prepare release aut-0.60.0
ruebot committed Apr 15, 2020
Remove GraphX support; resolves #442. (#443)
- Remove GraphX dependencies from pom
- Remove ExtractGraphX and related tests
- Remove WriteGraphXML and related tests
Add GraphML output to CommandLineApp and DomainGraphExtractor. (#438)
* Resolves #435
* Adds GRAPHML option to CommandLineApp
* Adds DataFrame method to DomainGraphExtractor
* Updates CommandLineApp and WriteGraphML tests
Align RDD and DF output for DomainGraphExtractor. (#437)
- Resolves #436
- Fix WWW-prefix removal for RDD, which was double escaping
- Update DF so it matches RDD output (it wasn't even close before 🤦)
- Update tests so they're basically testing the same thing
Add imagegraph, and webgraph to command line app. (#432)
- Resolves #431
- Adds webpages and imagegraph to command line app
- Adds tests for new functionality
- Clean up doc comments
- Convert files with DOS line endings to Unix line endings
- Update CommandLineApp tests
Tweak hasDate to handle Seq. (#430)
- Addresses #425
- Add test for hasDate
Restyle keep/discard filter UDFs in the context of DataFrames (#429)
Co-authored-by: g285sing <g285sing@student.cs.uwaterloo.ca> (@SinghGursimran)
- Resolves #425
- Replace all keep/discard DF UDFs with `hasXYZ()`
- Update tests
Update Spark and Hadoop versions. (#426)
- Update Spark to 2.4.5
- Update Hadoop to 2.7.4 (for RADOS/S3 support)
- Tweak README
Add logic so UDFs that filter on url also filter on src. (#424)
- Resolves #418
- Update tests
Co-authored-by: Nick Ruest <ruestn@gmail.com>
[skip travis] Add pre-print link to README. (#423)
ruebot committed Feb 11, 2020
Add img alt text to imagegraph(); resolves #420. (#422)
- Update ExtractImageLinksRDD to grab alt text
- Add alt_text column to imagegraph
- Update tests
Rename imageLinks to imagegraph; resolves #419 (#421)
Need --repositories flag with --packages. (#417)
- Fully resolves archivesunleashed/docker-aut#19
- archivesunleashed/docker-aut@37ce4e2
- archivesunleashed/docker-aut@082907a
- archivesunleashed/docker-aut@baee431
[maven-release-plugin] prepare release aut-0.50.0
ruebot committed Feb 5, 2020
Clean up test descriptions, addresses #372. (#416)
- Clean up test descriptions
- Fix a typo in a filename
Add ExtractImageDetailsDF. (#415)
- Add test
- Addresses #223