Add webgraph, imagegraph, webpages, etc. to command line app #431

Closed
ruebot opened this issue Apr 6, 2020 · 2 comments

ruebot commented Apr 6, 2020

Currently, we have one of the standard auk derivatives as an "app", DomainFrequencyExtractor.

We should also add:

  • full-text
  • network

It might also be worth adding the DataFrame derivatives we produced for the NYC and IIPC datathons, tweaking any that already exist in the app:

  • webpages
  • webgraph
  • imagegraph
  • domains
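
To make the request concrete, here is a minimal sketch of how a command line app could map derivative names to jobs. It is an illustration only, not AUT's code: the registry `EXTRACTORS`, the helper `run_extractor`, the simplified `(crawl_date, url)` record shape, and both job functions are hypothetical stand-ins, and only the names `domains` and `webpages` come from the list above.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlparse

# Simplified stand-in for a web archive record: (crawl_date, url).
Record = Tuple[str, str]


def domains(records: Iterable[Record]) -> List[Tuple[str, int]]:
    """Count records per domain, most frequent first."""
    counts = Counter(urlparse(url).netloc for _, url in records)
    return counts.most_common()


def webpages(records: Iterable[Record]) -> List[Tuple[str, str, str]]:
    """Return one (crawl_date, domain, url) row per record."""
    return [(date, urlparse(url).netloc, url) for date, url in records]


# Hypothetical registry: each derivative name selects one job.
EXTRACTORS: Dict[str, Callable] = {
    "domains": domains,
    "webpages": webpages,
}


def run_extractor(name: str, records: Iterable[Record]):
    """Look up a job by name and run it, rejecting unknown names."""
    try:
        job = EXTRACTORS[name]
    except KeyError:
        raise ValueError(
            f"Unknown extractor {name!r}; choose from {sorted(EXTRACTORS)}"
        )
    return job(records)


if __name__ == "__main__":
    sample = [
        ("20200406", "http://example.com/a"),
        ("20200406", "http://example.com/b"),
        ("20200407", "http://archive.org/"),
    ]
    print(run_extractor("domains", sample))
```

With a registry like this, adding webgraph or imagegraph would amount to writing one more job function and registering its name.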
ruebot self-assigned this Apr 6, 2020

ruebot commented Apr 6, 2020

DomainGraphExtractor is the network graph job without WriteGraph.asGraphml.
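
As a rough illustration of what a network graph job of this kind computes, the sketch below counts links between domains per crawl date and stops before any graph-file output, which is the step the comment says DomainGraphExtractor leaves out. This is an assumption-laden sketch in plain Python, not AUT's implementation: the input shape `(crawl_date, src_url, dest_url)`, the `www.` stripping, and the `min_count` parameter are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

Link = Tuple[str, str, str]       # (crawl_date, src_url, dest_url)
Edge = Tuple[str, str, str, int]  # (crawl_date, src_domain, dest_domain, count)


def domain(url: str) -> str:
    """Lowercased host name with a leading 'www.' removed (an assumption)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_graph(links: Iterable[Link], min_count: int = 1) -> List[Edge]:
    """Aggregate page-level links into weighted domain-to-domain edges."""
    counts = Counter(
        (date, domain(src), domain(dest)) for date, src, dest in links
    )
    edges = [(d, s, t, n) for (d, s, t), n in counts.items() if n >= min_count]
    # Heaviest edges first, ties broken by date and domain names.
    return sorted(edges, key=lambda row: (-row[3], row[:3]))


if __name__ == "__main__":
    sample = [
        ("20200406", "http://www.a.com/x", "http://b.org/"),
        ("20200406", "http://a.com/y", "http://www.b.org/z"),
        ("20200406", "http://a.com/", "http://c.net/"),
    ]
    for edge in domain_graph(sample):
        print(edge)
```

The returned rows are the kind of edge list that a separate writer step could later serialize to a graph format.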

ruebot commented Apr 6, 2020

🤦‍♂ PlainTextExtractor is there. Just needs some tweaks.

ruebot changed the title from "Add auk derivatives to command line app" to "Add webgraph, imagegraph, webpages, etc. to command line app" on Apr 6, 2020
ruebot added a commit that referenced this issue Apr 6, 2020
- Resolves #431
- Adds webpages, and imagegraph to command line app
- Adds tests for new functionality
- Clean-up doc comments
- Convert files with dos line endings to unix line endings
ruebot added this to In Progress in 1.0.0 Release of AUT Apr 7, 2020
ruebot added this to In Progress in DataFrames and PySpark Apr 7, 2020
ruebot added a commit to archivesunleashed/aut-docs that referenced this issue Apr 7, 2020
- Resolves #14
- Documents archivesunleashed/aut#431
1.0.0 Release of AUT automation moved this from In Progress to Done Apr 7, 2020
DataFrames and PySpark automation moved this from In Progress to In review Apr 7, 2020
ianmilligan1 pushed a commit that referenced this issue Apr 7, 2020
- Resolves #431
- Adds webpages, and imagegraph to command line app
- Adds tests for new functionality
- Clean-up doc comments
- Convert files with dos line endings to unix line endings
- Update CommandLineApp tests
ianmilligan1 pushed a commit to archivesunleashed/aut-docs that referenced this issue Apr 7, 2020
- Resolves #14
- Documents archivesunleashed/aut#431