Implementing Full-Text Filtering by Domains within AUK #197
Comments
ianmilligan1 added enhancement, core-feature, feature labels Oct 1, 2018
ruebot added the Background jobs label Oct 1, 2018
added a commit that referenced this issue Nov 2, 2018
@ianmilligan1 7d3f89f#diff-52f1a25071caad35c15ee1e5c34a1ff1R87 Let me know what you want to call that button, and what the tool-tip text should be.
Looking great! How about button: And tool-tip text:
ruebot closed this in a0be875 Nov 2, 2018
ianmilligan1 referenced this issue Nov 2, 2018
Update Docs to Include Full-Text Filtering #202 (Closed)
added a commit that referenced this issue Nov 2, 2018
added a commit that referenced this issue Nov 9, 2018
ianmilligan1 commented Oct 1, 2018
Is your feature request related to a problem? Please describe.
Right now, users can receive the full-text of a web archival collection. Some of these are very big files, and more importantly, they often require a level of filtering to get to the useful content: filtering by date, for example, or filtering by domain.
At our datathons, we've had several teams who have compared the text of domains in a web archive collection to see what they say about a supreme court nominee, labour disruptions, pipelines, etc.
Right now we have documentation on how to filter these files, but this requires that a scholar know how to use the command line and grep.
Describe the solution you'd like
Ideally, it would be nice to bake some of these grep commands into the Archives Unleashed Cloud. See the screenshot below:
The steps in this process would be:
grep ',www.khyber.ca,' 9481-fulltext.txt > 9481-www-khyber-ca-text.txt
In general, it would be:
grep ',DOMAIN,' COLLECTIONNUMBER-fulltext.txt > COLLECTIONNUMBER-domain-text.txt
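The general form above could be wrapped in a small shell function. This is only a sketch under some assumptions: the function name filter_domain is hypothetical, the full-text file is assumed to sit in the current directory, and the domain is assumed to appear as a complete comma-delimited field, as the grep pattern above implies. It adds grep's -F option so that the dots in the domain are matched literally instead of as regular-expression wildcards.

```shell
# Hypothetical helper, not part of AUK: filter one collection's full-text
# file down to the lines for a single domain.
#   $1 = collection number (e.g. 9481)
#   $2 = domain            (e.g. www.khyber.ca)
filter_domain() {
  # -F matches the pattern as a fixed string; the surrounding commas keep
  # the domain from matching inside a longer field.
  # The output name follows the example above: dots become hyphens.
  # If nothing matches, grep exits with status 1 and the output file is empty.
  grep -F -- ",$2," "$1-fulltext.txt" \
    > "$1-$(printf '%s' "$2" | tr '.' '-')-text.txt"
}
```

Calling filter_domain 9481 www.khyber.ca would then write 9481-www-khyber-ca-text.txt, matching the file name in the example.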
The actual look and feel of the buttons will probably be different.
Describe alternatives you've considered
There are two main alternatives.
Option A: Using Spark
I'm suggesting a bash command here for two reasons:
Option B: Pointing Users to Documentation
This adds functionality that we currently tell people how to do themselves here. I think pointing users to documentation makes our service a bit more difficult to use. This way would let them just click a button, download the text file, and paste it into something like Voyant right away.