Hypercane
Hypercane is a framework for building algorithms for sampling mementos from a web archive collection. Hypercane is the entry point of the Dark and Stormy Archives (DSA) toolkit. A user can generate samples with Hypercane and then view those samples via the Web Archive Storytelling tool Raintale, thus allowing the user to automatically summarize a web archive collection as a few small samples visualized as a social media story.
The possibilities with Hypercane do not stop there. Users can also employ Hypercane's actions to explore a web archive collection in other ways. This README provides an overview of these actions; more detailed documentation is forthcoming.
Installing Hypercane
Using PIP
- Install MongoDB
- Clone this repository
- Change into the cloned directory
- Type
pip install .
This grants access to the hc command, which provides the functionality of Hypercane.
Using Docker
The software is still volatile, so you will need to build your own Docker image.
- Clone this repository
- Run
docker-compose run hypercane hc --help
This may take a while, because the necessary Docker images must be downloaded and built. When it succeeds, the hc CLI help is printed.
Running Hypercane
Hypercane allows you to perform actions on web archive collections, TimeMaps, or lists of Mementos.
For example, the following sample action executes the true-random command to randomly sample mementos from the TimeMaps supplied by timemap-file.txt and writes the URI-Ms to random-mementos.txt:
hc sample true-random -i timemaps -a timemap-file.txt -o random-mementos.txt
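To make the idea of random sampling concrete, here is a minimal Python sketch. It is an illustration only, not Hypercane's implementation: it pools the URI-Ms of several TimeMaps and draws a few of them uniformly at random without replacement. The function name, the seed parameter, and the example URIs are hypothetical.

```python
import random


def random_sample_urims(timemaps, k, seed=None):
    """Pool the URI-Ms of all TimeMaps and draw up to k of them uniformly,
    without replacement. `seed` only makes this illustration repeatable."""
    # A set removes URI-Ms that appear in more than one TimeMap;
    # sorting gives a stable pool so a fixed seed yields a fixed sample.
    pool = sorted({urim for urims in timemaps.values() for urim in urims})
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


# Made-up TimeMaps, each mapped to the URI-Ms it lists.
timemaps = {
    "https://archive.example/timemap/a": [
        "https://archive.example/2019/a",
        "https://archive.example/2020/a",
    ],
    "https://archive.example/timemap/b": [
        "https://archive.example/2019/b",
        "https://archive.example/2021/b",
    ],
}

picked = random_sample_urims(timemaps, 2, seed=7)
print(picked)
```

Asking for more URI-Ms than the pool contains simply returns the whole pool in random order.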
At the moment, the following actions are supported:
- sample - generate a sample from the collection with various commands; some of the commands may execute various filter, cluster, score, and order actions
- report - generate a report on the collection according to various commands; different commands provide information on collection metadata or provide statistics on the collection
- synthesize - synthesize a web archive collection into a directory containing files, such as WARCs or other files
- identify - produce a list of identifiers (URIs) from the collection based on the input; the different commands indicate the type of web resource desired
- filter - filter the given collection according to the criteria specified by the given command
- cluster - group the documents identified from the input into clusters; different commands provide different clustering algorithms
- score - score the mementos from the input based on the command issued
- order - order the mementos from the input based on the command issued
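The way a sampling algorithm can combine the filter, cluster, score, and order actions may be easier to see in a toy example. The Python sketch below is purely conceptual and does not use Hypercane's code or data model: the record fields, the URIs, and every function name are hypothetical, and the criteria (non-empty text, grouping by year, scoring by text length) are stand-ins chosen for brevity.

```python
from collections import defaultdict

# Hypothetical memento records; the field names are illustrative only.
mementos = [
    {"urim": "https://archive.example/20190105/a", "year": 2019, "text": "short"},
    {"urim": "https://archive.example/20190712/b", "year": 2019, "text": "a much longer page body"},
    {"urim": "https://archive.example/20200301/c", "year": 2020, "text": ""},
    {"urim": "https://archive.example/20200815/d", "year": 2020, "text": "some content"},
]


def filter_nonempty(items):
    """filter: keep only mementos that have some text."""
    return [m for m in items if m["text"]]


def cluster_by_year(items):
    """cluster: group mementos by capture year."""
    clusters = defaultdict(list)
    for m in items:
        clusters[m["year"]].append(m)
    return clusters


def score(m):
    """score: longer text counts as a better memento in this toy example."""
    return len(m["text"])


def sample(items):
    """sample: filter, cluster, keep the top-scoring memento per cluster, order by year."""
    clusters = cluster_by_year(filter_nonempty(items))
    best = [max(group, key=score) for group in clusters.values()]
    return sorted(best, key=lambda m: m["year"])


result = [m["urim"] for m in sample(mementos)]
print(result)
```

Each step narrows or rearranges the output of the previous one, so swapping in a different filter, clustering rule, or score changes the sample without changing the overall shape of the algorithm.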
To discover the list of commands associated with an action, use the --help command-line option. For example, to discover the commands associated with the filter action, type hc filter --help.
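If you want to review the help for every action in one pass, a short script can do it. The following Python sketch uses only the hc ACTION --help form described above and the eight action names listed in this README; the helper names are hypothetical, and the script only prints the commands unless you call show_help yourself on a machine where hc is installed.

```python
import shutil
import subprocess

# The actions listed in this README.
ACTIONS = ["sample", "report", "synthesize", "identify",
           "filter", "cluster", "score", "order"]


def help_command(action):
    """Return the argument list for `hc <action> --help`."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ["hc", action, "--help"]


def show_help(action):
    """Run the help command if hc is on the PATH; return True if it ran."""
    if shutil.which("hc") is None:
        return False
    subprocess.run(help_command(action), check=False)
    return True


for action in ACTIONS:
    print(" ".join(help_command(action)))
```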
Running Hypercane with Docker Compose
- Build the software as specified in the Installing Hypercane - Using Docker subsection above
- Create a working directory for your project
- Copy docker-compose.yml into your working directory
- Type docker-compose run hypercane
- Run your desired commands; output will appear within your working directory
- When done, exit from the hypercane container by running exit
- To stop and remove all the services (such as the cache), run docker-compose down
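The interactive session described above can also be scripted. The Python sketch below builds a one-off command in the form docker-compose run hypercane hc ..., the same form used to print the hc help in the Docker installation steps, followed by the teardown command. The helper names are hypothetical, and whether relative file paths such as timemap-file.txt resolve to your working directory inside the container is an assumption that depends on docker-compose.yml.

```python
import shlex

# Prefix that runs a command inside the hypercane service.
COMPOSE_PREFIX = ["docker-compose", "run", "hypercane"]


def in_container(hc_args):
    """Build a one-off command that runs hc inside the hypercane service."""
    return COMPOSE_PREFIX + ["hc"] + list(hc_args)


def teardown_command():
    """Build the command that stops and removes all the services."""
    return ["docker-compose", "down"]


steps = [
    in_container(["sample", "true-random", "-i", "timemaps",
                  "-a", "timemap-file.txt", "-o", "random-mementos.txt"]),
    teardown_command(),
]

# Print the commands rather than running them, so the sketch has no side effects.
for step in steps:
    print(shlex.join(step))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to subprocess.run from the working directory that holds docker-compose.yml.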
The Future of Hypercane
We are working on additional sampling algorithms and options for the advanced actions. Please feel free to submit issues and pull requests at https://github.com/oduwsdl/hypercane