The Archives Unleashed Toolkit: Latest Documentation

The Archives Unleashed Toolkit is an open-source platform for analyzing web archives. It is built on Apache Spark, which provides powerful tools for analytics and data processing.

Most of this documentation is built on resilient distributed datasets (RDDs). We are working on adding support for DataFrames; you can read more about this in our experimental DataFrames section and in our [[Using the Archives Unleashed Toolkit with PySpark]] tutorial.

If you want to learn more about Apache Spark, we highly recommend Spark: The Definitive Guide.

Our documentation is divided into several main sections, which cover the Archives Unleashed Toolkit workflow from analyzing collections to understanding and working with the results.

Getting Started

Generating Results

Filtering Results

Filters: A variety of ways to filter results.

What to do with Results

Acknowledgments

This work is primarily supported by the Andrew W. Mellon Foundation. Other financial and in-kind support comes from the Social Sciences and Humanities Research Council, Compute Canada, the Ontario Ministry of Research, Innovation, and Science, York University Libraries, Start Smart Labs, and the Faculty of Arts and David R. Cheriton School of Computer Science at the University of Waterloo.

Any opinions, findings, and conclusions or recommendations expressed are those of the researchers and do not necessarily reflect the views of the sponsors.
