Fixed Table of Content on Current Doc README (#10)

* Fixing table of content links

* Adding more relative links

* Adding image (fixing existing markdown)

* Removing image on seeing it rendered
ianmilligan1 committed Oct 23, 2019
1 parent 5173748 commit 4a8955ee5a7f94c04324aea13d96b5bbff4694b8
Showing with 18 additions and 20 deletions.
  1. +18 −20 current/README.md
@@ -1,7 +1,5 @@
 # The Archives Unleashed Toolkit: Latest Documentation

-![https://archivesunleashed.org/images/prompt.png](Spark Shell in Action)
-
 The Archives Unleashed Toolkit is an open-source platform for analyzing web archives built on [Hadoop](https://hadoop.apache.org/). Tight integration with Hadoop provides powerful tools for analytics and data processing via [Spark](http://spark.apache.org/).

 Most of this documentation is built on [resilient distributed datasets (RDD)](https://spark.apache.org/docs/latest/rdd-programming-guide.html). We are working on adding support for [DataFrames](https://spark.apache.org/docs/latest/sql-programming-guide.html#datasets-and-dataframes). You can read more about this in our experimental [DataFrames section](#dataframes), and at our [[Using the Archives Unleashed Toolkit with PySpark]] tutorial.
@@ -18,11 +16,11 @@ Our documentation is divided into several main sections, which cover the Archive

 ### Generating Results
 - [**Collection Analysis**](collection-analysis.md): How do I...
-- [Extract All URLs](#Extract-All-URLs)
-- [Extract Top-Level Domains](#Extract-Top-Level-Domains)
-- [Extract Different Subdomains](#Extract-Different-Subdomains)
-- [Extract HTTP Status Codes](#Extract-HTTP-Status-Codes)
-- [Extract the Location of the Resource in ARCs and WARCs](#Extract-the-Location-of-the-Resource-in-ARCs-and-WARCs)
+- [Extract All URLs](collection-analysis.md#Extract-All-URLs)
+- [Extract Top-Level Domains](collection-analysis.md#Extract-Top-Level-Domains)
+- [Extract Different Subdomains](collection-analysis.md#Extract-Different-Subdomains)
+- [Extract HTTP Status Codes](collection-analysis.md#Extract-HTTP-Status-Codes)
+- [Extract the Location of the Resource in ARCs and WARCs](collection-analysis.md#Extract-the-Location-of-the-Resource-in-ARCs-and-WARCs)
 - [**Text Analysis**](https://github.com/archivesunleashed/aut-docs-new/blob/master/current/text-analysis.md): How do I...
 - [Extract All Plain Text](text-analysis.md#Extract-All-Plain-Text)
 - [Extract Plain Text Without HTTP Headers](text-analysis.md#Extract-Plain-Text-Without-HTTP-Headers)
@@ -35,23 +33,23 @@ Our documentation is divided into several main sections, which cover the Archive
 - [Extract Raw HTML](text-analysis.md#Extract-Raw-HTML)
 - [Extract Named Entities](text-analysis.md#Extract-Named-Entities)
 - **[Link Analysis](https://github.com/archivesunleashed/aut-docs-new/blob/master/current/link-analysis.md)**: How do I...
-- [Extract Simple Site Link Structure](#Extract-Simple-Site-Link-Structure)
-- [Extract Raw URL Link Structure][#Extract-Raw-URL-Link-Structure]
-- [Organize Links by URL Pattern][#Organize-Links-by-URL-Pattern]
-- [Organize Links by Crawl Date][#Organize-Links-by-Crawl-Date]
-- [Export as TSV][#Export-as-TSV]
-- [Filter by URL][#Filter-by-URL]
-- [Export to Gephi][#Export-to-Gephi]
-- **[Image Analysis](https://github.com/archivesunleashed/aut-docs-new/blob/master/current/image-analysis.md)**: How do I...
-- [Most Frequent Image URLs](#Most-Frequent-Image-URLs)
-- [Most Frequent Images MD5 Hash](#Most-Frequent-Images-MD5-Hash)
+- [Extract Simple Site Link Structure](link-analysis.md#Extract-Simple-Site-Link-Structure)
+- [Extract Raw URL Link Structure](link-analysis.md#Extract-Raw-URL-Link-Structure)
+- [Organize Links by URL Pattern](link-analysis.md#Organize-Links-by-URL-Pattern)
+- [Organize Links by Crawl Date](link-analysis.md#Organize-Links-by-Crawl-Date)
+- [Export as TSV](link-analysis.md#Export-as-TSV)
+- [Filter by URL](link-analysis.md#Filter-by-URL)
+- [Export to Gephi](link-analysis.md#Export-to-Gephi)
+- **[Image Analysis](image-analysis.md)**: How do I...
+- [Most Frequent Image URLs](image-analysis.md#Most-Frequent-Image-URLs)
+- [Most Frequent Images MD5 Hash](image-analysis.md#Most-Frequent-Images-MD5-Hash)

 ### Filtering Results
-- **[Filters](https://github.com/archivesunleashed/aut-docs-new/blob/master/current/filters.md NOW)**: A variety of ways to filter results.
+- **[Filters](filters.md)**: A variety of ways to filter results.

 ### What to do with Results
-- **[What to do with DataFrame Results](https://github.com/archivesunleashed/aut-docs-new/blob/master/current/df-results.md)**
-- **[What to do with RDD Results](https://github.com/archivesunleashed/aut-docs-new/blob/master/current/rdd-results.md)**
+- **[What to do with DataFrame Results](df-results.md)**
+- **[What to do with RDD Results](rdd-results.md)**

 ## Further Reading
