Adds Jupyter Notebooks to Website, resolves #115 (#116)

* Adding AUK Notebooks to Website, resolves #115

* @greebie review (thx!)
ianmilligan1 authored and SamFritz committed Apr 18, 2019
1 parent 17c9094 commit 9711e5c0526283fe00051ec137cd6c5dbf65d6bb
@@ -62,6 +62,11 @@ googleAnalytics = "UA-2879197-28"
url = "cloud/"
weight = 16

[[menu.main]]
name = "Archives Unleashed Notebooks"
url = "notebooks/"
weight = 17

[[menu.main]]
name = "Warclight"
url = "warclight/"
@@ -10,7 +10,7 @@ weight: 0

Archives Unleashed aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past. Supported by a grant from the [Andrew W. Mellon Foundation](https://mellon.org), we are developing web archive search and data analysis tools to enable scholars, librarians and archivists to access, share, and investigate recent history since the early days of the World Wide Web.

Interested in the project? Subscribe to our [newsletter](/get-involved/#subscribe)! Or you can follow the links at left for information [about the project](/about-project), the [Archives Unleashed Cloud](/cloud), [Archives Unleashed Toolkit](/aut), or our [events](/events).
Interested in the project? Subscribe to our [newsletter](/get-involved/#subscribe)! Or you can follow the links at left for information [about the project](/about-project), the [Archives Unleashed Cloud](/cloud), [Archives Unleashed Toolkit](/aut), [Archives Unleashed Jupyter Notebooks](/notebooks), or our [events](/events).

We're always looking for [ways to engage](/get-involved) archivists, librarians, researchers, developers, or any others interested in born-digital heritage!

@@ -0,0 +1,45 @@
---
title: Archives Unleashed Jupyter Notebooks
date: 2017-11-14T09:21:11-05:00
weight: 24
---

## Introduction

![AUK Notebook screenshot](/images/AUK_Notebook.png)

We are excited to introduce our Archives Unleashed Jupyter Notebooks, a prototype method for working with the derivatives generated by the Archives Unleashed Cloud. They allow you to interactively explore and filter the domain count information, extracted full text, and network visualization data generated by the Cloud.

We are currently exploring greater integration between the notebooks and the Archives Unleashed Cloud.

**To use them now, please visit the GitHub repository [here](https://github.com/archivesunleashed/auk-notebooks) and follow the instructions.** You can play with a live demo via Binder [here](https://mybinder.org/v2/gh/archivesunleashed/auk-notebooks/master?filepath=auk-notebook.ipynb).

{{< note title="This is still in the prototype stage!" >}}
Any and all feedback and suggestions are greatly appreciated and can be sent to [Samantha Fritz](mailto:sam.fritz@archivesunleashed.org), our project manager.
{{< /note >}}

You can read more about the thinking behind these Notebooks in our Medium post, "[Exploring Web Archival Data through Archives Unleashed Cloud Jupyter Notebooks](https://news.archivesunleashed.org/exploring-web-archival-data-through-archives-unleashed-cloud-jupyter-notebooks-7605c6ca2b33)."

There are three notebooks: domain analysis, text analysis, and network analysis. Each is discussed below.

## Domain Analysis

![AUK Notebook screenshot](/images/AUK_Notebook_Domains.png)

Domain analysis is a basic look at the web archive: it shows which domains are included and how often each appears. You can, for example, see how many .com addresses are in the collection or which domains are over- or underrepresented.
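
As a rough illustration of this kind of tally, the sketch below groups domain counts by top-level domain using only the Python standard library. It is not taken from the notebooks: the `domain,count` CSV layout, the sample rows, and the helper names `load_domain_counts` and `counts_by_tld` are all assumptions made for the example, and the Cloud's actual derivative file may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the real domain derivative may use another layout.
SAMPLE = """domain,count
example.com,120
news.example.org,45
archive.example.ca,30
shop.example.com,5
"""


def load_domain_counts(handle):
    """Read 'domain,count' rows into a dict mapping domain -> count."""
    reader = csv.DictReader(handle)
    return {row["domain"]: int(row["count"]) for row in reader}


def counts_by_tld(domain_counts):
    """Sum the counts of all domains that share a top-level domain."""
    totals = Counter()
    for domain, count in domain_counts.items():
        tld = domain.rsplit(".", 1)[-1]  # text after the final dot
        totals[tld] += count
    return totals


domain_counts = load_domain_counts(io.StringIO(SAMPLE))
tld_totals = counts_by_tld(domain_counts)
total = sum(tld_totals.values())

# Print each top-level domain with its share of the collection.
for tld, count in tld_totals.most_common():
    print(f".{tld}: {count} ({count / total:.1%})")
```

To try this on a real file, you would replace `io.StringIO(SAMPLE)` with an open file handle and adjust the parsing to match that file's format.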

## Text Analysis

![AUK Notebook screenshot](/images/AUK_Notebook_Text.png)

Text analysis is a popular way to explore web archive data. The [Natural Language Toolkit](https://www.nltk.org) (NLTK) library offers an array of options here. For example, we can see which domains use which words, how words are dispersed across a collection, or the average sentiment of a collection.
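
To make the "which domains use which words" idea concrete, here is a minimal sketch that counts words per domain. It deliberately uses only the standard library rather than NLTK, so the regular-expression tokenizer is a simplification. The `(domain, text)` record shape, the sample records, and the helper names are assumptions for illustration, not the notebooks' own code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical (domain, extracted text) records standing in for the
# full-text derivative; the real file format may differ.
RECORDS = [
    ("example.com", "Election results and election coverage"),
    ("example.org", "Climate policy and climate research"),
    ("example.com", "Coverage of the budget"),
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts_by_domain(records):
    """Build one word-frequency Counter per domain."""
    counts = defaultdict(Counter)
    for domain, text in records:
        counts[domain].update(tokenize(text))
    return counts


counts = word_counts_by_domain(RECORDS)

# Show the three most frequent words for each domain.
for domain, words in counts.items():
    print(domain, words.most_common(3))
```

A fuller analysis would likely remove stop words such as "and" and "the" before counting, which is one of the tasks a library like NLTK can help with.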

## Network Analysis

![AUK Notebook screenshot](/images/AUK_Notebook_Network.png)

Archives Unleashed Cloud network derivatives already offer solid visualisation information out of the box through our GraphPass tool. This includes node sizing (based on degree), positioning (based on the Fruchterman-Reingold algorithm), and coloring (based on walktrap modularity). You can still use Python libraries like networkx to produce further analyses. For instance, creating an ego network of a particular node in the graph is straightforward.
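
An ego network is the subgraph made up of one node, the nodes within a given number of hops of it, and the links among them. The sketch below builds one with the standard library only, so the logic is visible; networkx provides an `ego_graph` function for the same purpose. The sample edge list and helper names are hypothetical, and links are treated as undirected here, which is a simplifying assumption.

```python
from collections import deque

# Hypothetical links between domains, standing in for a network derivative.
EDGES = [
    ("a.com", "b.org"),
    ("a.com", "c.ca"),
    ("b.org", "d.net"),
    ("e.com", "d.net"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbours (links treated as undirected)."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def ego_network(adjacency, center, radius=1):
    """Return the nodes within `radius` hops of `center` and the edges among them."""
    distances = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distances[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    nodes = set(distances)
    # Keep every edge whose two endpoints are both inside the ego network.
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges


adjacency = build_adjacency(EDGES)
ego_nodes, ego_edges = ego_network(adjacency, "a.com", radius=1)
print(sorted(ego_nodes))
print(sorted(ego_edges))
```

Raising `radius` to 2 would also pull in the neighbours of the neighbours, which can be useful for seeing how a site connects to the wider collection.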

## Try It Yourself!

As noted, you can currently try the notebooks via our [GitHub repository](https://github.com/archivesunleashed/auk-notebooks). You can also get a sneak peek by running them via [Binder](https://mybinder.org/v2/gh/archivesunleashed/auk-notebooks/master?filepath=auk-notebook.ipynb).
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
