---
title: Archives Unleashed Jupyter Notebooks
date: 2017-11-14T09:21:11-05:00
weight: 24
---

## Introduction

![AUK Notebook screenshot](/images/AUK_Notebook.png)

We are excited to introduce our Archives Unleashed Jupyter Notebooks, a prototype method for working with the derivatives generated by the Archives Unleashed Cloud. They allow you to interactively explore and filter the domain count information, extracted full text, and network visualization data generated by the Cloud.

We are currently exploring greater integration between the notebooks and the Archives Unleashed Cloud.

**To use them now, please visit our [GitHub repository](https://github.com/archivesunleashed/auk-notebooks) and follow the instructions.** You can also play with a live demo via [Binder](https://mybinder.org/v2/gh/archivesunleashed/auk-notebooks/master?filepath=auk-notebook.ipynb).

{{< note title="This is still in the prototype stage!" >}}
Any and all feedback and suggestions are greatly appreciated and can be sent to [Samantha Fritz](mailto:sam.fritz@archivesunleashed.org), our project manager.
{{< /note >}}

You can read more about the thinking behind these notebooks in our Medium post, "[Exploring Web Archival Data through Archives Unleashed Cloud Jupyter Notebooks](https://news.archivesunleashed.org/exploring-web-archival-data-through-archives-unleashed-cloud-jupyter-notebooks-7605c6ca2b33)."

There are three notebooks: domain analysis, text analysis, and network analysis. Each is discussed below.

## Domain Analysis

![AUK Notebook domain analysis screenshot](/images/AUK_Notebook_Domains.png)

Domain analysis is a fairly basic view of the web archive that highlights which domains are included and how often they appear. You can, for example, see how many .com addresses are in the collection, or which domains are over- or underrepresented.
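
The counting behind this view is simple enough to sketch in plain Python. This is a minimal illustration, not the notebook's own code: it assumes each line of the domain derivative looks like `(www.example.com,42)` (a domain followed by its capture count), and `parse_domain_counts`, `suffix_summary`, and `share_of_total` are hypothetical helper names.

```python
from collections import Counter

def parse_domain_counts(lines):
    """Parse assumed '(domain,count)' lines into a Counter of captures per domain."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("(") and line.endswith(")"):
            line = line[1:-1]  # drop only the outer parentheses
        domain, _, count = line.rpartition(",")  # split on the last comma
        counts[domain] += int(count)
    return counts

def suffix_summary(counts, suffix):
    """(distinct domains, total captures) for domains ending in `suffix`, e.g. '.com'."""
    matches = {d: n for d, n in counts.items() if d.endswith(suffix)}
    return len(matches), sum(matches.values())

def share_of_total(counts):
    """Each domain's fraction of all captures, a quick over/underrepresentation check."""
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()}
```

With the counts loaded, `counts.most_common(10)` lists the ten most-captured domains, and `suffix_summary(counts, ".com")` answers the ".com" question both as a number of distinct domains and as a number of captures.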

## Text Analysis

![AUK Notebook text analysis screenshot](/images/AUK_Notebook_Text.png)

Text analysis is a popular way to do exploratory analysis of web archive data. The [Natural Language Toolkit](https://www.nltk.org) (NLTK) library offers an array of options here. For example, we can see which domains use which words, how words are dispersed across a collection, or the average sentiment of a collection.
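
NLTK provides ready-made tools for these questions (for instance `FreqDist` for word frequencies and `Text.dispersion_plot` for dispersion). To show the underlying idea without extra dependencies, here is a standard-library sketch of the first two questions. It assumes each line of the full-text derivative has the shape `(crawl_date,domain,url,content)`; that layout and all function names are illustrative assumptions rather than the notebook's own code, and sentiment scoring is left to NLTK.

```python
import re
from collections import Counter, defaultdict

# Lowercase words, allowing one internal apostrophe ("don't", "archive's").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

def parse_fulltext_line(line):
    """Split an assumed '(crawl_date,domain,url,content)' line into four fields."""
    line = line.strip()
    if line.startswith("(") and line.endswith(")"):
        line = line[1:-1]  # drop only the outer parentheses
    # maxsplit=3 keeps any commas inside the page text in `content`
    crawl_date, domain, url, content = line.split(",", 3)
    return crawl_date, domain, url, content

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def word_counts_by_domain(lines):
    """Map each domain to a Counter of the words on its pages."""
    by_domain = defaultdict(Counter)
    for line in lines:
        if line.strip():
            _, domain, _, content = parse_fulltext_line(line)
            by_domain[domain].update(tokenize(content))
    return by_domain

def domains_using(by_domain, word):
    """(domain, count) pairs for domains that use `word`, most frequent first."""
    hits = [(d, c[word]) for d, c in by_domain.items() if c[word] > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

def word_offsets(lines, word):
    """Token positions of `word` across the collection, in file order.

    These offsets are the data a lexical dispersion plot draws.
    """
    offsets, position = [], 0
    for line in lines:
        if line.strip():
            tokens = tokenize(parse_fulltext_line(line)[3])
            offsets.extend(position + i for i, t in enumerate(tokens) if t == word)
            position += len(tokens)
    return offsets
```

Ties in `domains_using` are broken alphabetically so the output is stable from run to run.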

## Network Analysis

![AUK Notebook network analysis screenshot](/images/AUK_Notebook_Network.png)

Archives Unleashed Cloud network derivatives already offer solid visualization information out of the box, produced by our GraphPass tool. This includes node sizing (based on degree), positioning (based on the Fruchterman-Reingold algorithm), and coloring (based on walktrap modularity). Python libraries like networkx can still take the analysis further, though. For instance, creating an ego network of a particular node in the graph is pretty straightforward.
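
networkx ships an `ego_graph` function that does this directly. The sketch below assumes the Cloud network derivative has already been loaded into a networkx `DiGraph` (for example with `nx.read_graphml` or `nx.read_gexf`, depending on the file you downloaded; the filename in the comment is hypothetical). `ego_network` is a hypothetical wrapper, and counting links in either direction is a design choice rather than something the Cloud prescribes.

```python
import networkx as nx

def ego_network(graph, node, radius=1):
    """Return `node` plus every node within `radius` hops, with the edges among them.

    Hyperlink graphs are directed; undirected=True counts a hop in either
    direction, so sites linking *to* `node` are kept as well as sites it links to.
    The returned subgraph keeps the original edge directions.
    """
    return nx.ego_graph(graph, node, radius=radius, undirected=True)

# Hypothetical usage with a downloaded derivative (the filename is made up):
# graph = nx.read_graphml("collection-network.graphml")
# ego = ego_network(graph, "www.example.com")
# print(ego.number_of_nodes(), ego.number_of_edges())
```

Raising `radius` to 2 pulls in neighbors of neighbors, which can grow quickly in densely linked collections.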

## Try It Yourself!

As noted, you can currently try the notebooks via our [GitHub repository](https://github.com/archivesunleashed/auk-notebooks). You can also get a sneak peek by running the notebook via [Binder](https://mybinder.org/v2/gh/archivesunleashed/auk-notebooks/master?filepath=auk-notebook.ipynb).