Web Archiving Tool
greebie First commit.
- Add README.md
- Add function to set perma.cc API key.
- Add function to check wayback url.
- Add function to check list of urls.
- Add function to parse webpage and check list of urls.
- Add small suite of unit tests.
Latest commit 7c927c3 Dec 15, 2018
Type Name Latest commit message Commit time
README.md First commit. Dec 15, 2018
archivr.R First commit. Dec 15, 2018
archivr_test.R First commit. Dec 15, 2018
run_tests.R First commit. Dec 15, 2018

README.md

Archivr

Archivr is a project by the Qualitative Data Repository that verifies whether URLs have been preserved in web archives.

Basic usage (for now):

git clone 'https://github.com/QualitativeDataRepository/archivr.git'
cd archivr

Then launch R and run:

source('archivr.R')
archiv(list("www.example.com"))

Examples:

The basic function is archiv, which takes a list of URLs and checks their availability in the Wayback Machine. It returns a data frame containing the original URLs, followed by the HTTP status (or 000 if no URL exists), their availability (TRUE or FALSE), the Wayback Machine URL, and a timestamp.

arc_df <- archiv(list("www.example.com", "NOTAURL", "www.github.com"))
arc_df$status # [1] 200 000 200 / Levels: 000 200
arc_df$wayback_url # [1] http://web.archive.org/web/20181214234252/http://Www.example.com
              # [2] url not found                                                   
              # [3] http://web.archive.org/web/20181215081640/https://github.com/   
              # 3 Levels: http://web.archive.org/web/20181214234252/http://Www.example.com ...
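
To make the shape of that result more concrete, here is a minimal sketch of how one row might be assembled. It is not Archivr's actual code. It assumes a lookup against the Internet Archive's availability endpoint (`http://archive.org/wayback/available?url=...`), whose JSON response carries an `archived_snapshots$closest` entry when a snapshot exists. `snapshot_row` is a hypothetical helper, the column names `available` and `timestamp` are assumptions, and the responses are hand-written stand-ins for parsed JSON, so nothing here touches the network.

```r
# Hypothetical helper, not part of archivr.R: build one result row from an
# already-parsed Wayback availability response (a nested list).
snapshot_row <- function(url, parsed) {
  closest <- parsed$archived_snapshots$closest
  if (is.null(closest)) {
    # No snapshot: mirror the placeholders shown in the example output above.
    return(data.frame(url = url, status = "000", available = FALSE,
                      wayback_url = "url not found",
                      timestamp = NA_character_,
                      stringsAsFactors = FALSE))
  }
  data.frame(url = url,
             status = as.character(closest$status),
             available = isTRUE(closest$available),
             wayback_url = closest$url,
             timestamp = closest$timestamp,
             stringsAsFactors = FALSE)
}

# Hand-written stand-ins for parsed responses (illustrative values only).
found <- list(archived_snapshots = list(closest = list(
  status = "200", available = TRUE,
  url = "http://web.archive.org/web/20181214234252/http://www.example.com",
  timestamp = "20181214234252")))
not_found <- list(archived_snapshots = list())

rows <- rbind(snapshot_row("www.example.com", found),
              snapshot_row("NOTAURL", not_found))
print(rows[, c("url", "status", "available")])
```

Setting `stringsAsFactors = FALSE` keeps the columns as plain character vectors rather than the factors visible in the output above, which is a design choice of this sketch and not a description of what archiv does.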

Archivr can also check the URLs found on a web page for archived copies, using archiv.fromUrl.

arc_url_df <- archiv.fromUrl("https://qdr.syr.edu/")
df <- data.frame(arc_url_df$url, arc_url_df$wayback_url)[8,]

#   arc_url_df.url                                    arc_url_df.wayback_url
# 8 http://syr.edu http://web.archive.org/web/20170110050058/http://syr.edu/
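
The page-checking step presumably starts by collecting the links on the page. The sketch below shows one way that could be done in base R. It is an illustration under stated assumptions, not the method archiv.fromUrl uses: `extract_links` is a hypothetical helper, it expects the page HTML as a single string, and it relies on a simple regular expression where a real HTML parser would be more robust.

```r
# Hypothetical helper, not part of archivr.R: pull absolute http(s) links
# out of raw HTML with a regular expression.
extract_links <- function(html) {
  pattern <- "href=[\"'](https?://[^\"']+)[\"']"
  hits <- regmatches(html, gregexpr(pattern, html, ignore.case = TRUE))[[1]]
  # Strip the href= wrapper and quotes, then drop duplicates.
  links <- sub(pattern, "\\1", hits, ignore.case = TRUE)
  unique(links)
}

# A small hand-written page: two distinct links, one repeat, one fragment.
page <- '<a href="http://syr.edu">Syracuse</a> <a href=\'https://qdr.syr.edu/\'>QDR</a> <a href="http://syr.edu">again</a> <a href="#top">top</a>'
links <- extract_links(page)
print(links)
```

With archivr.R sourced, a vector like this could then be handed to archiv as a list, for example `archiv(as.list(links))`.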

Future versions will add checks against perma.cc, either on its own or alongside the Wayback Machine. Archivr will also become an R package.

TESTING

Archivr has a few unit tests for contributors. To run them, use R -f run_tests.R inside the archivr folder.

Archivr was developed by Ryan Deschamps (@greebie).