Add .getHttpStatus and .getFilename to ArchiveRecordImpl class #198 & #164 #292

Merged
merged 6 commits into master from issue-198 on Nov 28, 2018

4 participants
@greebie
Contributor

greebie commented Nov 22, 2018

GitHub issue(s):

#198, #164

What does this Pull Request do?

This PR includes three changes to the ArchiveRecord class:

  1. Access to the HTTP status code of the crawled response (via .getHttpStatus)
  2. Access to the filename of the archive being read (via .getFilename)
  3. Explicit testing of the ArchiveRecord class (this will likely not change coverage by much, if at all).

.getHttpStatus could be useful for studies that would like to compare HTTP status codes across crawls.

Example:

  import io.archivesunleashed._
  import io.archivesunleashed.matchbox._
  import io.archivesunleashed.util._

  val links = RecordLoader.loadArchives("/Users/ryandeschamps/warcs/*gz", sc)
    .keepValidPages()
    .keepContent(Set("apple".r))
    .map(r => (r.getHttpStatus, (ExtractLinks(r.getUrl, r.getContentString))))
    .flatMap(r => r._2.map(f => (r._1, ExtractDomain(f._1).replaceAll("^\\s*www\\.", ""), 
      ExtractDomain(f._2).replaceAll("^\\s*www\\.", ""))))
    .filter(r => r._2 != "" && r._3 != "")
    .countItems()
    .filter(r => r._2 > 5).take(10)

Should produce something like

Array(((200,nanaimodailynews.com,nanaimodailynews.com),445785), ((200,nanaimodailynews.com,blackpress.ca),188676), ((200,nanaimodailynews.com,bclocalnews.com),111400), ... )

It might be more interesting to test this script without .keepValidPages(), since most valid pages would likely return a 200. Where an ArchiveRecord fails to get a status code (because the value is null or an empty string), the record returns 000. This default value can be changed.
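
The fallback described above can be sketched in plain Scala, without Spark. This is a minimal illustration, not the toolkit's implementation: the object `HttpStatusSketch`, the helper `statusOrDefault`, and the idea of parsing the code out of a raw status line are all assumptions made for the example.

```scala
// Hypothetical helper for illustration; not part of the Archives Unleashed Toolkit.
object HttpStatusSketch {
  // Default returned when no usable status code is found.
  val DefaultStatus = "000"

  /** Pulls the three-digit code out of a status line such as
    * "HTTP/1.1 404 Not Found". Falls back to `default` when the
    * line is null, empty, or does not contain a three-digit code.
    */
  def statusOrDefault(statusLine: String, default: String = DefaultStatus): String =
    Option(statusLine)                                  // null becomes None
      .map(_.trim)
      .filter(_.nonEmpty)                               // empty string becomes None
      .flatMap(_.split("\\s+").toList.drop(1).headOption) // second token, if any
      .filter(_.matches("\\d{3}"))                      // must look like a status code
      .getOrElse(default)
}
```

Passing a different `default` corresponds to the note that the default value can be changed.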

.getFilename returns the full path (which may be a URL) of the WARC that was consumed.

  import io.archivesunleashed._
  import io.archivesunleashed.matchbox._
  import io.archivesunleashed.util._
  import org.apache.commons.io.FilenameUtils

  val links = RecordLoader.loadArchives("/Users/ryandeschamps/warcs/*gz", sc)
        .keepValidPages()
        .keepContent(Set("apple".r))
        .map(r => (r.getFilename, (ExtractLinks(r.getUrl, r.getContentString))))
        .flatMap(r => r._2.map(f => (r._1, ExtractDomain(f._1).replaceAll("^\\s*www\\.", ""), 
          ExtractDomain(f._2).replaceAll("^\\s*www\\.", ""))))
        .filter(r => r._2 != "" && r._3 != "")
        .countItems()
        .filter(r => r._2 > 5).take(10)

should produce something like

links: Array[((String, String, String), Int)] = Array(((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,nanaimodailynews.com),439503), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,blackpress.ca),186028), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,bclocalnews.com),106107), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,drivewaycanada.ca),53040),
...

To get just the filename, you could use FilenameUtils.getName(r.getFilename) (the FilenameUtils import is included in the code above). I have not checked whether FilenameUtils can extract the filename from a URL, which is why I stuck with the full path. I am also open to feedback on whether .getFilename is the right name for this; IIPC calls it .getReaderIdentifier().
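
For the case where the full path is a URL, one option is to go through java.net.URI from the standard library instead. This is only a sketch under stated assumptions: `ArchiveNameSketch` and `baseName` are hypothetical names, and the input is assumed to be a syntactically valid URI or plain path (java.net.URI throws URISyntaxException on illegal characters such as spaces).

```scala
import java.net.URI

// Hypothetical helper for illustration; not part of the Archives Unleashed Toolkit.
object ArchiveNameSketch {
  /** Returns the last path segment of a local path or URL.
    * The query string and fragment of a URL are ignored because
    * only the URI's path component is inspected.
    */
  def baseName(fullPath: String): String = {
    // getPath is null for opaque URIs; fall back to the raw input then.
    val path = Option(new URI(fullPath).getPath).getOrElse(fullPath)
    path.substring(path.lastIndexOf('/') + 1)
  }
}
```

For example, `ArchiveNameSketch.baseName("file:/tmp/warcs/example.warc.gz")` yields `example.warc.gz`.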

I can think of a number of use cases for this, but it would be great as a way to detect problems with warcs (e.g. with wonky data). It might also be helpful for error responses down the road.

How should this be tested?

The above scripts should produce the expected outputs.
Both functions should work for both ARC and WARC formats.
Travis should pass.

Additional Notes:

I used some of @dportabella 's ideas to produce the code here, in particular for getting the HttpStatus from a Warc file. For Arc and the Warc filename, I tried to use the WARCRecord and ARCRecord classes instead.

I did not include the full header responses in this PR. It seems like the ArcRecord and WarcRecord responses are quite different and I haven't been able to produce an effective testing mechanism.

Interested parties

@ianmilligan1 @ruebot

Thanks in advance for your help with the Archives Unleashed Toolkit!

greebie added some commits Nov 21, 2018

Add httpStatus to ArchiveRecord class & trait
- add .httpStatus to potential outputs
- add tests for .httpStatus calls
- improve ArchiveRecord testing overall.
Add .fileName feature to ArchiveRecordImpl.
- add filename to trait
- add filename for ArchiveRecordImpl
- add tests for filename.
@codecov-io

codecov-io commented Nov 22, 2018

Codecov Report

Merging #292 into master will increase coverage by 0.06%.
The diff coverage is 81.25%.

@@            Coverage Diff             @@
##           master     #292      +/-   ##
==========================================
+ Coverage   73.33%   73.39%   +0.06%     
==========================================
  Files          42       42              
  Lines        1170     1184      +14     
  Branches      205      210       +5     
==========================================
+ Hits          858      869      +11     
  Misses        244      244              
- Partials       68       71       +3
Impacted Files Coverage Δ
...ain/scala/io/archivesunleashed/ArchiveRecord.scala 82.14% <81.25%> (-1.2%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80b9e2b...b722d1d. Read the comment docs.

@ianmilligan1

Member

ianmilligan1 commented Nov 22, 2018

Thanks @greebie – I can test. Is this PR complete or is there anything else you're hoping to add to it before we start the review process?

@greebie

Contributor

greebie commented Nov 22, 2018

Thanks. I'm pretty sure I learned from the last PR and did a very good job testing. fingers crossed I think it's ready to go.

@ianmilligan1

Member

ianmilligan1 commented Nov 22, 2018

OK sg. I'll take first round testing, realistically probably tomorrow morning.

@@ -37,17 +38,76 @@ class ArchiveRecordTest extends FunSuite with BeforeAndAfter {
.setAppName(appName)
conf.set("spark.driver.allowMultipleContexts", "true");
sc = new SparkContext(conf)

@ruebot

ruebot Nov 23, 2018

Member

Remove blank line.

/** Trait for a record in a web archive. */
trait ArchiveRecord extends Serializable {
/** Returns the filename containing the Archive Records */
def getFilename: String

@ruebot

ruebot Nov 23, 2018

Member

Let's use Resourcename instead of Filename.

@greebie

greebie Nov 23, 2018

Contributor

Done. Just to confirm: Resourcename (to mimic Filename) and not ResourceName (because "resource name" is two separate words)? I'm okay with either.

@greebie

greebie Nov 23, 2018

Contributor

Current commit uses Resourcename. I think that's fine - I'm probably just overthinking it.

@ianmilligan1

ianmilligan1 Nov 23, 2018

Member

I think Resourcename works FWIW

@ruebot

ruebot Nov 24, 2018

Member

Resourcename is fine.

/** Trait for a record in a web archive. */
trait ArchiveRecord extends Serializable {
/** Returns the filename containing the Archive Records */

@ruebot

ruebot Nov 23, 2018

Member

Missing fullstop.

@@ -64,6 +74,7 @@ class ArchiveRecordImpl(r: SerializableWritable[ArchiveRecordWritable]) extends
var arcRecord: ARCRecord = null
var warcRecord: WARCRecord = null
// scalastyle:on null
var headerResponseFormat: String = "US-ASCII"

@ruebot

ruebot Nov 23, 2018

Member

What's the rationale for US-ASCII?

@ruebot

ruebot Nov 23, 2018

Member
   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets.  A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.

I suppose it's this? https://tools.ietf.org/html/rfc7230#section-3.2.4

@greebie

greebie Nov 23, 2018

Contributor

Yes - this part was a mimic of @dportabella 's suggestion. Seemed weird to me at first as well.
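
To illustrate what decoding header bytes as US-ASCII means in practice, here is a minimal sketch using only the Java standard library. How the toolkit actually applies headerResponseFormat is not shown in this thread, so `HeaderDecodeSketch` and `decodeHeader` are hypothetical names used for the example.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration; not part of the Archives Unleashed Toolkit.
object HeaderDecodeSketch {
  /** Decodes raw header bytes as US-ASCII. Bytes outside the 7-bit
    * ASCII range are not valid in this charset, so the JVM decoder
    * substitutes the Unicode replacement character U+FFFD for them.
    */
  def decodeHeader(raw: Array[Byte]): String =
    new String(raw, StandardCharsets.US_ASCII)
}
```

A plain status line round-trips unchanged, while a stray non-ASCII octet shows up as U+FFFD rather than failing, which is in the spirit of the RFC's advice to treat such octets as opaque data.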

@ianmilligan1

ianmilligan1 approved these changes Nov 23, 2018 edited

Tested with variations on

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/*.gz", sc)
  .keepValidPages()
  .map(r => (r.getFilename, r.getHttpStatus, r.getCrawlDate, r.getDomain, r.getUrl, RemoveHTML(r.getContentString)))
  .saveAsTextFile("/tuna1/scratch/i2milligan/results/plain-text-http")

WARC file and response codes all check out, and work in a variety of scripts. Good to go from my POV (and @ruebot thanks for your review too!).

Updates based on PR.
- change .getFilename to .getResourcename
- Other code style fixes.
@greebie

Contributor

greebie commented Nov 23, 2018

I would like to run one more test on this minus the .keepValidPages(). Seems like there's potential for that to break.

@ianmilligan1

Member

ianmilligan1 commented Nov 23, 2018

So just i.e.

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/*.gz", sc)
  .map(r => (r.getFilename, r.getHttpStatus, r.getCrawlDate, r.getDomain, r.getUrl, RemoveHTML(r.getContentString)))
  .saveAsTextFile("/tuna1/scratch/i2milligan/results/plain-text-http")

I have a large body of WARCs to test on, so if that's what you mean I can test it here too @greebie

@greebie

Contributor

greebie commented Nov 23, 2018

Yes. Would like to see some statuses besides 200. :) Also if the header is munged, I'd like to be able to cover for it. Really appreciate it if you could.

@ianmilligan1

Member

ianmilligan1 commented Nov 23, 2018

OK. Let me run it at scale minus the plain text, so we can easily look around at status codes too.

@ianmilligan1

Member

ianmilligan1 commented Nov 23, 2018

Tested it without .keepValidPages() and all seems to work.

I am seeing other response codes, e.g. 404s and 301s:

(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,301,20150514,www.ontario.ca,http://www.ontario.ca/dgr)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,200,20150514,www1.toronto.ca,http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=68a618b06a1aa410VgnVCM10000071d60f89RCRD)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,200,20150514,www.ontario.ca,http://www.ontario.ca/travel-and-recreation/about-ontario?utm_source=shortlinks&utm_medium=web&utm_campaign=dgr)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,404,20150514,www.ontario.ca,http://www.ontario.ca/travel-and-recreation/Msxml2.XMLHTTP)
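
To look around at status codes in output like this, the records can be tallied per code. A minimal sketch in plain Scala, assuming the records have already been reduced to (archive filename, status) pairs; `StatusTallySketch` and `tally` are hypothetical names, not toolkit API.

```scala
// Hypothetical helper for illustration; not part of the Archives Unleashed Toolkit.
object StatusTallySketch {
  /** Counts how many records carry each HTTP status code.
    * Each input pair is (archive filename, status code).
    */
  def tally(records: Seq[(String, String)]): Map[String, Int] =
    records
      .groupBy { case (_, status) => status }           // bucket records by status code
      .map { case (status, rs) => status -> rs.size }   // keep only the bucket sizes
}
```

On a Spark RDD the same idea would apply after mapping each record to its status, but that variant is not shown here.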
@greebie

Contributor

greebie commented Nov 23, 2018

Awesome! I think doing a status code visualisation would make a decent Medium post. That might be my december project.

@greebie

Contributor

greebie commented Nov 23, 2018

More relevant to #260, but related to why coverage shrank in this PR: I think we need test cases for ArchiveRecords that are neither WARC nor ARC format. I think the Java takes care of the error handling, but Codecov notices that we did not test for that here in ArchiveRecord for some reason. I think this is something for a different PR, however.

@ruebot

Member

ruebot commented Nov 27, 2018

I think we have different interpretations of what Resource name is. This appears to be putting out the filename of the ARC/WARC. My understanding was that it was the resource name of the resource being parsed, i.e. foo.jpg or index.html. If we're resolving #164 with this (my bad, I should have read that more closely when first reviewing), then it should not be Resourcename. It should be ArchiveFilename or something similar that is more explicit about what it is.

@ruebot

Member

ruebot commented Nov 27, 2018

We also need to decide if it is just the ARC/WARC filename itself that the method returns, or the full path.

assert(textSampleWarc.deep == Array("20080430", "20080430", "20080430").deep)
}
test("Domains") {

@ruebot

ruebot Nov 27, 2018

Member

There are a few extra test methods here. Are they in scope for the issues posted in the original comment? Or do they cover other tickets?

@greebie

greebie Nov 27, 2018

Contributor

No other tickets. Basically, last PR I had good test coverage, but failed to test particular cases and my code passed Travis with bugs. I decided to include these additional tests in case that happened again (it was unlikely, but I wanted this PR to go more smoothly). Since ArchiveRecord is used widely across the tests, I did not expect the tests to improve coverage (as per #260).

@ruebot

Member

ruebot commented Nov 27, 2018

Other than the above comments, functionality-wise, looks good to me:

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/home/nruest/tmp/test-warcs/5467/*.gz", sc)
  .map(r => (r.getResourcename, r.getHttpStatus))
  .saveAsTextFile("/home/nruest/tmp/5467_output")
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,204)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,403)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)

Nice work @greebie 😄

@greebie

Contributor

greebie commented Nov 27, 2018

I think we have different interpretations of what Resource name is. This appears to be putting out the filename of the ARC/WARC. My understanding was that it was the resource name of the resource being parsed, i.e. foo.jpg or index.html. If we're resolving #164 with this (my bad, I should have read that more closely when first reviewing), then it should not be Resourcename. It should be ArchiveFilename or something similar that is more explicit about what it is.

I think archiveFilename works. The issue is that because the filename comes from the ArcRecord and WarcRecord classes it technically could be a url. IIPC calls it "getReaderIdentifier"

@greebie

Contributor

greebie commented Nov 27, 2018

We also need to decide if it is just the ARC/WARC filename itself that the method returns, or the full path.

I suggest we keep the full paths but provide information on accessing the filename in the docs. There's an example to extract just the filename in the tests. Another option is to include a util.

@ruebot

Member

ruebot commented Nov 27, 2018

Ok, let's go with archiveFilename, and update doc comments as necessary too.

@ruebot

Member

ruebot commented Nov 27, 2018

I suggest we keep the full paths but provide information on accessing the filename in the docs. There's an example to extract just the filename in the tests.

Ok. Update this ticket as necessary after this gets merged.

Change .getResourcename to .getArchiveFilename
- include changes to tests.
@ruebot

Member

ruebot commented Nov 28, 2018

Good to go!

Script:

import io.archivesunleashed._
import io.archivesunleashed.matchbox._
import org.apache.commons.io.FilenameUtils

RecordLoader.loadArchives("/home/nruest/tmp/test-warcs/5467/*.gz", sc)
  .map(r => (r.getArchiveFilename, r.getHttpStatus, FilenameUtils.getName(r.getArchiveFilename)))
  .saveAsTextFile("/home/nruest/tmp/292_final_test")

Sample output:

(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,204,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,403,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)  
@ruebot

ruebot approved these changes Nov 28, 2018

@ruebot ruebot merged commit 7731b6d into master Nov 28, 2018

3 checks passed

codecov/patch 81.25% of diff hit (target 73.33%)
codecov/project 73.39% (+0.06%) compared to 80b9e2b
continuous-integration/travis-ci/pr The Travis CI build passed

@ruebot ruebot deleted the issue-198 branch Nov 28, 2018
