Add .getHttpStatus and .getFilename to ArchiveRecordImpl class #198 & #164 #292

greebie · Nov 22, 2018

GitHub issue(s): #198, #164

What does this Pull Request do?

This PR includes three changes to the ArchiveRecord class:

The Status Code Header for the crawl (via .getHttpStatus)
The read-streamed filename (via .getFilename)
Explicit testing of the ArchiveRecord class (this will likely not change coverage by much if at all).

.getHttpStatus could be useful for studies that compare HTTP status codes across crawls.

Example:

  import io.archivesunleashed._
  import io.archivesunleashed.matchbox._
  import io.archivesunleashed.util._

  val links = RecordLoader.loadArchives("/Users/ryandeschamps/warcs/*gz", sc)
    .keepValidPages()
    .keepContent(Set("apple".r))
    .map(r => (r.getHttpStatus, (ExtractLinks(r.getUrl, r.getContentString))))
    .flatMap(r => r._2.map(f => (r._1, ExtractDomain(f._1).replaceAll("^\\s*www\\.", ""), 
      ExtractDomain(f._2).replaceAll("^\\s*www\\.", ""))))
    .filter(r => r._2 != "" && r._3 != "")
    .countItems()
    .filter(r => r._2 > 5).take(10)

Should produce something like

Array(((200,nanaimodailynews.com,nanaimodailynews.com),445785), ((200,nanaimodailynews.com,blackpress.ca),188676), ((200,nanaimodailynews.com,bclocalnews.com),111400), ... )

It might be more interesting to test this script without .keepValidPages(), since most valid pages would likely return a 200. Where an ArchiveRecord fails to get a status code (the value is null or an empty string), the record will return 000. This default value can be changed.
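A minimal sketch of the fallback described above, assuming the raw status arrives as a possibly null String. The object name, `statusOrDefault`, and `DefaultStatus` are hypothetical and are not taken from the PR's code.

```scala
object HttpStatusDefaultSketch {
  // Hypothetical default; the PR description says records without a status return 000.
  val DefaultStatus: String = "000"

  // Returns the trimmed status, or the default when the input is null, empty, or blank.
  def statusOrDefault(raw: String, default: String = DefaultStatus): String =
    Option(raw).map(_.trim).filter(_.nonEmpty).getOrElse(default)
}
```

Passing a different `default` is one way the fallback value could be changed per call.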

.getFilename returns the full path (which may be a URL) of the WARC that was consumed.

  import io.archivesunleashed._
  import io.archivesunleashed.matchbox._
  import io.archivesunleashed.util._
  import org.apache.commons.io.FilenameUtils

  val links = RecordLoader.loadArchives("/Users/ryandeschamps/warcs/*gz", sc)
        .keepValidPages()
        .keepContent(Set("apple".r))
        .map(r => (r.getFilename, (ExtractLinks(r.getUrl, r.getContentString))))
        .flatMap(r => r._2.map(f => (r._1, ExtractDomain(f._1).replaceAll("^\\s*www\\.", ""), 
          ExtractDomain(f._2).replaceAll("^\\s*www\\.", ""))))
        .filter(r => r._2 != "" && r._3 != "")
        .countItems()
        .filter(r => r._2 > 5).take(10)

should produce something like

links: Array[((String, String, String), Int)] = Array(((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,nanaimodailynews.com),439503), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,blackpress.ca),186028), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,bclocalnews.com),106107), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,drivewaycanada.ca),53040),
...

To get just the filename, you could use FilenameUtils.getName(r.getFilename) (I have included the FilenameUtils import in the code above). I have not looked into whether FilenameUtils will get the filename from a URL, which is why I stuck with the full path. Also, I am open to feedback on whether .getFilename is the right name for this. IIPC calls it .getReaderIdentifier().
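For identifiers that may be URLs rather than local paths, one possible approach is to take the last segment of the URI path using only the standard library. This is a sketch under that assumption; `ArchiveNameSketch` and `baseName` are hypothetical names and not part of the toolkit.

```scala
import java.net.URI
import scala.util.Try

object ArchiveNameSketch {
  // Hypothetical helper: returns the last path segment of a local path or URL-like identifier.
  def baseName(identifier: String): String = {
    // Prefer the URI path component so query strings are dropped;
    // fall back to the raw string when it is not a valid URI or has no path.
    val path = Try(new URI(identifier).getPath).toOption.flatMap(Option(_)).getOrElse(identifier)
    path.substring(path.lastIndexOf('/') + 1)
  }
}
```

With this sketch, `ArchiveNameSketch.baseName("file:/tmp/warcs/example-00000.warc.gz")` yields `example-00000.warc.gz`. Windows-style backslash paths are not handled.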

I can think of a number of use cases for this, but it would be great as a way to detect problems with warcs (e.g. with wonky data). It might also be helpful for error responses down the road.

How should this be tested?

The above scripts should have expected outputs.
Both functions should work for both arc and warc formats.
Travis should pass.

Additional Notes:

I used some of @dportabella 's ideas to produce the code here, in particular for getting the HttpStatus from a Warc file. For Arc and the Warc filename, I tried to use the WARCRecord and ARCRecord classes instead.

I did not include the full header responses in this PR. It seems like the ArcRecord and WarcRecord responses are quite different and I haven't been able to produce an effective testing mechanism.

Interested parties

@ianmilligan1 @ruebot

Thanks in advance for your help with the Archives Unleashed Toolkit!

codecov-io · Nov 22, 2018

Codecov Report

Merging #292 into master will increase coverage by 0.06%.
The diff coverage is 81.25%.

@@            Coverage Diff             @@
##           master     #292      +/-   ##
==========================================
+ Coverage   73.33%   73.39%   +0.06%     
==========================================
  Files          42       42              
  Lines        1170     1184      +14     
  Branches      205      210       +5     
==========================================
+ Hits          858      869      +11     
  Misses        244      244              
- Partials       68       71       +3

Impacted Files	Coverage Δ
...ain/scala/io/archivesunleashed/ArchiveRecord.scala	`82.14% <81.25%> (-1.2%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 80b9e2b...b722d1d.

ianmilligan1 · Nov 22, 2018

Thanks @greebie – I can test. Is this PR complete or is there anything else you're hoping to add to it before we start the review process?

greebie · Nov 22, 2018

Thanks. I'm pretty sure I learned from the last PR and did a very good job testing. fingers crossed I think it's ready to go.

ianmilligan1 · Nov 22, 2018

OK sg. I'll take first round testing, realistically probably tomorrow morning.

ruebot · Nov 23, 2018

ruebot requested changes Nov 23, 2018

ruebot · Nov 23, 2018

src/test/scala/io/archivesunleashed/ArchiveRecordTest.scala

@@ -37,17 +38,76 @@ class ArchiveRecordTest extends FunSuite with BeforeAndAfter {
      .setAppName(appName)
    conf.set("spark.driver.allowMultipleContexts", "true");
    sc = new SparkContext(conf)
+


Remove blank line.

ruebot · Nov 23, 2018

src/main/scala/io/archivesunleashed/ArchiveRecord.scala


 /** Trait for a record in a web archive. */
 trait ArchiveRecord extends Serializable {
+  /** Returns the filename containing the Archive Records */
+  def getFilename: String


Let's use Resourcename instead of Filename.

Done. Just to confirm: Resourcename (to mimic Filename) and not ResourceName (because "resource name" is two separate words)? I'm okay with either.

Current commit uses Resourcename. I think that's fine - I'm probably just overthinking it.

I think Resourcename works FWIW

Resourcename is fine.

ruebot · Nov 23, 2018

src/main/scala/io/archivesunleashed/ArchiveRecord.scala


 /** Trait for a record in a web archive. */
 trait ArchiveRecord extends Serializable {
+  /** Returns the filename containing the Archive Records */


Missing full stop.

ruebot · Nov 23, 2018

src/main/scala/io/archivesunleashed/ArchiveRecord.scala

@@ -64,6 +74,7 @@ class ArchiveRecordImpl(r: SerializableWritable[ArchiveRecordWritable]) extends
  var arcRecord: ARCRecord = null
  var warcRecord: WARCRecord = null
  // scalastyle:on null
+  var headerResponseFormat: String = "US-ASCII"


What's the rationale for US-ASCII?

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

I suppose it's this? https://tools.ietf.org/html/rfc7230#section-3.2.4

Yes - this part was a mimic of @dportabella 's suggestion. Seemed weird to me at first as well.
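To illustrate why decoding as US-ASCII is enough for this purpose, here is a rough sketch that pulls the status code out of raw response header bytes. It assumes the header starts with a standard status line such as `HTTP/1.1 200 OK`; `StatusLineSketch` and `statusCode` are hypothetical names, and this is not the code in the PR.

```scala
import java.nio.charset.StandardCharsets

object StatusLineSketch {
  // Hypothetical helper: extracts the three-digit code from a raw HTTP response header.
  def statusCode(headerBytes: Array[Byte]): Option[String] = {
    // Bytes outside US-ASCII decode to a replacement character instead of throwing.
    val header = new String(headerBytes, StandardCharsets.US_ASCII)
    // The status line is everything before the first line break.
    val statusLine = header.split("\r?\n", 2).headOption.getOrElse("")
    statusLine.split(" ") match {
      case Array(version, code, _*) if version.startsWith("HTTP/") && code.matches("\\d{3}") =>
        Some(code)
      case _ => None // munged or missing status line
    }
  }
}
```

A caller could map the `None` case to whatever default is chosen for records without a readable status.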

ianmilligan1 · Nov 23, 2018

ianmilligan1 approved these changes Nov 23, 2018 • edited

Tested with variations on

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/*.gz", sc)
  .keepValidPages()
  .map(r => (r.getFilename, r.getHttpStatus, r.getCrawlDate, r.getDomain, r.getUrl, RemoveHTML(r.getContentString)))
  .saveAsTextFile("/tuna1/scratch/i2milligan/results/plain-text-http")

WARC file and response codes all check out, and work in a variety of scripts. Good to go from my POV (and @ruebot thanks for your review too!).

greebie · Nov 23, 2018

I would like to run one more test on this minus the .keepValidPages(). Seems like there's potential for that to break.

ianmilligan1 · Nov 23, 2018

So just i.e.

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/*.gz", sc)
  .map(r => (r.getFilename, r.getHttpStatus, r.getCrawlDate, r.getDomain, r.getUrl, RemoveHTML(r.getContentString)))
  .saveAsTextFile("/tuna1/scratch/i2milligan/results/plain-text-http")

I have a large body of WARCs to test on, so if that's what you mean I can test it here too @greebie

greebie · Nov 23, 2018

Yes. Would like to see some statuses besides 200. :) Also if the header is munged, I'd like to be able to cover for it. Really appreciate it if you could.

ianmilligan1 · Nov 23, 2018

OK. Let me run it at scale minus the plain text, so we can easily look around at status codes too.

ianmilligan1 · Nov 23, 2018

Tested it without .keepValidPages() and all seems to work.

Am seeing other response codes - i.e. 404s, 301s, etc. i.e.

(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,301,20150514,www.ontario.ca,http://www.ontario.ca/dgr)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,200,20150514,www1.toronto.ca,http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=68a618b06a1aa410VgnVCM10000071d60f89RCRD)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,200,20150514,www.ontario.ca,http://www.ontario.ca/travel-and-recreation/about-ontario?utm_source=shortlinks&utm_medium=web&utm_campaign=dgr)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,404,20150514,www.ontario.ca,http://www.ontario.ca/travel-and-recreation/Msxml2.XMLHTTP)
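For a quick look at the spread of codes in saved output like the lines above, a tally can be sketched in plain Scala. This assumes each line has the shape `(archive,status,date,domain,url)` and that the archive path contains no commas; `StatusTallySketch` and its methods are hypothetical names.

```scala
object StatusTallySketch {
  // Pulls the second comma-separated field (the status) out of one output line.
  def statusOf(line: String): Option[String] =
    line.stripPrefix("(").split(",", 3) match {
      case Array(_, status, _*) => Some(status)
      case _ => None // line does not have the assumed shape
    }

  // Counts how many lines carry each status code.
  def tally(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => statusOf(line).toList)
      .groupBy(identity)
      .map { case (status, hits) => status -> hits.size }
}
```

On a Spark RDD the same count could be taken before saving, for example by mapping each record to its status and counting by value.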

greebie · Nov 23, 2018

Awesome! I think doing a status code visualisation would make a decent Medium post. That might be my December project.

greebie · Nov 23, 2018

More relevant to #260, but related to why coverage shrank in this PR: I think we need test cases for ArchiveRecords that are neither WARC nor ARC format. The Java takes care of the error handling, I think, but Codecov notices that we did not test for that here in ArchiveRecord for some reason. I think this is something for a different PR, however.

ruebot · Nov 27, 2018

I think we have different interpretations of what Resource name is. This appears to be putting out the filename of the ARC/WARC. My understanding was that it was the resource name of the resource being parsed, i.e. foo.jpg or index.html. If we're resolving #164 with this (my bad, I should have read that more closely when first reviewing), then it should not be Resourcename. It should be ArchiveFilename or something similar that makes it more explicit what it is.

ruebot · Nov 27, 2018

We also need to decide whether the method returns just the ARC/WARC filename itself, or the full path.

ruebot · Nov 27, 2018

ruebot reviewed Nov 27, 2018

ruebot · Nov 27, 2018

src/test/scala/io/archivesunleashed/ArchiveRecordTest.scala

+    assert(textSampleWarc.deep == Array("20080430", "20080430", "20080430").deep)
+  }
+
+  test("Domains") {


There are a few extra test methods here. Are they in scope for the issues posted in the original comment? Or do they cover other tickets?

No other tickets. Basically, last PR I had good test coverage, but failed to test particular cases and my code passed Travis with bugs. I decided to include these additional tests in case that happened again (it was unlikely, but I wanted this PR to go more smoothly). Since ArchiveRecord is used widely across the tests, I did not expect the tests to improve coverage (as per #260).

ruebot · Nov 27, 2018

Other than the above comments, functionality-wise, looks good to me:

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/home/nruest/tmp/test-warcs/5467/*.gz", sc)
  .map(r => (r.getResourcename, r.getHttpStatus))
  .saveAsTextFile("/home/nruest/tmp/5467_output")

(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,204)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,403)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)

Nice work @greebie 😄

greebie · Nov 27, 2018

I think we have different interpretations of what Resource name is. This appears to be putting out the filename of the ARC/WARC. My understanding was that it was the resource name of the resource being parsed, i.e. foo.jpg or index.html. If we're resolving #164 with this (my bad, I should have read that more closely when first reviewing), then it should not be Resourcename. It should be ArchiveFilename or something similar that makes it more explicit what it is.

I think archiveFilename works. The issue is that because the filename comes from the ArcRecord and WarcRecord classes it technically could be a url. IIPC calls it "getReaderIdentifier"

greebie · Nov 27, 2018

We also need to decide whether the method returns just the ARC/WARC filename itself, or the full path.

I suggest we keep the full paths but provide information on accessing the filename in the docs. There's an example that extracts just the filename in the tests. Another option is to include a util.

ruebot · Nov 27, 2018

Ok, let's go with archiveFilename, and update doc comments as necessary too.

ruebot · Nov 27, 2018

I suggest we keep the full paths but provide information on accessing the filename in the docs. There's an example that extracts just the filename in the tests.

Ok. Update this ticket as necessary after this gets merged.

ruebot · Nov 28, 2018

Good to go!

Script:

import io.archivesunleashed._
import io.archivesunleashed.matchbox._
import org.apache.commons.io.FilenameUtils

RecordLoader.loadArchives("/home/nruest/tmp/test-warcs/5467/*.gz", sc)
  .map(r => (r.getArchiveFilename, r.getHttpStatus, FilenameUtils.getName(r.getArchiveFilename)))
  .saveAsTextFile("/home/nruest/tmp/292_final_test")

Sample output:

(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,204,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,403,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)

ruebot · Nov 28, 2018

ruebot approved these changes Nov 28, 2018

Updates based on PR.
- change .getFilename to .getResourcename
- Other code style fixes.

739e06c

ruebot referenced this pull request Nov 23, 2018
Open
Update aut documentation for https://github.com/archivesunleashed/aut/pull/292 #74

Change .getResourcename to .getArchiveFile
- include changes to tests.

b722d1d

ruebot merged commit 7731b6d into master Nov 28, 2018
3 checks passed

codecov/patch 81.25% of diff hit (target 73.33%)
codecov/project 73.39% (+0.06%) compared to 80b9e2b
continuous-integration/travis-ci/pr The Travis CI build passed

ruebot deleted the issue-198 branch Nov 28, 2018
