Parse xml and htm[l]? documents #31
Closes #12
@greebie would you have time to review this? If the code is OK, I'll add it to the docs, and then I think we have this in good enough shape to start promoting, even if we aren't quite ready for CRAN yet.
Just a minor issue that broke Travis and Appveyor (see review).
Add except to examples and include in Readme
Update examples & man files. from_wayback and from_perma now both return equivalent lists. view_archiv now sets available to true if either has a copy. I've removed the "status" column/variable, since it's just the status of the API call, not that of the URL, so it doesn't add anything. I've also made all examples assign function results to a new variable rather than print them, to declutter CI (and model good practice).
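To illustrate the "available if either has a copy" rule described in this commit message, here is a minimal sketch in base R. The helper name `combine_availability`, the list fields (`url`, `available`, `archived_url`), and the placeholder perma.cc link are all assumptions made for illustration; the package's actual return structures may differ.

```r
# Hypothetical helper, not part of the package: combine two lookup results.
# The list shape (url, available, archived_url) is assumed for this sketch.
combine_availability <- function(wayback, perma) {
  # isTRUE() treats NULL, NA, and FALSE alike as "no copy found"
  available <- isTRUE(wayback$available) || isTRUE(perma$available)
  data.frame(
    url = wayback$url,
    available = available,
    wayback_url = if (is.null(wayback$archived_url)) NA_character_ else wayback$archived_url,
    perma_url = if (is.null(perma$archived_url)) NA_character_ else perma$archived_url,
    stringsAsFactors = FALSE
  )
}

# Made-up inputs: only the second service holds a copy.
wb <- list(url = "https://example.org/a", available = FALSE, archived_url = NULL)
pm <- list(url = "https://example.org/a", available = TRUE,
           archived_url = "https://perma.cc/XXXX-XXXX")  # placeholder, not a real record
status <- combine_availability(wb, pm)
```

With these inputs `status$available` is `TRUE` even though the Wayback lookup found nothing, which is the behaviour the commit message describes.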
@@ -259,7 +260,7 @@ view_archiv <- function (lst, method="wayback") {
 #' @return a dataframe containing the url, status, availability,
 #' archived url(s) and timestamp(s)
 #' @examples
-#' view_archiv.fromUrl("https://www-cs-faculty.stanford.edu/~knuth/retd.html", method="both")
+#' checkArchiveStatus <- view_archiv.fromUrl("https://www-cs-faculty.stanford.edu/~knuth/retd.html", method="both")
greebie
Nov 7, 2019
Collaborator
Checkreview complained about the example line being over 100 characters. I suggest:
#' checkArchiveStatus <- view_archiv.fromUrl(
#' "https (..)",
#' method="both")
@@ -273,7 +274,7 @@ view_archiv.fromUrl <- function (url, method="wayback") {
 #' archived url(s) and timestamp(s)
 #' @examples
 #' \dontrun{\
-#' view_archiv.fromText("testfile.docx", method="both")
+#' checkArchiveStatus <- view_archiv.fromText("testfile.docx", method="both")
 #' @export
 #' @return a dataframe containing the url, status, availability,
 #' archived url(s) and timestamp(s)
 #' @examples
 #' # Wayback
-#' archiv.fromUrl("https://www-cs-faculty.stanford.edu/~knuth/retd.html")
+#' archivedURLs <- archiv.fromUrl("https://www-cs-faculty.stanford.edu/~knuth/retd.html", except="validator\\.w3\\.org")
-archiv.fromUrl <- function (url, method="wayback") {
-  return(archiv(extract_urls_from_webpage(url), method))
+archiv.fromUrl <- function (url, method="wayback", except = NULL) {
+  return(archiv(extract_urls_from_webpage(url, except), method))
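For readers unfamiliar with the new `except` argument, the following is a rough sketch of how a regular-expression exclusion like `except="validator\\.w3\\.org"` could filter a vector of extracted links. `drop_excepted` is a hypothetical stand-in written for this illustration; how `extract_urls_from_webpage` actually applies `except` is not shown in this diff.

```r
# Hypothetical stand-in for the filtering step; the package's real
# extract_urls_from_webpage() may handle `except` differently.
drop_excepted <- function(urls, except = NULL) {
  if (is.null(except)) {
    return(urls)  # default: nothing is excluded
  }
  # grepl() is TRUE for URLs matching the pattern; keep only the others
  urls[!grepl(except, urls)]
}

# Made-up link list for illustration.
links <- c(
  "https://validator.w3.org/check?uri=referer",
  "https://www-cs-faculty.stanford.edu/~knuth/",
  "https://example.org/validatorXw3Yorg"
)
kept <- drop_excepted(links, except = "validator\\.w3\\.org")
```

The escaped dots matter: because `\\.` matches only a literal dot, the third link is kept, while an unescaped `.` would match any character and drop it as well.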
 #' @examples
-#' from_wayback("https://www-cs-faculty.stanford.edu/~knuth/retd.html")
+#' checkStatus <- from_wayback("https://www-cs-faculty.stanford.edu/~knuth/retd.html")
adam3smith
Nov 7, 2019
Author
Contributor
Will fix, thanks, didn't see these comments before the latest commit.
And remove it from more documentation
Travis did not work this time. Seems like a problem with from_wayback?
@greebie -- we're warning free, thanks! Let me know if this looks good to merge from your end.
adam3smith commented Oct 17, 2019
Closes #30
ideas for doing this more elegantly welcome, of course