Add save to wayback function(s). (Issue #2) #3
Conversation
greebie
requested a review
from
adam3smith
Jan 11, 2019
greebie
referenced this pull request
Jan 15, 2019
Closed
Set up saving urls from a webpage or markdown #4
adam3smith
approved these changes
Jan 15, 2019
Saving to WBM works flawlessly. I also tested archiving a page with NOARCHIVE in robots, and the script returned the appropriate 403 message.
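For readers following along, here is a minimal sketch of what saving to the Wayback Machine and reporting a 403 could look like. The function names `wayback_save_endpoint`, `describe_save_status`, and `save_to_wayback` are hypothetical and are not taken from archivr.R; the sketch assumes the httr package and the public `https://web.archive.org/save/` endpoint.

```r
# Hypothetical helpers; names are illustrative, not taken from archivr.R.

# Build the Wayback Machine "Save Page Now" request URL for a target page.
wayback_save_endpoint <- function(url) {
  paste0("https://web.archive.org/save/", url)
}

# Turn an HTTP status code into a short message for the caller.
describe_save_status <- function(status) {
  if (status >= 200 && status < 300) {
    "saved"
  } else if (status == 403) {
    "refused (403): the page may not permit archiving"
  } else {
    paste0("failed with HTTP status ", status)
  }
}

# Save one or more urls. Requires the httr package and network access.
save_to_wayback <- function(urls) {
  vapply(urls, function(u) {
    response <- httr::GET(wayback_save_endpoint(u))
    describe_save_status(httr::status_code(response))
  }, character(1))
}
```

Because `save_to_wayback` is vectorised with `vapply`, the same call would handle a single url or a character vector of urls.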
Still testing saving urls from files.
Fingers crossed. I am starting issue #5 now. Getting there!
adam3smith
reviewed
Jan 15, 2019
archivr.R Outdated
greebie
added some commits
Jan 17, 2019
I included the relaxed regex, but it seems that I get false negatives in docx (I haven't tried other formats). For example, from my syllabus, I got "http://gutenberg.ca/ebooks/innis-minerva/innis-minerva-00-h.html.Turner", suggesting that somewhere along the line the script is not detecting \n and similar separators.
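One possible cause, offered only as a guess, is that paragraphs from the docx are concatenated without a separator before the regex runs. The sketch below reproduces that symptom in base R. The paragraph text is invented around the url quoted above, and `extract_urls` is a hypothetical helper, not the regex used in archivr.R.

```r
# Invented sample text: a url closing one paragraph, "Turner" opening the next.
paragraphs <- c(
  "Innis: http://gutenberg.ca/ebooks/innis-minerva/innis-minerva-00-h.html.",
  "Turner is the next reading."
)

# Hypothetical helper: match runs of non-whitespace starting with http(s)://
extract_urls <- function(text) {
  matches <- regmatches(text, gregexpr("https?://[^[:space:]]+", text))[[1]]
  # Drop sentence punctuation that trails the url.
  sub("[.,;:)]+$", "", matches)
}

# Joining with nothing fuses the url with the next paragraph's first word.
fused <- extract_urls(paste(paragraphs, collapse = ""))

# Joining with a newline keeps the paragraph boundary visible to the regex.
separated <- extract_urls(paste(paragraphs, collapse = "\n"))
```

If this is what is happening, joining paragraphs with a newline before matching would be a smaller change than tightening the regex further.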
adam3smith
reviewed
Jan 17, 2019
@@ -220,7 +308,7 @@ set_api_key <- function (key) {
 #'
 #' @param url The url to extract urls.
 #' @return a vector of urls.
-get_urls_from_webpage <- function (url) {
+extract_urls_from_webpage <- function (url) {
adam3smith
Jan 17, 2019
Contributor
This function doesn't extract internal/relative links (because of the startsWith command). I think that's generally the right choice. We should either
a) document it (I think this is probably the better option) or
b) add an option for it that the function would inherit from archiv.fromURL.
What do you think?
greebie
Jan 17, 2019
Collaborator
I think documenting would be better for now. If all things work properly, I think fixing the README will be part of the PR that turns this into a package. (Packaging will require some serious refactoring of the code).
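To make the behaviour to be documented concrete, here is a small sketch of how a `startsWith` filter drops internal and relative links. The `hrefs` vector and the `keep_absolute_links` helper are hypothetical; the actual filtering inside `extract_urls_from_webpage` may differ in detail.

```r
# Hypothetical href values, as they might be scraped from a page.
hrefs <- c(
  "https://example.org/page",
  "/about",
  "contact.html",
  "http://example.com/",
  "#top"
)

# Keep only links that begin with "http"; relative paths and anchors fall out.
keep_absolute_links <- function(hrefs) {
  hrefs[startsWith(hrefs, "http")]
}

absolute <- keep_absolute_links(hrefs)
```

Only the two links that start with `http` survive, which is the behaviour the documentation would need to spell out.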
adam3smith
reviewed
Jan 17, 2019
archivr.R Outdated
adam3smith
reviewed
Jan 17, 2019
archivr.R Outdated
greebie
added some commits
Jan 17, 2019
@greebie, I think we're good to merge this PR and turn this into a package. I'll let you handle the merge. Thanks!
greebie commented Jan 11, 2019
This PR adds save to wayback functionality.
It can do a single url
or it can save a list of urls: