Develop Archivr into a package and add auto documentation. #13

Merged
merged 31 commits into master on Feb 3, 2019

Collaborator

greebie commented Jan 28, 2019

This should work.

What you need to do:

  1. Clone the repo.
  2. DO NOT enter the folder.
  3. On a Mac, build the package (this creates a tarball):

R CMD build archivr

  4. Then install the tarball:

R CMD INSTALL archivr_0.0.1.tar.gz

This should install the package into your R library.

From there you should be able to use library(archivr) in RStudio or whatnot and away you go!

Once this is clear, I am finished. I am going to try to submit this to CRAN for my vanity. :)

Collaborator

greebie commented Jan 28, 2019

Oh, forgot Windows instructions:

Rcmd build archivr
Rcmd INSTALL archivr_0.0.1.tar.gz

Contributor

adam3smith commented Jan 28, 2019

Tested install with devtools

library(devtools)
install_github("QualitativeDataRepository/archivr", ref="issue-5")

Install works great.

Some issues:

  1. extract_urls_from_text and extract_urls_from_webpage do not appear to be supported anymore (Error: 'extract_urls_from_text' is not an exported object from 'namespace:archivr'). Given the .Rd file for both, I think that is by mistake? In any case, it'd be great to get them back.
  2. The documentation text for archiv.fromText and archiv.fromUrl incorrectly reads "Collect information on whether links in a url are archived." (and similarly for .fromText).
  3. Most importantly, the main functions all appear to fail:

archiv.fromText("C:/Users/Sebastian/Desktop/ati-access workshop.md")
Error in matrix(unlist(newlst), nrow = length(newlst), byrow = T) :
'data' must be of a vector type, was 'NULL'

and

view_archiv.fromUrl("http://kbroman.org/pkg_primer/pages/github.html")
Error in read_html(url) : could not find function "read_html"

Some or all of this might be related to the installation method, but devtools should be the standard way to install before the package is on CRAN, so it needs to work.

Collaborator

greebie commented Jan 29, 2019

The latest commit just exports all the functions (and also fixes some documentation details). Let's see if this works instead!

Collaborator

greebie commented Jan 29, 2019

Latest commit appears to work for me. The issue was the export - I was treating the @export similar to private and public functions in a class. Turns out you need to export everything, unless it does not work (archiv_batch currently does not work, but may be useful in future).
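
For readers new to roxygen2, here is a minimal sketch of what the export tag does. The function names and bodies are hypothetical stand-ins, not archivr's actual code, and the sketch assumes the package's NAMESPACE is generated by roxygen2, where only functions tagged `@export` are visible after `library(archivr)`.

```r
#' Extract URLs from a character string (hypothetical example)
#'
#' @param text A character vector of length one.
#' @return A character vector of the URLs found in `text`.
#' @export
extract_urls_demo <- function(text) {
  # Simple illustrative pattern; a real package would likely be stricter.
  matches <- regmatches(text, gregexpr("https?://[^[:space:]]+", text))
  matches[[1]]
}

# No @export tag: roxygen2 leaves this helper out of the NAMESPACE exports,
# so users could only reach it with the ::: operator.
count_urls_demo <- function(text) {
  length(extract_urls_demo(text))
}
```

Unexported helpers remain callable from other functions inside the package, so leaving a tag off hides a function from users without breaking internal calls.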

Collaborator

greebie commented Jan 29, 2019

I double-checked and all seems to work fine now. Sorry for the inconvenience. Cool that we can use devtools for this anytime we want. I'm going to add this to CRAN now - let's see what they say. The only potential issue I see is whether the internet calls fail gracefully. I'll check that in a new issue.
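
One common way to make network calls fail gracefully in R is to wrap them in `tryCatch()`. The sketch below is only an illustration of that idea: `fetch_or_null()` and `always_fails()` are hypothetical names, and a stand-in function replaces the real request so the example runs offline.

```r
# Hypothetical wrapper: run a network call and return NULL with a readable
# message instead of stopping when the call raises an error.
fetch_or_null <- function(fetch_fun, url) {
  tryCatch(
    fetch_fun(url),
    error = function(e) {
      message("Request to ", url, " failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a real request function so the sketch runs offline.
always_fails <- function(url) stop("HTTP error 401.")

result <- fetch_or_null(always_fails, "https://example.org")
```

Callers can then test `is.null(result)` and decide whether to skip the URL or report it, rather than having the whole batch abort.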

Collaborator

greebie commented Jan 29, 2019

Found a new error - the package would not accept the api errors as globals, so I need to create an environment. Will have this in a few minutes.

Contributor

adam3smith commented Jan 29, 2019

Currently getting this output on installing (though note that I'm on an airport connection, so heaven knows what it does to https connections from non-browser apps; I'll try this again from home). Not sure if the Rtools error is relevant, but I don't think it should be (I got the same error with the previous almost successful install).

WARNING: Rtools is required to build R packages, but is not currently installed.

Please download and install Rtools 3.5 from http://cran.r-project.org/bin/windows/Rtools/.
√  checking for file 'C:\Users\Sebastian\AppData\Local\Temp\RtmpIhKx6N\remotes442c6a566f7d\QualitativeDataRepository-archivr-a6fd0d3/DESCRIPTION' (1.1s)
-  preparing 'archivr':
√  checking DESCRIPTION meta-information ... 
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'archivr_0.0.1.tar.gz'
   
Installing package into ‘C:/Users/Sebastian/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
* installing *source* package 'archivr' ...
** R
** preparing package for lazy loading
Warning: package 'readtext' was built under R version 3.4.4
Warning: package 'rvest' was built under R version 3.4.4
Warning: package 'stringr' was built under R version 3.4.4
Error in open.connection(con, "rb") : HTTP error 401.
Error : unable to load R code in package 'archivr'
ERROR: lazy loading failed for package 'archivr'
* removing 'C:/Users/Sebastian/Documents/R/win-library/3.4/archivr'
In R CMD INSTALL
Error: (converted from warning) running command '"C:/Users/SEBAST~1/DOCUME~1/R/R-34~1.3/bin/x64/R" CMD INSTALL -l "C:\Users\Sebastian\Documents\R\win-library\3.4" "C:/Users/SEBAST~1/AppData/Local/Temp/RtmpIhKx6N/file442c50177500/archivr_0.0.1.tar.gz"' had status 1

Collaborator

greebie commented Jan 29, 2019

Thanks Sebastian. I encountered a problem with global variables (i.e., I could not use perma.cc) and went down a rabbit hole instead of doing the proper research. I will have this fixed by tonight or tomorrow morning.

Contributor

adam3smith commented Jan 30, 2019

thanks! Very excited to soon have this in a state where we can start to promote it. Let me know once I can go ahead & test

Collaborator

greebie commented Jan 30, 2019

Hi Sebastian,

I haven't tested everything, but it seems to be working as expected now. Sorry for the delay, R packaging is a bit weird.

I also added two new functions: get_api_key() (to see the current API key) and get_folder_id() (to see the current folder).

R packages do not like global vars (that's a good thing, really), so I had to create an environment. You can access the api key also by typing archiv_env$perma_cc_key.
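
For anyone following along, a rough sketch of the environment pattern described here. The names `archiv_env`, `perma_cc_key`, `get_api_key()` and `set_api_key()` appear in this thread, but the bodies below are assumptions for illustration and not the package's actual implementation.

```r
# Package-level environment holding mutable settings (sketch). Top-level
# bindings in a package namespace are locked after loading, but the
# contents of an environment object can still be changed.
archiv_env <- new.env(parent = emptyenv())
archiv_env$perma_cc_key <- ""

# Assumed setter: store the key inside the environment.
set_api_key <- function(key) {
  assign("perma_cc_key", key, envir = archiv_env)
  invisible(key)
}

# Assumed getter: read the key back from the environment.
get_api_key <- function() {
  get("perma_cc_key", envir = archiv_env)
}

set_api_key("my-secret-key")
get_api_key()
```

Keeping the state in one environment also means the key can be inspected directly with `archiv_env$perma_cc_key`, as mentioned above.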

Contributor

adam3smith commented Jan 31, 2019

Not quite ready yet:

install_github("QualitativeDataRepository/archivr", ref="issue-5")
library(archivr)

works smoothly. Documentation via ?archiv.fromText looks fine, too.

Alas, main functionality does not.

> webURLs <- extract_urls_from_webpage("https://www-cs-faculty.stanford.edu/~knuth/retd.html")
Error in read_html(url) : could not find function "read_html"

and

> testDocX <- view_archiv.fromText("C:/Users/Sebastian/Desktop/urltest.docx")
Error in matrix(unlist(newlst), nrow = length(newlst), byrow = T) : 
  'data' must be of a vector type, was 'NULL'

(The file does exist and has URLs that were previously extracted.)
Finally,

urlList <- extract_urls_from_text("C:/Users/Sebastian/Desktop/urltest.docx")

does produce a variable urlList, but with length 0.
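
The two symptoms may be related: if no URLs are extracted, `unlist()` on an empty list returns NULL, and `matrix()` rejects NULL with exactly the "'data' must be of a vector type" message. The sketch below shows one possible guard. `records_to_matrix()` and its `n_fields` argument are hypothetical, and whether this matches the cause inside archivr is an assumption.

```r
# Hypothetical helper: turn a list of equal-length records into a matrix,
# returning an empty matrix when the list has no entries instead of
# passing NULL to matrix().
records_to_matrix <- function(newlst, n_fields = 2) {
  if (length(newlst) == 0) {
    return(matrix(character(0), nrow = 0, ncol = n_fields))
  }
  matrix(unlist(newlst), nrow = length(newlst), byrow = TRUE)
}

empty <- records_to_matrix(list())
filled <- records_to_matrix(list(c("http://a.org", "yes"),
                                 c("http://b.org", "no")))
```

With a guard like this, a document that yields no URLs produces a zero-row result that downstream code can check, rather than an error.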

Collaborator

greebie commented Jan 31, 2019

It looks like in my enthusiasm for complying with CRAN, I broke things. Took a while to figure that out. Apologies! Your things should work correctly now.

Collaborator

greebie commented Jan 31, 2019

(hoping I did not mess things up on rebasing).

Contributor

adam3smith commented Jan 31, 2019

sorry...

Installing package into ‘C:/Users/Sebastian/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)
* installing *source* package 'archivr' ...
** R
** preparing package for lazy loading
Error in parse(con, keep.source = FALSE, srcfile = NULL) : 
  30:1: unexpected input
29: import(tools)
30: <<
    ^

Collaborator

greebie commented Jan 31, 2019

Sigh. Git can be annoying sometimes. Should be fixed now.

Contributor

adam3smith commented Jan 31, 2019

you did say rebase -- of course that was going to jinx git ;)

Almost there. Most functions seem to be working, but saving to perma_cc doesn't:

> folders <- get_folder_ids()
> View(folders)
> set_folder_id("53531")
> get_api_key()
[1] "<correct API key>"
> get_folder_id()
[1] "53531"
> permac <- archiv.fromUrl("https://www-cs-faculty.stanford.edu/~knuth/retd.html", method="perma_cc")
Error in names(x) <- value : 
  'names' attribute [6] must be the same length as the vector [1]
> testDocX <- archiv.fromText("C:/Users/Sebastian/Desktop/urltest.docx", method = "perma_cc")
Error in names(x) <- value : 
  'names' attribute [6] must be the same length as the vector [1]

(So those last two calls are broken; saving to Wayback works, as do other perma_cc functions like get_folder_id() and get_folder_ids().)

Fix issues with losing api key.
Accept numeric or string values for folder.

Collaborator

greebie commented Jan 31, 2019

Sorry for all the review work here, Sebastian. I've fixed the above-mentioned problem (and fixed a potential bug when people use an integer instead of a string for set_folder_id()).
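
A sketch of how a setter might accept either form. `set_folder_id_demo()` and `folder_env` are hypothetical names, and the actual fix in the commit may differ.

```r
# Hypothetical storage for the sketch.
folder_env <- new.env(parent = emptyenv())

# Hypothetical setter: accept 53531 or "53531" and store it as a string.
set_folder_id_demo <- function(id) {
  if (length(id) != 1 || is.na(id)) {
    stop("Folder id must be a single number or string.")
  }
  # format() with scientific = FALSE avoids results such as "1e+05",
  # which as.character() can produce for some numeric ids.
  id_chr <- if (is.numeric(id)) {
    format(id, scientific = FALSE, trim = TRUE)
  } else {
    as.character(id)
  }
  assign("folder_id", id_chr, envir = folder_env)
  invisible(id_chr)
}

set_folder_id_demo(53531)
folder_env$folder_id
```

Normalising to a string at the point of entry means every later comparison or URL construction can assume one type.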

Contributor

adam3smith commented Jan 31, 2019

Great. We're essentially there. If I try saving to perma.cc without setting a folder I get this

> permac <- archiv.fromUrl("https://www-cs-faculty.stanford.edu/~knuth/retd.html", method="perma_cc")
Error in if (!(reply$url == "Not a valid URL.")) { : 
  argument is of length zero

I think I'd actually prefer failing gracefully over picking a default folder, but either would be OK. The error message here, while nice classic R, isn't very helpful.
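
The "argument is of length zero" message is what `if()` reports when its condition is empty, which happens here if the reply has no `url` field at all (`NULL == "..."` has length zero). A minimal sketch of a safer check follows. `reply_has_valid_url()` is a hypothetical helper, and treating the reply as a plain list is an assumption.

```r
# Hypothetical check on a parsed API reply (a list). It tests that the
# `url` field exists and is a single string before comparing it, so a
# missing field yields FALSE instead of an error inside if().
reply_has_valid_url <- function(reply) {
  url <- reply$url
  is.character(url) && length(url) == 1 && url != "Not a valid URL."
}

reply_has_valid_url(list())                          # field missing
reply_has_valid_url(list(url = "Not a valid URL."))  # flagged as invalid
reply_has_valid_url(list(url = "https://example.org"))
```

A caller could then print a specific message (for example, that no folder has been set) when the check fails.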

Contributor

adam3smith commented Feb 1, 2019

Thanks, not quite there yet, though: before setting a folder id:

> permac <- archiv.fromUrl("https://www-cs-faculty.stanford.edu/~knuth/retd.html", method="perma_cc")
[1] "Received an error reply, likely because your limit has been exceeded."
[1] "Received an error reply, likely because your limit has been exceeded."
[1] "Received an error reply, likely because your limit has been exceeded."
[1] "Received an error reply, likely because your limit has been exceeded."
Error in names(x) <- value : 
  'names' attribute [6] must be the same length as the vector [1]

Similar issue on getting folder ids before setting an API key:

> folders <- get_folder_ids()
Error in open.connection(con, "rb") : HTTP error 401.

Should instead throw the same error as other functions, i.e. print("Please input your api key:\nUse 'set_api_key(API_KEY)'")

edit: listing these in reverse order. The 2nd one does work once an API key is set. The first one does work once a folder ID is also set.

Collaborator

greebie commented Feb 1, 2019

I printed out the API response for that API key, and it returns "Perma.cc cannot presently make additional Perma Links on your behalf. Visit your subscription settings page for more information." when I run it. I'm guessing that there's an issue with sending links to (Personal Links) on the QDR account, which is what gets selected when I set the default (it works fine on my own account).

Contributor

adam3smith commented Feb 1, 2019

OK, we can leave that alone then. Not sure why that throws an error for Personal Links. Fixing the get_folder_ids() error message would be good, though.

Collaborator

greebie commented Feb 1, 2019

Yeah - I'm willing to explore further down the road if your clients are being driven crazy by it. Messaging on get_folder_ids() is now fixed.

Contributor

adam3smith commented Feb 2, 2019

I'll merge this though I'm still not seeing the error messages for missing API keys, I think because

> key <- ""
> a <- is.null(key)
> a
[1] FALSE

since an empty string isn't null because... R, but I think I can take it from here. If you could give me a couple of pointers, though, so I can at least help maintain:

  1. What's a good way to test/develop the code? I've never done package development in R. Just run archivr.R in its entirety and then run in chunks as I try to fix things?

  2. Where are the function descriptions for the automated documentation stored?

  3. Where's the main package documentation stored? (Currently it starts, somewhat clumsily, with the license.)

  4. I can just update the Readme as I please, right?
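
Returning to the missing-key check: since `is.null("")` is FALSE, a guard that relies on `is.null()` alone never fires for an empty key. One possible approach is to test with `nzchar()` as well. The helper names below are hypothetical, and the message text is the one quoted earlier in this thread.

```r
# Treat NULL, NA, and "" all as "no key set" (hypothetical helper).
key_is_missing <- function(key) {
  is.null(key) || length(key) != 1 || is.na(key) || !nzchar(key)
}

# Hypothetical guard: print the hint and return FALSE when no key is set.
check_api_key <- function(key) {
  if (key_is_missing(key)) {
    message("Please input your api key:\nUse 'set_api_key(API_KEY)'")
    return(FALSE)
  }
  TRUE
}

check_api_key("")
```

Functions that need the key could call such a guard first and return early when it reports FALSE, before any request is sent.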

adam3smith merged commit fd5c2d0 into master on Feb 3, 2019
