Refine the organization of `get_file()`? #48
Comments
To be honest, I don't have a strong opinion. I'd suggest maintaining backward compatibility if you can.
Yes, keeping get_file() for backward compatibility makes sense. On the other hand, I think keeping get_file* and get_files* together keeps things simple, especially if you agree with my re-write of the multi-file download to use sequential downloads of individual files.
I don't have a view about the tibble/data frame So my suggestion then would be to reduce your list to the existing Given my re-write o |
More thoughts after sleeping on it:
wibeasley commented Jan 11, 2020
@adam3smith, @kuriwaki, @pdurbin, and anyone else,
Should `get_file()` be refactored into multiple child functions? It seems like we're asking it to do a lot of things, including `data.frame` or `tibble`.
I like all these capabilities, and want to discuss organizational ideas with people so the package structure is (a) easy for us to develop, test, & maintain, and (b) useful and natural for users to learn and incorporate.
One possible approach:
A foundational function retrieves the file(s) by ID; it is the workhorse that actually retrieves the file. A second function accepts the file name (not ID); it essentially wraps the first function after calling `get_fileid()`. Both of these functions deal with a single file at a time.
Another pair of functions deals with multiple files (one by name, one by id). But these return lists, not a single object. They're essentially `lapply`s/loops around their respective siblings described above.
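A minimal sketch of that four-function layout, to make the relationships concrete. Everything here is an assumption for illustration: `fake_store` and `fake_index` are hypothetical in-memory stand-ins for the Dataverse server, and `lookup_file_id()` is a hypothetical placeholder for the step that `get_fileid()` would perform. It is not the package's actual implementation.

```r
# Hypothetical in-memory stand-in for the server: file id -> raw contents.
fake_store <- list(
  "101" = charToRaw("a,b\n1,2\n"),
  "102" = charToRaw("hello")
)

# Hypothetical name -> id index; stands in for what get_fileid() would resolve.
fake_index <- c("data.csv" = 101L, "notes.txt" = 102L)

lookup_file_id <- function(name) {
  if (!name %in% names(fake_index)) {
    stop("No file named ", name, call. = FALSE)
  }
  fake_index[[name]]
}

# The workhorse: retrieves exactly one file by id.
get_file_by_id <- function(id) {
  key <- as.character(id)
  if (!key %in% names(fake_store)) {
    stop("No file with id ", key, call. = FALSE)
  }
  fake_store[[key]]
}

# Thin wrapper: resolve the name to an id, then delegate to the workhorse.
get_file_by_name <- function(name) {
  get_file_by_id(lookup_file_id(name))
}

# Multi-file versions: loops around their single-file siblings; always return lists.
get_files_by_id <- function(ids) {
  lapply(ids, get_file_by_id)
}

get_files_by_name <- function(file_names) {
  lapply(file_names, get_file_by_name)
}
```

With this split, only `get_file_by_id()` would ever talk to the server, so the other three can be reasoned about (and tested) without any network access.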
To avoid breaking the package interface, maybe the existing `get_file()` keeps its same interface (which ambiguously accepts file names or ids and returns either a single file or a list of files), but we soft-deprecate it and encourage new code to use these more explicit functions? The guts of the function are moved out into the four new functions.
Maybe the function names are
1. `get_file()` with an unchanged interface
2. `get_file_by_id()` (the workhorse)
3. `get_file_by_name()`
4. `get_files_by_id()`
5. `get_files_by_name()`
6. `get_tibble_by_id()`
7. `get_tibble_by_name()`
8. `get_zip_by_id()`
9. `get_zip_by_name()`
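One way the backward-compatible `get_file()` could delegate to the explicit functions is sketched below. The assumptions are mine: the two single-file functions are replaced by trivial stubs so the block runs on its own, numeric input is treated as ids and character input as names, and a plain `message()` stands in for whatever soft-deprecation mechanism the package would actually choose.

```r
# Stubs so the sketch is self-contained; the real functions would download files.
get_file_by_id <- function(id) {
  paste0("contents of id ", id)
}

get_file_by_name <- function(name) {
  # Hypothetical name -> id table, standing in for get_fileid().
  ids <- c("data.csv" = 101L, "notes.txt" = 102L)
  if (!name %in% names(ids)) stop("unknown file: ", name, call. = FALSE)
  get_file_by_id(ids[[name]])
}

# Legacy entry point: same ambiguous interface, but all work is delegated.
get_file <- function(file) {
  message("get_file() is soft-deprecated; consider the explicit get_file_by_*() functions.")

  # Numeric input is treated as ids, anything else as file names (an assumption).
  fetch_one <- if (is.numeric(file)) get_file_by_id else get_file_by_name

  # One input returns a single object; several inputs return a list.
  if (length(file) == 1L) {
    fetch_one(file)
  } else {
    lapply(file, fetch_one)
  }
}
```

Existing calls such as `get_file(101)` or `get_file(c("data.csv", "notes.txt"))` would keep working, while the message nudges new code toward the explicit functions.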
I'm pretty sure it would be easier to write tests that isolate problems. The documentation becomes more verbose, but probably more straightforward.
You guys have more experience with Dataverse than I do, and a better sense of the use cases. Would this reorganization help users? If not, maybe we still split it into multiple functions, but just keep functions 2-5 private.
Maybe I'm making this unnecessarily tedious, but I'm thinking that these download functions are the most called by R users, and they're certainly the ones that are called by new users. So if they leave a bad impression, the package is less likely to be used.