Downloading multiple files #46

adam3smith opened this issue Jan 10, 2020 · 4 comments

@adam3smith (Contributor) commented Jan 10, 2020

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

I think this is just a question, but it might also be an enhancement request or bug report. The Dataverse API allows downloading multiple files as a .zip, which is particularly relevant now because it preserves the folder structure where available.
There is code in the get_file() function that accesses this functionality, but I don't think that code path can ever be reached: I can find no way of specifying multiple file IDs.

So, two questions:

  1. Am I right about this? Or could someone give me the syntax to do this with get_file()?
  2. If I'm right that this isn't possible, what would be a good way to do it? Allow a vector of IDs as input for the file parameter?

@pdurbin (Member) commented Jan 10, 2020

I don't mean to muddy the waters, but there is an ongoing conversation about the :ZipDownloadLimit setting, which comes into play here. It's just something to be conscious of: https://groups.google.com/d/msg/dataverse-community/V1gExuDnm0A/nR4FIU1QBgAJ

The Dataverse API absolutely does allow you to ask the Dataverse server to zip up a bunch of files by passing a comma-separated list of database IDs for files: http://guides.dataverse.org/en/4.18.1/api/dataaccess.html#multiple-file-bundle-download
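For reference, a minimal base-R sketch of the URL that endpoint expects: /api/access/datafiles/ followed by the comma-separated file IDs. The helper name datafiles_bundle_url and the https:// scheme are assumptions for illustration, not part of the dataverse package, and the sketch only builds the URL; it makes no request.

```r
# Hypothetical helper (not part of the dataverse package): build the URL for
# the multiple-file bundle download, i.e.
#   https://<server>/api/access/datafiles/<id1>,<id2>,...
datafiles_bundle_url <- function(server, ids) {
  # File IDs are whole numbers; reject anything else early.
  stopifnot(is.numeric(ids), length(ids) >= 1, all(ids == round(ids)))
  # Convert to integer before pasting: paste(100000) yields "1e+05",
  # which is not a plain database ID.
  id_list <- paste(as.integer(ids), collapse = ",")
  paste0("https://", server, "/api/access/datafiles/", id_list)
}

datafiles_bundle_url("demo.dataverse.org", c(42, 100000))
```

The integer conversion matters because R switches to scientific notation for some round numbers when converting doubles to strings.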

You could also create the zip file client side, but this is more work (though easier on the server). You'd need to get the file hierarchy from the directoryLabel field in the metadata: https://dev2.dataverse.org/api/datasets/export?exporter=dataverse_json&persistentId=doi%3A10.5072/FK2/V8C0XO
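If you go the client-side route, the folder layout can be rebuilt from directoryLabel. The sketch below assumes the file list has already been pulled out of the dataset metadata into a data frame with label and directoryLabel columns. The data frame, its sample values, and the helper names are hypothetical. It only computes each file's relative path and creates the matching local directories; downloading the individual files and the final zip step are left out.

```r
# Hypothetical file listing, one row per file, as it might be extracted from
# the dataset's JSON metadata. Files at the dataset root have no
# directoryLabel, represented here as NA.
files <- data.frame(
  label = c("README.txt", "raw.csv", "clean.csv"),
  directoryLabel = c(NA, "data/raw", "data/clean"),
  stringsAsFactors = FALSE
)

# Relative path of each file: "<directoryLabel>/<label>", or just "<label>"
# when the file has no directoryLabel.
relative_paths <- function(label, directoryLabel) {
  has_dir <- !is.na(directoryLabel) & nzchar(directoryLabel)
  ifelse(has_dir, file.path(directoryLabel, label), label)
}

# Recreate the folder hierarchy under `dest` so each downloaded file can be
# written to its relative path before zipping locally. Returns the full
# destination paths invisibly.
make_dirs <- function(dest, paths) {
  dirs <- unique(dirname(file.path(dest, paths)))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(file.path(dest, paths))
}

paths <- relative_paths(files$label, files$directoryLabel)
```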

@adam3smith (Contributor, Author) commented Jan 10, 2020

Thanks @pdurbin. Yes, I'm aware of the zip file limit discussion, but at least I'm using this with QDR, where we have a more generous limit.

> The Dataverse API absolutely does allow you to ask the Dataverse server to zip up a bunch of files by passing a comma-separated list of database IDs for files: http://guides.dataverse.org/en/4.18.1/api/dataaccess.html#multiple-file-bundle-download

Yes, that's what I was referring to. The linked code in get_file() actually implements it; it just never gets called (I think).

@pdurbin (Member) commented Jan 10, 2020

@adam3smith Ah, I just clicked through and I see what you mean:

    fileid <- paste0(fileid, collapse = ",")
    u <- paste0(api_url(server), "access/datafiles/", file)

Yes, that should do the trick, if it gets called. 😄

@adam3smith (Contributor, Author) commented Jan 10, 2020

Ah, got it: this is possible in principle using a numeric vector (as one would expect), but there's a regression from 5ec375b, which missed one of the file -> fileid renames: the URL above is still built from file instead of the collapsed fileid.

I'll submit a PR with added documentation, a test, and the fix.
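A minimal sketch of what the corrected branch could look like, with the collapsed fileid (rather than file) used to build the URL. api_base() is a hypothetical stand-in for the package's internal api_url(), and access_url() is a hypothetical name; this is not the code from the eventual PR. The single-ID case uses the basic /api/access/datafile/ endpoint from the same Data Access API guide.

```r
# Hypothetical stand-in for the package's api_url(server); assumes it returns
# a base URL ending in "/api/".
api_base <- function(server) paste0("https://", server, "/api/")

# Hypothetical sketch of the fixed logic: after collapsing the ids into
# `fileid`, build the URL from `fileid` (the regression used `file` here).
access_url <- function(fileid, server) {
  fileid <- as.integer(fileid)  # coerce to whole-number database IDs
  if (length(fileid) > 1) {
    # Several IDs: multiple-file bundle download, returned as a zip.
    fileid <- paste0(fileid, collapse = ",")
    paste0(api_base(server), "access/datafiles/", fileid)
  } else {
    # One ID: basic single-file access.
    paste0(api_base(server), "access/datafile/", fileid)
  }
}
```

For example, access_url(c(101, 102), "demo.dataverse.org") points at https://demo.dataverse.org/api/access/datafiles/101,102, while a single ID falls back to the datafile/<id> endpoint.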
