Multipart Uploads #80

Closed
russellpierce opened this Issue Nov 19, 2016 · 9 comments

russellpierce commented Nov 19, 2016

It wasn't immediately obvious to me how you'd perform multipart uploads using aws.s3. The Java SDK has a high-level API for this sort of thing; perhaps aws.s3 should as well. Ideally, one would simply call put_object() and the package would decide for itself whether a multipart upload was called for.

I've written methods for doing this in Python with boto. Is the aim for aws.s3 to use only the web API, or would a PR for this feature that used rPython or rJava be considered?

leeper added the enhancement label Nov 20, 2016

leeper commented Nov 20, 2016

You can currently rig this together using put_object() calls by passing the relevant query args. I'd gladly take a PR for an R-only implementation that simplifies this; I don't want to add any system requirements, though.
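
For example, a guess at what that rigging could look like (a sketch only, assuming put_object() forwards extra arguments such as query through to the underlying s3HTTP() call; part_file and upload_id are placeholders for values you'd have from an already-initiated multipart upload):

    # Hypothetical: upload one part of an in-progress multipart upload by
    # forwarding the multipart query parameters through put_object().
    library(aws.s3)
    put_object(file = part_file, object = "big-file.bin", bucket = "my-bucket",
               query = list(partNumber = 1, uploadId = upload_id))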

mbjoseph commented Apr 7, 2017

I just ran across the same problem. I've taken a look at the API docs for multipart uploads and may be able to hack something together, but it would be awesome if @leeper could share a quick example here (time permitting); an example in the help page for put_object() would also be nice.

leeper commented Apr 22, 2017

This should be straightforward. Amazon recommends multipart uploads for files larger than 100 MB. So, conditional on that, put_object() will either upload an object directly or, alternatively, initiate and complete a multipart upload, which means:

  1. split the object into chunks
  2. initiate the upload: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadInitiate.html
  3. loop over put_object() to upload the parts, adding the relevant headers: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html
  4. complete the multipart upload: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html (requires a new post_object() function)

Separately, this requires an abort-upload procedure, which will need to be a separate function: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadAbort.html. A rough sketch of the whole flow follows.
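
The sketch below is untested and only illustrative: the part size, the location of UploadId in the parsed response, the etag attribute on part responses, and the completion XML are all assumptions that a real implementation would need to verify against the API docs above.

    library(aws.s3)

    # Hypothetical helper covering steps 1-4; not the package's actual code.
    multipart_put <- function(file, object, bucket, part_size = 100 * 2^20) {
      con <- file(file, open = "rb")
      on.exit(close(con))

      # Step 2: initiate the upload; the parsed response should carry an UploadId.
      init <- s3HTTP(verb = "POST", bucket = bucket, path = paste0("/", object),
                     query = list(uploads = ""))
      upload_id <- init[["UploadId"]]  # assumed shape of the parsed XML

      # Steps 1 and 3: read the file in chunks and PUT each one as a numbered
      # part, collecting the ETag that S3 returns for every part.
      etags <- character()
      part <- 1L
      while (length(chunk <- readBin(con, what = "raw", n = part_size))) {
        resp <- s3HTTP(verb = "PUT", bucket = bucket, path = paste0("/", object),
                       query = list(partNumber = part, uploadId = upload_id),
                       request_body = chunk)
        etags[part] <- attr(resp, "etag")  # assumed: ETag exposed as an attribute
        part <- part + 1L
      }

      # Step 4: complete the upload by POSTing an XML manifest of the parts.
      manifest <- paste0(
        "<CompleteMultipartUpload>",
        paste0(sprintf("<Part><PartNumber>%d</PartNumber><ETag>%s</ETag></Part>",
                       seq_along(etags), etags), collapse = ""),
        "</CompleteMultipartUpload>")
      s3HTTP(verb = "POST", bucket = bucket, path = paste0("/", object),
             query = list(uploadId = upload_id), request_body = manifest)
    }

The abort procedure would be a DELETE against the same uploadId query parameter.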

leeper added a commit that referenced this issue Apr 23, 2017

martenlindblad commented Aug 23, 2017

When I try multipart = TRUE I get an HTTP 403 error:

  $ Code    : chr "SignatureDoesNotMatch"
  $ Message : chr "The request signature we calculated does not match the signature you provided. Check your key and signing method."

With multipart = FALSE it works as usual. I use the same environment variables and the same R session.

amoeba added a commit to amoeba/aws.s3 that referenced this issue Feb 8, 2018

Fix bug in readBin call w/in put_object
readBin only reads one byte if you don't set the n argument. It reads the whole file if you set n to the file size or higher.

Related to cloudyr#80

amoeba commented Feb 8, 2018

I'm also getting a signature mismatch error. While looking at the source, I saw this call to readBin:

file <- readBin(file, what = "raw")

My understanding of readBin is that it reads only the first byte when called without an n argument (n defaults to 1):

> readLines("message.txt")
[1] "abcd"
> rawToChar(readBin("message.txt", raw()))
[1] "a"
> rawToChar(readBin("message.txt", raw(), file.size("message.txt")))
[1] "abcd\n"

My PR attempts to fix this. Thoughts, @leeper? After that change, I'm now getting another error, which I haven't debugged yet, although it makes sense given the traceback:

10: stop(txt, obj, call. = FALSE)
9: .errorhandler(paste("Argument object must be of type character", 
       "or raw vector if serialize is FALSE"), mode = errormode)
8: digest::digest(request_body, algo = "sha256", serialize = FALSE)
7: tolower(digest::digest(request_body, algo = "sha256", serialize = FALSE))
6: canonical_request(verb = verb, canonical_uri = action, query_args = query_args, 
       canonical_headers = canonical_headers, request_body = request_body)
5: aws.signature::signature_v4_auth(datetime = d_timestamp, region = region, 
       service = "s3", verb = verb, action = action, query_args = query, 
       canonical_headers = canonical_headers, request_body = request_body, 
       key = key, secret = secret, session_token = session_token) at s3HTTP.R#101
4: s3HTTP(verb = "POST", bucket = bucket, path = paste0("/", object), 
       headers = c(headers, list(`Content-Length` = ifelse(is.character(file) && 
           file.exists(file), file.size(file), length(file)))), 
       request_body = file, ...) at put_object.R#156
3: post_object(file = NULL, object = object, bucket = bucket, query = list(uploads = ""), 
       headers = headers, ...) at put_object.R#101
2: aws.s3::put_object(file, object, space, check_region = FALSE, 
       key = spaces_key, secret = spaces_secret, base_url = spaces_base, 
       ...) at spaces_object_put.R#44
1: spaces_object_put("~/large-file.dmg", space = "test-analogsea", 
       multipart = TRUE)

I can look into that some other time.
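
A plausible reading of that traceback: post_object() is called with file = NULL, so request_body ends up NULL, and digest::digest() rejects anything that is neither character nor raw when serialize = FALSE. That specific error is easy to reproduce on its own:

    # Reproduces the error seen in the traceback, independent of aws.s3.
    digest::digest(NULL, algo = "sha256", serialize = FALSE)
    #> Error: Argument object must be of type character or raw vector if serialize is FALSE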

leeper commented Jul 26, 2018

This should now be fixed in the version on GitHub.
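
For anyone trying it out, a minimal usage sketch (the bucket name and file are placeholders, and this assumes the remotes package is available for installation):

    # Install the development version from GitHub, then upload with the
    # multipart flag enabled.
    remotes::install_github("cloudyr/aws.s3")
    library(aws.s3)
    put_object(file = "~/large-file.dmg", object = "large-file.dmg",
               bucket = "my-bucket", multipart = TRUE)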

leeper closed this in d16d413 Jul 26, 2018

amoeba commented Jul 26, 2018

Thanks @leeper!

himanshu-sikaria commented Dec 8, 2018

Can you add this to the CRAN version as well?

leeper commented Dec 8, 2018

The new maintainer of this package is @acolum. Please check with them about the release schedule.
