Multipart Uploads #80

Closed
russellpierce opened this Issue Nov 19, 2016 · 9 comments

russellpierce commented Nov 19, 2016

It wasn't immediately obvious to me how you'd perform multipart uploads using aws.s3. The Java SDK has a high-level API for this sort of thing; perhaps aws.s3 should as well. Ideally, one would simply call put_object() and the package would decide for itself whether a multipart upload was called for.

I've written methods for doing this in Python with boto. Is the aim for aws.s3 to use only the web API, or would a PR for this feature that used rPython or rJava be considered?

leeper added the enhancement label Nov 20, 2016

leeper commented Nov 20, 2016

You can currently rig this together using put_object() calls by passing the relevant query args. I'd gladly take a PR for an R-only implementation that simplifies this; I don't want to add any system requirements, though.
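
For example, a guess at what that rigging could look like (a sketch only, assuming put_object() forwards extra arguments such as query through to the underlying s3HTTP() call; part_file and upload_id are placeholders for values you'd have from an already-initiated multipart upload):

    # Hypothetical: upload one part of an in-progress multipart upload by
    # forwarding the multipart query parameters through put_object().
    library(aws.s3)
    put_object(file = part_file, object = "big-file.bin", bucket = "my-bucket",
               query = list(partNumber = 1, uploadId = upload_id))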

mbjoseph commented Apr 7, 2017

I just ran across the same problem. I've taken a look at the API docs for multipart uploads and may be able to hack something together, but it would be awesome if @leeper could share a quick example here (time permitting); an example in the help page for put_object() would also be nice.

leeper commented Apr 22, 2017

This should be straightforward. Amazon recommends multipart uploads for files larger than 100 MB. So, conditional on that, put_object() will either upload an object directly or, alternatively, initiate and complete a multipart upload, which means:

  1. split the object into chunks
  2. initiate the upload: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadInitiate.html
  3. loop over put_object() to upload the parts, adding the relevant headers: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html
  4. complete the multipart upload: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html (requires a new post_object() function)

Separately, this requires an abort-upload procedure, which will need to be a separate function: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadAbort.html. A rough sketch of the whole flow follows.
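
The sketch below is untested and only illustrative: the part size, the location of UploadId in the parsed response, the etag attribute on part responses, and the completion XML are all assumptions that a real implementation would need to verify against the API docs above.

    library(aws.s3)

    # Hypothetical helper covering steps 1-4; not the package's actual code.
    multipart_put <- function(file, object, bucket, part_size = 100 * 2^20) {
      con <- file(file, open = "rb")
      on.exit(close(con))

      # Step 2: initiate the upload; the parsed response should carry an UploadId.
      init <- s3HTTP(verb = "POST", bucket = bucket, path = paste0("/", object),
                     query = list(uploads = ""))
      upload_id <- init[["UploadId"]]  # assumed shape of the parsed XML

      # Steps 1 and 3: read the file in chunks and PUT each one as a numbered
      # part, collecting the ETag that S3 returns for every part.
      etags <- character()
      part <- 1L
      while (length(chunk <- readBin(con, what = "raw", n = part_size))) {
        resp <- s3HTTP(verb = "PUT", bucket = bucket, path = paste0("/", object),
                       query = list(partNumber = part, uploadId = upload_id),
                       request_body = chunk)
        etags[part] <- attr(resp, "etag")  # assumed: ETag exposed as an attribute
        part <- part + 1L
      }

      # Step 4: complete the upload by POSTing an XML manifest of the parts.
      manifest <- paste0(
        "<CompleteMultipartUpload>",
        paste0(sprintf("<Part><PartNumber>%d</PartNumber><ETag>%s</ETag></Part>",
                       seq_along(etags), etags), collapse = ""),
        "</CompleteMultipartUpload>")
      s3HTTP(verb = "POST", bucket = bucket, path = paste0("/", object),
             query = list(uploadId = upload_id), request_body = manifest)
    }

The abort procedure would be a DELETE against the same uploadId query parameter.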

leeper added a commit that referenced this issue Apr 23, 2017

martenlindblad commented Aug 23, 2017

When I try multipart = TRUE I get an HTTP 403 error:

  $ Code    : chr "SignatureDoesNotMatch"
  $ Message : chr "The request signature we calculated does not match the signature you provided. Check your key and signing method."

With multipart = FALSE it works as usual. I use the same environment variables and the same R session.

amoeba added a commit to amoeba/aws.s3 that referenced this issue Feb 8, 2018

Fix bug in readBin call w/in put_object
readBin only reads one byte if you don't set the n argument. It reads the whole file if you set n to the file size or higher.

Related to cloudyr#80

amoeba commented Feb 8, 2018

I'm also getting a signature mismatch error. While looking at the source, I saw this call to readBin:

file <- readBin(file, what = "raw")

My understanding of readBin is that it reads only the first byte when called without an n argument (n defaults to 1):

> readLines("message.txt")
[1] "abcd"
> rawToChar(readBin("message.txt", raw()))
[1] "a"
> rawToChar(readBin("message.txt", raw(), file.size("message.txt")))
[1] "abcd\n"

My PR attempts to fix this. Thoughts, @leeper? After that change, I'm now getting another error, which I haven't debugged yet, although it makes sense given the traceback:

10: stop(txt, obj, call. = FALSE)
9: .errorhandler(paste("Argument object must be of type character", 
       "or raw vector if serialize is FALSE"), mode = errormode)
8: digest::digest(request_body, algo = "sha256", serialize = FALSE)
7: tolower(digest::digest(request_body, algo = "sha256", serialize = FALSE))
6: canonical_request(verb = verb, canonical_uri = action, query_args = query_args, 
       canonical_headers = canonical_headers, request_body = request_body)
5: aws.signature::signature_v4_auth(datetime = d_timestamp, region = region, 
       service = "s3", verb = verb, action = action, query_args = query, 
       canonical_headers = canonical_headers, request_body = request_body, 
       key = key, secret = secret, session_token = session_token) at s3HTTP.R#101
4: s3HTTP(verb = "POST", bucket = bucket, path = paste0("/", object), 
       headers = c(headers, list(`Content-Length` = ifelse(is.character(file) && 
           file.exists(file), file.size(file), length(file)))), 
       request_body = file, ...) at put_object.R#156
3: post_object(file = NULL, object = object, bucket = bucket, query = list(uploads = ""), 
       headers = headers, ...) at put_object.R#101
2: aws.s3::put_object(file, object, space, check_region = FALSE, 
       key = spaces_key, secret = spaces_secret, base_url = spaces_base, 
       ...) at spaces_object_put.R#44
1: spaces_object_put("~/large-file.dmg", space = "test-analogsea", 
       multipart = TRUE)

I can look into that some other time.
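
A plausible reading of that traceback: post_object() is called with file = NULL, so request_body ends up NULL, and digest::digest() rejects anything that is neither character nor raw when serialize = FALSE. That specific error is easy to reproduce on its own:

    # Reproduces the error seen in the traceback, independent of aws.s3.
    digest::digest(NULL, algo = "sha256", serialize = FALSE)
    #> Error: Argument object must be of type character or raw vector if serialize is FALSE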

leeper commented Jul 26, 2018

This should now be fixed in the version on GitHub.
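
For anyone trying it out, a minimal usage sketch (the bucket name and file are placeholders, and this assumes the remotes package is available for installation):

    # Install the development version from GitHub, then upload with the
    # multipart flag enabled.
    remotes::install_github("cloudyr/aws.s3")
    library(aws.s3)
    put_object(file = "~/large-file.dmg", object = "large-file.dmg",
               bucket = "my-bucket", multipart = TRUE)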

leeper closed this in d16d413 Jul 26, 2018

amoeba commented Jul 26, 2018

Thanks @leeper!

himanshu-sikaria commented Dec 8, 2018

Can you add this to the CRAN version as well?

leeper commented Dec 8, 2018

The new maintainer of this package is @acolum. Please check with them about the release schedule.
