object_store: Support for streaming body in put/put_opts #6934

Open
thinkharderdev opened this issue Jan 3, 2025 · 8 comments
Labels: enhancement (Any new improvement worthy of an entry in the changelog)

Comments

@thinkharderdev (Contributor)

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, put/put_opts take a PutPayload, which is just a wrapper around an array of Bytes. It would be useful (for us) to be able to stream the body without having to do a multipart upload.

Describe the solution you'd like

The underlying http library reqwest already supports this. So I think we can change the definition of PutPayload to an enum like so:

type StreamingBody = Box<
    dyn futures::stream::Stream<
            Item = Result<Bytes, Box<dyn std::error::Error + Send + Sync + 'static>>,
        > + Send
        + 'static,
>;

pub enum PutPayload {
    Bytes(Arc<[Bytes]>),
    Streaming(StreamingBody, usize),
}

where PutPayload::Streaming holds the stream and the total content length (which must be known ahead of time).

The (unfortunate) requirement of the S3 API to know the content-length up front means this isn't fully general, but it can be useful in certain situations, such as buffering a large-ish file on disk before uploading. In that case, a streaming body would avoid having to load the entire object into memory in order to upload it.
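
To make that concrete, here is a rough sketch of how the proposed variant might be fed from a file on disk. This is purely illustrative: PutPayload::Streaming does not exist today, and the put_from_file helper (and the use of tokio_util's ReaderStream) is just an assumption for the example.

use futures::TryStreamExt;
use tokio_util::io::ReaderStream;

// Hypothetical helper built on the proposed enum above.
async fn put_from_file(
    store: &dyn object_store::ObjectStore,
    location: &object_store::path::Path,
    file: &std::path::Path,
) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // The content length must be known up front (S3 requires it for a plain PUT).
    let len = tokio::fs::metadata(file).await?.len() as usize;
    let stream = ReaderStream::new(tokio::fs::File::open(file).await?)
        .map_err(|e| Box::new(e) as Box<dyn std::error::Error + Send + Sync>);
    // Stream the body instead of buffering the whole file in memory.
    store
        .put(location, PutPayload::Streaming(Box::new(stream), len)) // proposed variant, not in the current API
        .await?;
    Ok(())
}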

Describe alternatives you've considered

This can already be done with put_multipart, but multi-part upload is complicated and may have some performance implications for reads.
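
For reference, the existing route looks roughly like this (a sketch using the WriteMultipart helper; treat the details as approximate, and the chunks argument as a stand-in data source):

use object_store::{path::Path, ObjectStore, WriteMultipart};

async fn upload(store: &dyn ObjectStore, location: &Path, chunks: Vec<Vec<u8>>) -> object_store::Result<()> {
    // Start a multipart upload and wrap it in the helper that drives parts concurrently.
    let upload = store.put_multipart(location).await?;
    let mut write = WriteMultipart::new(upload);
    for chunk in chunks {
        // Data is buffered and uploaded as parts in the order it is written.
        write.write(&chunk);
    }
    // Wait for in-flight parts and complete the upload.
    write.finish().await?;
    Ok(())
}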

Additional context

@thinkharderdev added the enhancement label on Jan 3, 2025
@tustvold (Contributor) commented Jan 3, 2025

This can already be done with put_multipart, but multi-part upload is complicated and may have some performance implications for reads.

Could you expand on this a bit? The multi-part upload is typically more performant for large uploads (>10MB). Not only can it upload chunks in parallel, it also has much better reliability, as it can retry just the failed parts instead of the entire upload.

I have debated adding this in the past, but the protocol for supporting it is complicated, and it is unclear to me that it would yield any tangible benefits.

To be honest I had viewed the streaming upload feature as largely having been superseded by the multipart upload feature. This is certainly the way it was marketed and AWS tools like S3 Transfer Manager recommend using multipart uploads to improve performance.

S3 Transfer Manager can take advantage of performance improvements such as the multipart upload API and byte-range fetches

Edit: FYI the BufWriter::put API should take care of most of the complexities of doing streaming uploads.
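
For example, something along these lines (a rough sketch: chunks is just a stand-in for whatever produces the data, store/path are an Arc<dyn ObjectStore> and a Path, and I'm assuming put takes a Bytes chunk):

use object_store::buffered::BufWriter;
use tokio::io::AsyncWriteExt; // for shutdown

let mut writer = BufWriter::new(store, path);
while let Some(chunk) = chunks.next().await {
    // Small writes are buffered in memory; past the buffer capacity BufWriter
    // transparently switches to a multipart upload.
    writer.put(chunk?).await?;
}
// Completes either the buffered put or the multipart upload.
writer.shutdown().await?;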

@thinkharderdev (Contributor, Author)

Could you expand on this a bit? The multi-part upload is typically more performant for large uploads (>10MB). Not only can it upload chunks in parallel, it also has much better reliability, as it can retry just the failed parts instead of the entire upload.

My understanding (which may be wrong) is that read latency can be higher when doing range fetches for multi-part files that cross part boundaries since S3 stores the parts separately.

On a separate note, I think there are some concurrency issues in BufWriter which I am investigating. We have seen issues where the parts are in the wrong order when uploaded (and it stops happening when we set the max concurrency in BufWriter to 1).
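
(Something like the following, if I'm remembering the builder method correctly:)

let mut writer = BufWriter::new(store, path).with_max_concurrency(1); // cap at one in-flight part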

@tustvold (Contributor) commented Jan 3, 2025

My understanding (which may be wrong) is that read latency can be higher when doing range fetches for multi-part files that cross part boundaries since S3 stores the parts separately.

I've not seen any AWS material documenting this as a thing, and even if it were, S3 latencies are so noisy I suspect it would be rendered largely irrelevant.

As for the concurrency issues, I suggest reading through #6460, filed by your coworker; from his description it sounded like a bug in your code.

@thinkharderdev (Contributor, Author)

As for the concurrency issues, I suggest reading through #6460, filed by your coworker; from his description it sounded like a bug in your code.

I've read through that one, and this is actually a different issue. Nothing fails, but the files are corrupted because the parts are in the wrong order. This happens consistently (as in, it happens to a certain percentage of uploads consistently; no single case I've found is actually reproducible). I'm trying to find a way to reproduce it and will create an issue if I can find something.

@tustvold (Contributor) commented Jan 3, 2025

Nothing fails, but the files are corrupted because the parts are in the wrong order.

That sounds to me like exactly the same issue. I'm not familiar with how you are integrating the multipart upload, but my suspicion is that whatever is sinking to it is accidentally deferring the put_part to a spawned future, or something similar.

Correct

tokio::spawn(upload.put_part(x));
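// put_part is called synchronously here, so the part index is assigned in call
// order; only the already-created future runs on the spawned task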

Incorrect

tokio::spawn(async move {
    upload.lock().put_part(x);
})
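// here the put_part call itself is deferred into the spawned task, so the order
// in which parts are created depends on which task grabs the lock first (and the
// returned future is never awaited)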

I'd be very suspicious of anything that is using RwLock or similar

@thinkharderdev (Contributor, Author) commented Jan 3, 2025

Nothing fails, but the files are corrupted because the parts are in the wrong order.

That sounds to me like exactly the same issue. I'm not familiar with how you are integrating the multipart upload, but my suspicion is that whatever is sinking to it is accidentally deferring the put_part to a spawned future, or something similar.

There is nothing like that. I can't share the code here because it is internal, but I can assure you there is nothing weird going on: no spawned tasks, no interior mutability, just a plain-vanilla usage of BufWriter as an AsyncWrite. Conceptually it's just:

Edit: more accurate version

let mut writer = BufWriter::new(store, path);

let mut buf = vec![];
while something {
   buf.extend_from_slice(&some_data);

   if buf.len() > SOME_SIZE {
      writer.write_all(&buf).await?;
      buf.clear();
   }
}

writer.flush().await?;
writer.shutdown().await?;

I didn't want to create an issue until I have something concrete (and am doubly sure it's not a bug on our end, even though I am quite confident it is not), so I will report something when/if I have it.

@Xuanwo (Member) commented Jan 3, 2025

Edit: more accurate version

let mut writer = BufWriter::new(store, path);

let mut buf = vec![];
while something {
   buf.extend_from_slice(&some_data);

   if buf.len() > SOME_SIZE {
      writer.write_all(&buf).await?;
      buf.clear();
   }
}

writer.flush().await?;
writer.shutdown().await?;

Hi, if the code here is accurate, it's possible that buf still has content left after the while loop finishes.
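
Something along these lines before the final flush would cover it (reusing the variables from the snippet above):

// write out whatever is left in buf once the loop is done
if !buf.is_empty() {
    writer.write_all(&buf).await?;
}
writer.flush().await?;
writer.shutdown().await?;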

@thinkharderdev (Contributor, Author)

Hi, if the code here is accurate, it's possible that buf still has content left after the while loop finishes.

That code is simplified for demonstration, but the real code does write the remaining data. The problem we are experiencing, though, is not that the files are too short but that they are jumbled because the parts are in the wrong order. The total file size is correct.
