AWS S3 - Possible optimization - too many blobs #249

Open
zdenek-jonas opened this issue Jul 11, 2024 · 0 comments
Comments

@zdenek-jonas (Contributor) commented Jul 11, 2024

During an S3 load test, I identified several areas that need improvement. Since objects in S3 are immutable, existing files cannot be edited in place; instead, a new file must be created for every write. In scenarios with continuous writing of small data, the number of files therefore grows dramatically. The same issue applies to the transaction logs. Additionally, the current implementation appends a counter as a suffix to these files, so S3 treats each of them as a distinct object even though they belong to only a few logical file types.

Detailed Description:

In the current setup, each small write operation creates a new file, so the file count grows steadily with every write. In scenarios where a large number of small data pieces are written continuously, the system ends up with a vast number of tiny files, which is inefficient and hard to manage.
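For illustration, here is a minimal sketch of that write pattern using the AWS SDK for Java v2 directly (the bucket name, key prefix, counter naming, and payload are assumptions for the example, not the connector's actual behavior):

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class SmallWriteExample
{
    public static void main(final String[] args)
    {
        try(final S3Client s3 = S3Client.create())
        {
            // Because S3 objects are immutable, every small write becomes a
            // brand-new object instead of an append to an existing file.
            for(int counter = 0; counter < 10_000; counter++)
            {
                final byte[] smallPayload = ("record-" + counter).getBytes();

                s3.putObject(
                    PutObjectRequest.builder()
                        .bucket("my-storage-bucket")      // assumed bucket name
                        .key("channel_0/data_" + counter) // assumed key pattern with counter suffix
                        .build(),
                    RequestBody.fromBytes(smallPayload)
                );
            }
            // Result: 10,000 tiny objects under a single prefix, mirroring the
            // blob growth observed in the load test.
        }
    }
}
```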

Each file is named with a counter suffix, so S3 sees every one of them as a distinct object. This creates an apparent variety of files, even though fundamentally there are only three file types: data, transaction, and persistence dictionary.
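To make that concrete, a small sketch that strips an assumed trailing `_<counter>` suffix from each key and groups the objects back into the three logical types named above (the key names and suffix pattern are hypothetical, not the connector's documented naming):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyGroupingExample
{
    // Assumed pattern: a base name followed by "_<counter>", e.g. "data_42".
    private static String logicalType(final String key)
    {
        return key.replaceFirst("_\\d+$", "");
    }

    public static void main(final String[] args)
    {
        final List<String> keys = List.of(
            "data_1", "data_2", "data_3",
            "transaction_1", "transaction_2",
            "persistence-dictionary_1"
        );

        // Thousands of distinct keys collapse into only three logical types:
        // data, transaction, and persistence dictionary.
        final Map<String, Long> byType = keys.stream()
            .collect(Collectors.groupingBy(KeyGroupingExample::logicalType, Collectors.counting()));

        System.out.println(byType); // e.g. {data=3, persistence-dictionary=1, transaction=2}
    }
}
```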

The transaction logs contribute to the same problem: they are also stored as separate immutable files, further increasing the file count.

Proposed Solution:

To address this, it would be beneficial to provide a special configuration for S3 that consolidates these numerous small files into fewer, larger files. This would prevent thousands of files from accumulating within a single storage.
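A rough sketch of what such a consolidation step could look like: merge the small objects under a prefix into one larger object, then delete the originals. The bucket, prefix, threshold, and key names are assumptions for illustration; a real implementation would stream or use multipart uploads and would also have to keep the storage metadata consistent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;

public class ConsolidationSketch
{
    public static void consolidate(final S3Client s3, final String bucket, final String prefix)
        throws IOException
    {
        // 1. Collect the small objects under the given prefix.
        final List<S3Object> smallObjects = s3.listObjectsV2(
                ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
            .contents();

        if(smallObjects.size() < 100) // assumed threshold before consolidation pays off
        {
            return;
        }

        // 2. Concatenate their contents into one buffer.
        final ByteArrayOutputStream merged = new ByteArrayOutputStream();
        for(final S3Object object : smallObjects)
        {
            merged.write(s3.getObjectAsBytes(
                GetObjectRequest.builder().bucket(bucket).key(object.key()).build()).asByteArray());
        }

        // 3. Write the consolidated blob as a single new object.
        final String mergedKey = prefix + "consolidated_" + System.currentTimeMillis();
        s3.putObject(
            PutObjectRequest.builder().bucket(bucket).key(mergedKey).build(),
            RequestBody.fromBytes(merged.toByteArray()));

        // 4. Delete the now-redundant small objects.
        for(final S3Object object : smallObjects)
        {
            s3.deleteObject(DeleteObjectRequest.builder().bucket(bucket).key(object.key()).build());
        }
    }
}
```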

Attachments: [image attachment]

zdenek-jonas changed the title from "AWS S3 - Possible optimalization" to "AWS S3 - Possible optimalization - to many blobs" on Sep 3, 2024