AWS S3 - Possible optimization - too many blobs #249

Open
zdenek-jonas opened this issue Jul 11, 2024 · 0 comments
Comments

@zdenek-jonas (Contributor) commented Jul 11, 2024

During an S3 load test, I identified several areas that need improvement. Since objects in S3 are immutable, existing files cannot be edited in place; instead, a new file must be created for every write. In scenarios with continuous writing of small data, the number of files therefore grows dramatically. The same issue applies to the transaction logs. Additionally, the current implementation appends a counter as a suffix to these files, so S3 treats each of them as a distinct object even though they belong to only a few logical file types.

Detailed Description:

In the current setup, each small write operation creates a new file, so the file count grows steadily with every write. In scenarios where a large number of small data pieces are written continuously, the system ends up with a vast number of tiny files, which is inefficient and hard to manage.
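For illustration, here is a minimal sketch of that write pattern using the AWS SDK for Java v2 directly (the bucket name, key prefix, counter naming, and payload are assumptions for the example, not the connector's actual behavior):

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class SmallWriteExample
{
    public static void main(final String[] args)
    {
        try(final S3Client s3 = S3Client.create())
        {
            // Because S3 objects are immutable, every small write becomes a
            // brand-new object instead of an append to an existing file.
            for(int counter = 0; counter < 10_000; counter++)
            {
                final byte[] smallPayload = ("record-" + counter).getBytes();

                s3.putObject(
                    PutObjectRequest.builder()
                        .bucket("my-storage-bucket")      // assumed bucket name
                        .key("channel_0/data_" + counter) // assumed key pattern with counter suffix
                        .build(),
                    RequestBody.fromBytes(smallPayload)
                );
            }
            // Result: 10,000 tiny objects under a single prefix, mirroring the
            // blob growth observed in the load test.
        }
    }
}
```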

Each file is named with a counter suffix, so S3 sees every one of them as a distinct object. This creates an apparent variety of files, even though fundamentally there are only three file types: data, transaction, and persistence dictionary.
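To make that concrete, a small sketch that strips an assumed trailing `_<counter>` suffix from each key and groups the objects back into the three logical types named above (the key names and suffix pattern are hypothetical, not the connector's documented naming):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyGroupingExample
{
    // Assumed pattern: a base name followed by "_<counter>", e.g. "data_42".
    private static String logicalType(final String key)
    {
        return key.replaceFirst("_\\d+$", "");
    }

    public static void main(final String[] args)
    {
        final List<String> keys = List.of(
            "data_1", "data_2", "data_3",
            "transaction_1", "transaction_2",
            "persistence-dictionary_1"
        );

        // Thousands of distinct keys collapse into only three logical types:
        // data, transaction, and persistence dictionary.
        final Map<String, Long> byType = keys.stream()
            .collect(Collectors.groupingBy(KeyGroupingExample::logicalType, Collectors.counting()));

        System.out.println(byType); // e.g. {data=3, persistence-dictionary=1, transaction=2}
    }
}
```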

The transaction logs contribute to the same problem: they are also stored as separate immutable files, further increasing the file count.

Proposed Solution:

To address this, it would be beneficial to provide a special configuration for S3 that consolidates these numerous small files into fewer, larger files. This would prevent thousands of files from accumulating within a single storage.
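A rough sketch of what such a consolidation step could look like: merge the small objects under a prefix into one larger object, then delete the originals. The bucket, prefix, threshold, and key names are assumptions for illustration; a real implementation would stream or use multipart uploads and would also have to keep the storage metadata consistent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;

public class ConsolidationSketch
{
    public static void consolidate(final S3Client s3, final String bucket, final String prefix)
        throws IOException
    {
        // 1. Collect the small objects under the given prefix.
        final List<S3Object> smallObjects = s3.listObjectsV2(
                ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
            .contents();

        if(smallObjects.size() < 100) // assumed threshold before consolidation pays off
        {
            return;
        }

        // 2. Concatenate their contents into one buffer.
        final ByteArrayOutputStream merged = new ByteArrayOutputStream();
        for(final S3Object object : smallObjects)
        {
            merged.write(s3.getObjectAsBytes(
                GetObjectRequest.builder().bucket(bucket).key(object.key()).build()).asByteArray());
        }

        // 3. Write the consolidated blob as a single new object.
        final String mergedKey = prefix + "consolidated_" + System.currentTimeMillis();
        s3.putObject(
            PutObjectRequest.builder().bucket(bucket).key(mergedKey).build(),
            RequestBody.fromBytes(merged.toByteArray()));

        // 4. Delete the now-redundant small objects.
        for(final S3Object object : smallObjects)
        {
            s3.deleteObject(DeleteObjectRequest.builder().bucket(bucket).key(object.key()).build());
        }
    }
}
```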

Attachments: [image attachment]

zdenek-jonas changed the title from "AWS S3 - Possible optimalization" to "AWS S3 - Possible optimalization - to many blobs" on Sep 3, 2024