During an S3 load test, I identified several areas that need improvement. Since all files in S3 are immutable, existing files cannot be edited; instead, a new file must be created. When small pieces of data are written continuously, the number of files increases dramatically. The same issue applies to transaction logs. Additionally, the current implementation appends a counter as a suffix to each file name, so S3 recognizes the files as different types.
Detailed Description:
In the current setup, each small data write operation creates a new file, so the number of files grows with every write. For example, when a large number of small data pieces are written continuously, the system ends up creating a vast number of small files, which is inefficient and difficult to manage.
Each file created has a counter implemented as a suffix, which causes S3 to recognize them as different types. This results in an apparent variety of file types, but fundamentally, there are only three types of files: data, transaction, and persistence dictionary types.
Similarly, the transaction logs also contribute to this issue, as they are stored as separate immutable files, further increasing the file count.
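To illustrate the suffix issue described above, here is a minimal sketch that strips a trailing numeric counter from each key before reading the extension, so the keys group into their underlying types. The key layout (`<name>.<ext>.<counter>`), the extensions, and the function names are assumptions made for this example and may not match the actual naming scheme.

```python
import re
from collections import Counter

# Assumed key layout: "<name>.<ext>.<counter>". The real scheme may differ.
_COUNTER_SUFFIX = re.compile(r"\.\d+$")


def logical_type(key: str) -> str:
    """Return the extension of a key after dropping a trailing numeric counter."""
    base = _COUNTER_SUFFIX.sub("", key)
    _, dot, ext = base.rpartition(".")
    # If no dot remains, the key has no recognizable extension.
    return ext if dot else ""


def count_by_type(keys):
    """Count keys per logical type, ignoring the counter suffix."""
    return Counter(logical_type(k) for k in keys)


# Hypothetical keys, for illustration only.
sample_keys = [
    "channel_0_1.dat.0",
    "channel_0_1.dat.1",
    "transactions_0.sft.0",
    "PersistenceTypeDictionary.ptd.0",
]
print(count_by_type(sample_keys))
```

Grouped this way, the sample keys collapse into three types, whereas reading the last dot-separated segment as the extension would report `0` and `1` as the types.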
Proposed Solution:
To address this, it would be beneficial to implement a special configuration for S3 that consolidates these numerous small files into fewer, larger files. This would help prevent the creation of thousands of files within a single storage.
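One possible shape for such a consolidation option is a write buffer that collects small writes and uploads them as a single object once a size threshold is reached. The sketch below is not the existing implementation; `BatchingWriter`, `min_bytes`, and the `upload` callback are hypothetical names, and the callback stands in for whatever call actually stores an object in S3.

```python
from typing import Callable, List


class BatchingWriter:
    """Collects small writes and uploads them as one larger object."""

    def __init__(self, upload: Callable[[str, bytes], None], prefix: str, min_bytes: int):
        self._upload = upload        # hypothetical callback: (key, body) -> None
        self._prefix = prefix
        self._min_bytes = min_bytes  # assumed threshold before an upload is triggered
        self._parts: List[bytes] = []
        self._size = 0
        self._seq = 0

    def write(self, data: bytes) -> None:
        """Buffer one small write; upload once the threshold is reached."""
        self._parts.append(data)
        self._size += len(data)
        if self._size >= self._min_bytes:
            self.flush()

    def flush(self) -> None:
        """Upload everything buffered so far as a single object."""
        if not self._parts:
            return
        key = f"{self._prefix}.{self._seq}"
        self._upload(key, b"".join(self._parts))
        self._parts.clear()
        self._size = 0
        self._seq += 1


# Usage with an in-memory dict standing in for the bucket.
store = {}
writer = BatchingWriter(store.__setitem__, "data", min_bytes=10)
for _ in range(25):
    writer.write(b"x")
writer.flush()
print(len(store))
```

With one-byte writes and a 10-byte threshold, 25 writes end up in 3 objects instead of 25. The trade-off is that buffered data is not stored until the next flush, so a real configuration would probably also need a time-based or commit-based flush, especially for transaction logs.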
Attachments:
zdenek-jonas changed the title from "AWS S3 - Possible optimalization" to "AWS S3 - Possible optimalization - to many blobs" on Sep 3, 2024