How does Optimize decide the File Size (Question) #3272
ugurkalkavan changed the title from "How does Optimize decide the File Size" to "How does Optimize decide the File Size (Question)" on Jun 14, 2024.
Optimize targets 1 GB per output file. Delta OPTIMIZE uses bin packing to compact the files. In simple words …
Reference: https://delta.io/blog/2023-01-25-delta-lake-small-file-compaction-optimize/
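To make the reply more concrete, here is a minimal sketch of size-based bin packing within one partition. It assumes the 1 GB target mentioned above (treated here as 1024**3 bytes) and a simple greedy strategy: sort the files by size and start a new bin whenever the next file would push the current bin past the target, then rewrite each bin as one file. `bin_pack` and `TARGET_BYTES` are hypothetical names; this illustrates the technique and is not Delta Lake's source code.

```python
from typing import List

TARGET_BYTES = 1024 ** 3  # assumed 1 GB (1 GiB) target per output file


def bin_pack(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Greedily group file sizes into bins whose total stays within `target`.

    Sizes are visited smallest first and appended to the current bin until
    adding the next one would exceed the target; then a new bin is started.
    Each resulting bin would be rewritten as a single output file.
    """
    bins: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in sorted(file_sizes):
        if current and current_size + size > target:
            bins.append(current)  # close the full bin
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)  # keep the last, partially filled bin
    return bins


MiB = 1024 ** 2
bins = bin_pack([MiB] * 3000)
print([len(b) for b in bins])  # prints [1024, 1024, 952]
```

Because the bins are sized by the input files' on-disk bytes, the file written from a bin can come out smaller than the target, which is the same effect described in the question; this sketch makes a single pass and does not correct for that.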
Hi,
I used to use my own auto-compaction method on a legacy system.
It calculates the total file size of each Hive partition and then consolidates the files within that partition.
Example:
For a partition, there are 1,000 files of around 1 MB each.
The sum is about 1 GB. The method divides the sum by 128 MB and rounds up (ceil), which gives 8 in this case, so it repartitions the data into 8 files.
After compaction, the new total size is much less than 1 GB; it might be 700 MB, so each of the 8 files is only about 88 MB, below the 128 MB target.
So I needed to run the function repeatedly until the files reached the proper size (generally two or three times).
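The procedure in the question (sum, divide by 128 MB, round up, repartition, repeat) can be sketched as below. `num_output_files`, `compact_until_stable`, and `hypothetical_rewrite` are hypothetical names, and the 70% shrink inside `hypothetical_rewrite` is an invented storage model chosen only to mimic the behaviour described (about 1 GB of small files coming out at roughly 700 MB); it is not a measurement.

```python
from typing import Callable, List, Tuple

MiB = 1024 ** 2
TARGET_BYTES = 128 * MiB  # per-file target used in the question


def num_output_files(total_bytes: int, target: int = TARGET_BYTES) -> int:
    """ceil(total_bytes / target) using integer math, never fewer than 1."""
    return max(1, -(-total_bytes // target))


def compact_until_stable(
    file_sizes: List[int],
    rewrite: Callable[[List[int], int], List[int]],
    target: int = TARGET_BYTES,
    max_passes: int = 5,
) -> Tuple[List[int], int]:
    """Repeat the sum/ceil/repartition step until the file count stops changing.

    Returns the final file sizes and the number of rewrite passes performed.
    """
    passes = 0
    while file_sizes and passes < max_passes:
        n = num_output_files(sum(file_sizes), target)
        if n == len(file_sizes):
            break  # another pass would produce the same number of files
        file_sizes = rewrite(file_sizes, n)
        passes += 1
    return file_sizes, passes


def hypothetical_rewrite(file_sizes: List[int], n: int) -> List[int]:
    """HYPOTHETICAL storage model, for illustration only.

    Assumes merging very small files (average under 16 MiB) shrinks the data
    to 70% of its size and that larger files do not shrink further. The bytes
    are then spread as evenly as possible over n output files.
    """
    total = sum(file_sizes)
    if total // len(file_sizes) < 16 * MiB:
        total = total * 7 // 10
    q, r = divmod(total, n)
    return [q + 1] * r + [q] * (n - r)


sizes, passes = compact_until_stable([MiB] * 1000, hypothetical_rewrite)
print(len(sizes), passes)  # prints "6 2"
```

Under this assumed model, the first pass writes 8 files of 87.5 MiB, the second pass re-splits them into 6 files, and the third check finds nothing left to change. That matches the two-to-three-run pattern in the question: the file count is computed from the pre-compaction size, so it overshoots whenever the rewritten data is smaller.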
My question is: how does Delta OPTIMIZE deal with this issue?
Thank you.