Rework decompression algorithm to operate over 4-byte chunks #45
base: develop
Conversation
Hey! Thanks for opening this, it is definitely something we want to add. Apologies for being slow here, I will find some time in the next few days to get you some feedback.
@NickCondron thanks for your patience here. The code looks fine and passes all tests. I have not run it through the fuzzer yet, but should do that as well. Here are my results on an AMD EPYC processor from a random Hetzner box:
And on my M2 Max (Apple Silicon):
I very quickly added some counters for when the escape_mask was zero vs. non-zero, and that seems to be the biggest predictor of how the benchmarks were affected by this change.
Hence, given that |
In the morning I'll try running Vortex's TPC-H benchmarks with this version of FSST and see if it has any effect.
I triggered a benchmark run in spiraldb/vortex#1158 👀
I tried implementing the chunk decompression approach (used in the original library) in Rust. This is very WIP, but I was curious what you thought about this idea. It's my first time writing Rust with this much unsafe, so I'm open to ideas to improve how it's organized.
I used the benchmark code from my other PR. The performance results are mixed but promising. My code could probably be optimized further, and it may be worth testing other data sets.
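For readers following along, here is a minimal, safe-Rust sketch of the general idea of decoding in 4-byte chunks. It is not the code in this PR. It assumes FSST's escape code is 255 and uses a bit trick to flag `0xFF` bytes in a loaded word. The `Symbols` layout, the `ChunkStats` counters, and the function names are hypothetical and chosen for illustration only. An optimized version would likely use `unsafe` unaligned loads and stores rather than `Vec` appends.

```rust
/// Assumed FSST escape marker: the byte that follows is a literal.
const ESCAPE: u8 = 0xFF;

/// Hypothetical table layout: symbol bytes packed little-endian into a u64,
/// with each symbol's length (assumed 1..=8) stored separately.
pub struct Symbols {
    pub bytes: [u64; 256],
    pub lens: [u8; 256],
}

/// Hypothetical counters: chunks with no escape (fast) vs. at least one (slow).
#[derive(Debug, Default, PartialEq)]
pub struct ChunkStats {
    pub fast: usize,
    pub slow: usize,
}

/// Sets the high bit of every byte lane of `word` that equals 0xFF.
pub fn escape_mask(word: u32) -> u32 {
    (word & 0x8080_8080) & (((!word & 0x7F7F_7F7F) + 0x7F7F_7F7F) ^ 0x8080_8080)
}

/// Appends the symbol for `code` to `out`.
fn emit(table: &Symbols, code: u8, out: &mut Vec<u8>) {
    let len = table.lens[code as usize] as usize;
    out.extend_from_slice(&table.bytes[code as usize].to_le_bytes()[..len]);
}

/// Decodes `input`, examining four codes at a time while enough bytes remain.
pub fn decompress_chunked(table: &Symbols, input: &[u8]) -> (Vec<u8>, ChunkStats) {
    let mut out = Vec::with_capacity(input.len() * 8);
    let mut stats = ChunkStats::default();
    let mut pos = 0;

    while pos + 4 <= input.len() {
        let word = u32::from_le_bytes([
            input[pos],
            input[pos + 1],
            input[pos + 2],
            input[pos + 3],
        ]);
        let mask = escape_mask(word);
        if mask == 0 {
            // Fast path: all four bytes are ordinary codes.
            for &code in &input[pos..pos + 4] {
                emit(table, code, &mut out);
            }
            pos += 4;
            stats.fast += 1;
        } else {
            // Slow path: emit the codes before the first escape, then the literal.
            let first = (mask.trailing_zeros() >> 3) as usize;
            for &code in &input[pos..pos + first] {
                emit(table, code, &mut out);
            }
            pos += first;
            if let Some(&literal) = input.get(pos + 1) {
                out.push(literal);
            }
            pos += 2;
            stats.slow += 1;
        }
    }

    // Tail: fewer than four bytes left, decode one code at a time.
    while pos < input.len() {
        let code = input[pos];
        if code == ESCAPE {
            if let Some(&literal) = input.get(pos + 1) {
                out.push(literal);
            }
            pos += 2;
        } else {
            emit(table, code, &mut out);
            pos += 1;
        }
    }
    (out, stats)
}
```

In this sketch, escape-free chunks take the short branch and any chunk containing an escape falls back to partial, byte-wise handling. That is one plausible reason the zero vs. non-zero `escape_mask` ratio discussed in the conversation would track the benchmark results so closely.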