Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Upgrade to twoxhash 2.1 and use oneshot API for improved performance #6878

Closed
wants to merge 1 commit into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Dec 13, 2024

Which issue does this PR close?

N/A

Rationale for this change

There is a new version of twoxhash that has a new oneshot API for use cases where all data is available at once. This is more performant than the streaming API.

Arrow uses this crate for hashing related to bloom filters.

We saw significant performance improvements in Comet with this approach, so hopefully it helps with arrow as well.

I tried benchmarking with write_batch benchmarks but am seeing highly variable results on my desktop, so I am not sure how to prove that this helps with performance.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 13, 2024
@andygrove andygrove marked this pull request as ready for review December 13, 2024 17:27
@andygrove andygrove requested a review from viirya December 13, 2024 17:33
Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSRV check failed.

@andygrove
Copy link
Member Author

twoxhash defines rust-version as 1.81.0 even though it builds OK with much older versions. I will move this to draft for now.

FYI @shepmaster is it possible to lower the minimum rust version for twoxhash?

@andygrove andygrove marked this pull request as draft December 13, 2024 19:32
@shepmaster
Copy link
Contributor

See also #6583

even though it builds OK with much older versions

I'm not sure I follow:

cargo +1.80 test --all-features

# ...

error: could not compile `twox-hash` (lib) due to 12 previous errors

@andygrove
Copy link
Member Author

Closed since #6583 already exists. Thank you @shepmaster

@andygrove andygrove closed this Dec 14, 2024
@andygrove andygrove deleted the twox-hash-upgrade branch December 14, 2024 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants