Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add simd string quote seeking #52

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

add simd string quote seeking #52

wants to merge 5 commits into from

Conversation

davidhewitt
Copy link
Collaborator

No description provided.

@davidhewitt davidhewitt force-pushed the dh/simd-string branch 2 times, most recently from ca4cf3e to 7123c36 Compare December 5, 2023 20:14
Copy link

codecov bot commented Dec 5, 2023

Codecov Report

Merging #52 (84087c9) into main (b6a645c) will decrease coverage by 4.73%.
Report is 1 commits behind head on main.
The diff coverage is 62.72%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
- Coverage   95.12%   90.40%   -4.73%     
==========================================
  Files           8        8              
  Lines        1067     1198     +131     
==========================================
+ Hits         1015     1083      +68     
- Misses         52      115      +63     
Files Coverage Δ
src/string_decoder.rs 71.20% <62.72%> (-21.24%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71df64a...84087c9. Read the comment docs.

Copy link

codspeed-hq bot commented Dec 5, 2023

CodSpeed Performance Report

Merging #52 will degrade performances by 10.04%

Comparing dh/simd-string (84087c9) with main (71df64a)

Summary

❌ 1 regressions
✅ 41 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main dh/simd-string Change
string_array_jiter_iter 44.9 µs 49.9 µs -10.04%

@davidhewitt
Copy link
Collaborator Author

davidhewitt commented Dec 5, 2023

I could reproduce locally that SSE simd was slower than whatever the compiler was generating. On aarch64 the simd intrinsics are much newer and I did reproduce a speedup of some 25%; this may also be that the aarch64-apple-darwin target is not as optimised as the x86_64-linux-gnu one which makes it easier to beat the compiler.

We could look at AVX simd but that can't run under Rosetta on macOS, so I'll have to look next week. It's unclear if GitHub Actions supports AVX.

@davidhewitt
Copy link
Collaborator Author

I've implemented AVX simd for x86_64.

It's not clear-cut; for short strings the non-simd version wins. On aarch64 (neon) on my M1 the breakeven is two-character strings, for x86_64 on my desktop the breakeven is four-character strings.

The address sanitizer doesn't like the buffer overflow. This isn't a huge surprise, in practice it shouldn't be a problem but it is a potential source of danger after refactoring, so it should probably be reworked.

Overall what I've got here is a bit of a mess. Given it's not a guaranteed perf win, especially in pydantic where we might expect a lot of short strings for enum values (?), I'm not particularly in love with this branch. I'll leave it here as draft, to be decided later if we refactor or bin it.

@davidhewitt davidhewitt marked this pull request as draft December 12, 2023 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant