Add NEON encode and check #56
Currently only runs on `aarch64`, because `arm` NEON intrinsics are unstable: rust-lang/rust#111800
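A minimal sketch of this kind of `aarch64`-only gating, not the code from this PR: the function names `hex_check_scalar` and `hex_check_neon` are hypothetical, and the NEON path is selected at compile time via `cfg` rather than by runtime detection. Every other target, including 32-bit `arm`, gets the scalar path.

```rust
/// Scalar reference: true if every byte is an ASCII hex digit.
fn hex_check_scalar(src: &[u8]) -> bool {
    src.iter().all(|b| b.is_ascii_hexdigit())
}

/// NEON path (hypothetical name), compiled only for aarch64 builds
/// that have the `neon` target feature enabled.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn hex_check_neon(src: &[u8]) -> bool {
    use core::arch::aarch64::*;

    let mut chunks = src.chunks_exact(16);
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly 16 readable bytes, and the cfg above
        // guarantees NEON is enabled for this build.
        let all_valid = unsafe {
            let v = vld1q_u8(chunk.as_ptr());
            // Lanes in '0'..='9' become 0xFF.
            let digit = vandq_u8(
                vcgeq_u8(v, vdupq_n_u8(b'0')),
                vcleq_u8(v, vdupq_n_u8(b'9')),
            );
            // OR-ing 0x20 folds 'A'..='F' onto 'a'..='f'.
            let lower = vorrq_u8(v, vdupq_n_u8(0x20));
            let alpha = vandq_u8(
                vcgeq_u8(lower, vdupq_n_u8(b'a')),
                vcleq_u8(lower, vdupq_n_u8(b'f')),
            );
            // Every lane is 0xFF only if all 16 bytes are hex digits.
            vminvq_u8(vorrq_u8(digit, alpha)) == 0xFF
        };
        if !all_valid {
            return false;
        }
    }
    // Fewer than 16 trailing bytes: finish with the scalar check.
    hex_check_scalar(chunks.remainder())
}

/// Entry point on aarch64 builds with NEON.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
pub fn hex_check(src: &[u8]) -> bool {
    hex_check_neon(src)
}

/// Entry point everywhere else, including 32-bit `arm`.
#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
pub fn hex_check(src: &[u8]) -> bool {
    hex_check_scalar(src)
}
```

Both entry points share one signature, so callers never need their own `cfg` checks.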
Unfortunately, the only Aarch64 device I have access to for benchmarking is my phone (a Samsung Galaxy A73). My Raspberry Pi 3B seems to have died since I last used it years ago 😢 Here are the relevant benchmark results from running
You can view the full output from
Thank you, is it possible to run the benchmark on the CI workflow?
Sure, how would I do that? Would I need to add benchmarks to the
Sure.
Unfortunately it seems like GitHub doesn't have any Aarch64 runners available at this time, but they're aiming for them to be available by the end of the year. This means there's currently no way to run the CI on an Aarch64 runner, unless you want to set up self-hosted runners.
Co-authored-by: Quake Wang <quake.wang@gmail.com>
Add implementations for `hex_encode` and `hex_check` using ARM's NEON (aka AdvSIMD) SIMD instruction set. These implementations are based on the existing SSE4.2 ones - they're more or less direct translations.

These implementations are only active on `aarch64` targets and not 32-bit ARM targets (`armv7` etc), because NEON intrinsics on 32-bit ARM are unstable.

Unfortunately, checking for NEON support at runtime is a difficult problem to solve. My current implementation is less than ideal:
https://github.com/Lynnesbian/faster-hex/blob/859221bbcfd2256047b5bf6d334f30beb906ee3f/src/lib.rs#L159-L171
I've found a variety of differing ways to get this information on Aarch64 platforms:
- `HWCAP` interface (`getauxval()`), or reading `/proc/cpuinfo`
- `IsProcessorFeaturePresent` with `PF_ARM_NEON_INSTRUCTIONS_AVAILABLE`
- `elf_aux_info()`
- `sysctlbyname` with `machdep.neon_present`
- `sysctl` with `CTL_MACHDEP` and `CPU_ID_AA64PFR0`
- `/proc/cpuinfo`, if enabled by the given BSD, will also work

There's no nice, cross-platform, no-`std` method to do this, like there is with x86's `cpuid`. And worse - many of these methods only work for Aarch64, and not 32-bit ARM platforms.

I decided against including all of these methods in the `vectorization_support` function. They'd necessitate bringing in multiple new dependencies, and would make testing much more complicated.
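
For comparison, when `std` is available, the standard library's `std::arch::is_aarch64_feature_detected!` macro performs this check without extra dependencies. The sketch below assumes a `std` build, so it does not help the no-`std` case discussed above. The name `neon_available` is hypothetical and is not part of this PR.

```rust
/// Hypothetical helper: reports whether NEON can be used at runtime.
/// On aarch64 this defers to the standard library's feature detection.
#[cfg(target_arch = "aarch64")]
pub fn neon_available() -> bool {
    std::arch::is_aarch64_feature_detected!("neon")
}

/// On every other architecture (including 32-bit ARM, where this sketch
/// makes no attempt at detection) NEON is reported as unavailable.
#[cfg(not(target_arch = "aarch64"))]
pub fn neon_available() -> bool {
    false
}
```

A dispatcher could call `neon_available()` once and cache the result, falling back to the scalar path whenever it returns `false`.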