Add NEON encode and check #56

Merged (9 commits) on Sep 27, 2024

Conversation

@Lynnesbian (Contributor) commented Sep 23, 2024

Add implementations of hex_encode and hex_check using ARM's NEON (a.k.a. AdvSIMD) instruction set. These implementations are based on the existing SSE4.2 ones; they're more or less direct translations.
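
For readers unfamiliar with the approach, the core idea (shared with the SSE version) is to split each byte into nibbles and map them to ASCII with a 16-byte table lookup. The sketch below is a hypothetical illustration of one 16-byte chunk using stable aarch64 intrinsics, not the PR's actual code:

```rust
// Hypothetical sketch (not the PR's code) of the nibble-lookup approach on a
// single 16-byte chunk, using stable core::arch::aarch64 intrinsics.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "neon")]
unsafe fn hex_encode_neon_chunk(input: &[u8; 16], out: &mut [u8; 32]) {
    use core::arch::aarch64::*;

    // ASCII lookup table for the 16 hex digits.
    let table = vld1q_u8(b"0123456789abcdef".as_ptr());
    let src = vld1q_u8(input.as_ptr());

    // Split each byte into its high and low nibble.
    let hi = vshrq_n_u8::<4>(src);
    let lo = vandq_u8(src, vdupq_n_u8(0x0f));

    // Map each nibble to its ASCII digit via a table lookup (TBL).
    let hi_ascii = vqtbl1q_u8(table, hi);
    let lo_ascii = vqtbl1q_u8(table, lo);

    // Interleave so each output pair is high digit, then low digit.
    vst1q_u8(out.as_mut_ptr(), vzip1q_u8(hi_ascii, lo_ascii));
    vst1q_u8(out.as_mut_ptr().add(16), vzip2q_u8(hi_ascii, lo_ascii));
}
```

A full implementation would of course also need to handle inputs whose length isn't a multiple of 16, falling back to the scalar path for the tail.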

These implementations are only active on aarch64 targets, not on 32-bit ARM targets (armv7, etc.), because NEON intrinsics are still unstable in Rust on 32-bit ARM.


Unfortunately, checking for NEON support at runtime is a difficult problem to solve. My current implementation is less than ideal:

https://github.com/Lynnesbian/faster-hex/blob/859221bbcfd2256047b5bf6d334f30beb906ee3f/src/lib.rs#L159-L171

I've found a variety of differing ways to get this information on AArch64 platforms. Unfortunately, there's no nice, cross-platform, no-std method to do this, like there is with x86's cpuid. Worse still, many of these methods only work for AArch64, and not for 32-bit ARM platforms.

I decided against including all of these methods in the vectorization_support function: they'd necessitate bringing in multiple new dependencies and would make testing much more complicated.
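
For what it's worth, when std is available the standard library has a runtime detection macro for this; here's a minimal sketch (not necessarily what the linked vectorization_support code does):

```rust
// Sketch only, assuming std is available.
#[cfg(target_arch = "aarch64")]
fn neon_available() -> bool {
    // Expands to an OS-specific probe (e.g. getauxval(AT_HWCAP) on Linux).
    std::arch::is_aarch64_feature_detected!("neon")
}
```

Since Advanced SIMD is mandatory in the AArch64 execution state, this check should return true on essentially every aarch64 target; the hard part, as noted above, is answering the question without std and covering 32-bit ARM as well.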

@Lynnesbian (Contributor, Author)

Unfortunately, the only AArch64 device I have access to for benchmarking is my phone (a Samsung Galaxy A73). My Raspberry Pi 3B seems to have died since I last used it years ago 😢

Here are the relevant benchmark results from running cargo bench under Termux (which is, of course, far from an ideal benchmarking environment):

| Bench | Result |
| --- | --- |
| bench_faster_hex_encode | 91.559 ns |
| bench_faster_hex_encode_fallback | 140.00 ns |

You can view the full output from cargo bench here.

@eval-exec (Collaborator)

Thank you. Is it possible to run the benchmark in the CI workflow?

@Lynnesbian (Contributor, Author)

Sure, how would I do that? Would I need to add benchmarks to the rust.yml workflow file?

@eval-exec (Collaborator) commented Sep 23, 2024

> Sure, how would I do that? Would I need to add benchmarks to the rust.yml workflow file?

Sure.

@Lynnesbian (Contributor, Author)

Unfortunately, it seems that GitHub doesn't have any AArch64 runners available at this time, although they're aiming to make them available by the end of the year.

This means there's currently no way to run the CI on an AArch64 runner, unless you want to set up self-hosted runners.

(Resolved review discussion on src/lib.rs)
@quake merged commit 4acf38e into nervosnetwork:master on Sep 27, 2024 (5 checks passed).