Add AArch64 extension handling and SVE/SVE2 feature detection #443
Are you planning to contribute some SVE/SVE2 code? For now it seems a bit pointless, all the more so because with the current build architecture it's impossible to compile our x86 SIMD to SVE(2) using SIMDe. This makes the code basically placeholders, and we don't internally plan to use SVE or SVE2 anytime soon.
That's the plan, yes. In particular, a lot of the convolutions and SAD calculations can make use of the SVE-only 16-bit dot-product instructions, SDOT and UDOT. I was aiming to put up the first SVE patch and associated CMake changes once there is a bit more Neon code to build on top of, but figured the feature detection work could be a separate PR in the meantime. Let me know if you'd prefer me to combine this PR into a later one with the first SVE code instead.
2b68f87 to d6b8f70
I don't quite know what to do about it. Maybe that's also because of my lack of understanding of the Arm architectures. With x86 we have AVX2 and SSE4.1, with a preference for AVX2 on CPUs that support it, because it's faster. What about SVE, SVE2 and NEON? Would SVE be preferable to NEON? Do all architectures supporting SVE also support NEON? What about SVE2? Anyway, I'm thinking maybe the best way forward would be to keep the refactoring, and then put the SVE and SVE2 stuff in a macro that would not be enabled for now. That way it's prepared, but no one is distracted by SVE or SVE2 popping up in the
So for the 64-bit Arm architecture (aka AArch64 or Armv8-A) we have Neon (previously also called Advanced SIMD or ASIMD), mandatory from the start (v8.0). The Scalable Vector Extension (SVE) was introduced as an optional extension from v8.2. SVE2 was then introduced as part of v9.0 (v9.0 is a strict superset of v8.5). There are some good guides on and introductions to SVE available, for example Introduction to SVE. As a brief introduction, SVE provides:
It's worth emphasising that even when the SVE vector length is 128 bits (the same as Neon) we still expect a small improvement over Neon due to the additional new instructions.
Neon remains mandatory and available to use even when SVE/SVE2 are available. This is similar to how the presence of AVX512 does not break existing SSE2 code etc. The idea would be to contribute SVE/SVE2 code only where it provides a performance improvement to do so, such that SVE kernels should always be preferred when it is possible to use them.
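The preference order described here could be sketched as below. This is only an illustration: the enum, the capability bits and the function name are hypothetical and not taken from the project. It assumes only what the comment states, namely that Neon is always present on AArch64, plus the architectural rule that SVE2 implies SVE.

```cpp
#include <cstdint>

// Hypothetical extension levels, ordered from least to most preferred.
enum class PreferredArmExt { Neon = 0, Sve = 1, Sve2 = 2 };

// Hypothetical capability bits, as some detection routine might report them.
constexpr std::uint32_t kCapSve  = 1u << 0;
constexpr std::uint32_t kCapSve2 = 1u << 1;

// Pick the most capable extension. Neon is the mandatory baseline, so it is
// the answer whenever no newer extension is reported.
inline PreferredArmExt best_arm_ext(std::uint32_t caps) {
  if ((caps & kCapSve) && (caps & kCapSve2)) return PreferredArmExt::Sve2;
  if (caps & kCapSve) return PreferredArmExt::Sve;
  return PreferredArmExt::Neon;
}
```

Requiring the SVE bit before accepting SVE2 keeps the result conservative if a platform ever reported an inconsistent combination.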
Options I can think of:
I don't have a strong preference between those options. I'll be happy as long as we end up with SVE/SVE2 being enabled by default once we have some kernels that can actually use it. Having a CMake option to control compilation of newer architecture extensions might actually be useful for debugging; I was originally just trying to avoid doing too much in this PR. Let me know your preference or if any other clarifications are needed and I'll adjust the PR accordingly. Thanks!
Thanks for the clarification. First things first, I'd say let's go with 3. So basically introduce a macro, similar to This confuses me as well though, since you mentioned in #431 that the kernels would not share much code between SVE and NEON. And now it sounds like SVE to NEON is like SSE4.2 to SSE3.0, basically just some additional instructions within the same framework. Or is the intrinsics syntax fully divergent between SVE and NEON?
Ack!
The Neon vector registers Since the register sizes differ, there is a different set of intrinsics to use here. For example the prototypes to add two vectors of Neon: To your point about code reuse between Neon and SVE: yes, there are some cases where it will be beneficial to reuse the bulk of the existing Neon code and simply use the new SVE instructions by operating on the lowest 128 bits of the SVE registers, taking advantage of how the registers overlap. For these cases we would probably re-introduce headers as we need them, to expose some common helper functions and avoid the duplication.
Understood, thanks! Makes me think we could structure the x86 intrinsics better as well. But well, that's a problem for another day. Let's get this MR merged first.
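To illustrate how the two intrinsic sets differ, here is a minimal sketch of the same element-wise addition written for SVE, for Neon, and in scalar code. The function name `add_f32` and the choice of `float` elements are assumptions made for the example, not code from this PR. The intrinsics used are the standard ACLE ones from `arm_sve.h` and `arm_neon.h`, selected with the compiler-defined `__ARM_FEATURE_SVE` and `__ARM_NEON` macros so that the file still builds on other hosts.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Element-wise out[i] = a[i] + b[i] for n floats (hypothetical helper).
void add_f32(const float* a, const float* b, float* out, std::size_t n) {
#if defined(__ARM_FEATURE_SVE)
  // SVE: the vector length is not known at compile time. svcntw() gives the
  // number of 32-bit lanes, and a predicate masks off the tail elements.
  for (std::size_t i = 0; i < n; i += svcntw()) {
    const svbool_t pg = svwhilelt_b32_u64(static_cast<std::uint64_t>(i),
                                          static_cast<std::uint64_t>(n));
    const svfloat32_t va = svld1_f32(pg, a + i);
    const svfloat32_t vb = svld1_f32(pg, b + i);
    svst1_f32(pg, out + i, svadd_f32_x(pg, va, vb));
  }
#else
  std::size_t i = 0;
#if defined(__ARM_NEON)
  // Neon: fixed 128-bit registers, so exactly four floats per iteration.
  for (; i + 4 <= n; i += 4) {
    vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
  }
#endif
  // Scalar tail for Neon, and the whole loop on non-Arm hosts.
  for (; i < n; ++i) out[i] = a[i] + b[i];
#endif
}
```

The Neon path needs a separate scalar tail because its registers are a fixed width, while the SVE path handles the tail through the predicate.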
d6b8f70 to c9cce7e
Added new CMake flags
This commit continues to assume that only Neon is present, but adds the helper functions and call sites to match the existing x86 behaviour in preparation for adding feature detection logic in a later commit.
Plus wire up Linux feature detection and amend init switches to just fall back to the Neon cases for now.
Introduce new options VVENC_ENABLE_ARM_SIMD_SVE and VVENC_ENABLE_ARM_SIMD_SVE2 to control whether SVE and SVE2 are enabled, plus add #if guards to disable feature detection if the feature is not available. This commit does not include guarding which source files are actually built with SVE/SVE2 flags enabled since there are currently zero SVE/SVE2 source files.
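A rough sketch of how compile-time guards and Linux `getauxval`-based detection might fit together is shown below. The macro names `EXAMPLE_ENABLE_SVE` and `EXAMPLE_ENABLE_SVE2`, the enum, and the function name are hypothetical stand-ins; the PR does not show which preprocessor symbols the CMake options define. `getauxval`, `AT_HWCAP`, `AT_HWCAP2`, `HWCAP_SVE` and `HWCAP2_SVE2` are the real Linux interfaces, and the `defined(...)` checks let the code still build against older headers that lack the SVE bits.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP, AT_HWCAP2 and the HWCAP_* bits
#endif

// Hypothetical extension levels for this example only.
enum class DetectedArmExt { Neon, Sve, Sve2 };

// Report the highest extension that is both compiled in and present at run
// time. Without the (hypothetical) enable macros this always reports the
// Neon baseline.
DetectedArmExt detect_arm_ext() {
  DetectedArmExt ext = DetectedArmExt::Neon;
#if defined(__linux__) && defined(__aarch64__)
#if defined(EXAMPLE_ENABLE_SVE) && defined(HWCAP_SVE)
  const unsigned long hwcap = getauxval(AT_HWCAP);
  if (hwcap & HWCAP_SVE) ext = DetectedArmExt::Sve;
#if defined(EXAMPLE_ENABLE_SVE2) && defined(HWCAP2_SVE2)
  const unsigned long hwcap2 = getauxval(AT_HWCAP2);
  // Only accept SVE2 when SVE was also reported.
  if (ext == DetectedArmExt::Sve && (hwcap2 & HWCAP2_SVE2)) {
    ext = DetectedArmExt::Sve2;
  }
#endif
#endif
#endif
  return ext;
}
```

Nesting the SVE2 guard inside the SVE guard means that disabling SVE at build time also disables SVE2 detection, which matches SVE2 being an extension of SVE.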
c9cce7e to 45e6380
Refactor and add extensions for AArch64:
- Add new helper functions in `CommonDefARM.cpp` and adjust call sites to mirror the existing x86 behaviour.
- Amend the existing function names for the x86 extension handling to include "x86" in the name to distinguish from the new Arm cases.
- Add new Arm extension enum (`ARM_VEXT`) values for the Arm Scalable Vector Extension (SVE) and SVE2 extensions.
- Add Linux `getauxval`-based feature detection logic for the two new architecture features.
- Amend the `InitARM.cpp` switch statements to continue to fall back to the Neon implementations for now.
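The fall-back behaviour in the init switch statements might look roughly like the sketch below. All names here (`ExampleArmVext`, `ExampleKernels`, `init_arm`, `sad_neon_standin`) are hypothetical and are not the identifiers used in `InitARM.cpp`. The "Neon" kernel is plain scalar code so that the example builds on any host.

```cpp
#include <cstdlib>

// Hypothetical extension enum, standing in for the project's own.
enum class ExampleArmVext { Neon, Sve, Sve2 };

using SadFn = int (*)(const short* a, const short* b, int n);

// Stand-in for a Neon SAD kernel: sum of absolute differences.
int sad_neon_standin(const short* a, const short* b, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += std::abs(a[i] - b[i]);
  return sum;
}

struct ExampleKernels {
  SadFn sad = nullptr;
};

// Install kernels for the detected extension. While no SVE/SVE2 kernels
// exist, those cases share the Neon case and install the same kernels.
void init_arm(ExampleKernels& k, ExampleArmVext ext) {
  switch (ext) {
    case ExampleArmVext::Sve2:  // no SVE2 kernels yet
    case ExampleArmVext::Sve:   // no SVE kernels yet
    case ExampleArmVext::Neon:
      k.sad = sad_neon_standin;
      break;
  }
}
```

Once a real SVE kernel exists, the `Sve` case would get its own body rather than sharing the Neon one.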