Support NEON instruction set #12

GCCFeli · 2016-09-23T03:10:08Z

It would be great if NEON were supported :)

guillaumeblanc · 2016-09-24T20:48:38Z

Yes, it definitely would. I don't have any ARM hardware to test an implementation, though.

The process to port the ozz SIMD implementation is:

- In the simd_math_config.h file:
  - Add NEON detection based on the __ARM_NEON preprocessor definition.
  - Include <arm_neon.h>.
  - Typedef SimdFloat4 and SimdInt4 with NEON types.
  - Include simd_math_neon-inl.h, which will contain the NEON implementation.
- Add a new file named simd_math_neon-inl.h in the ozz/base/math/internal folder:
  - Start with a copy/paste of simd_math_sse-inl.h or simd_math_ref-inl.h.
  - Port all functions from this file.
  - Run the unit tests.
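The steps above can be sketched in a single self-contained file. This is only an illustration under assumptions: the helper names (Load4, Add, Mul, Store4) and the SKETCH_* macros are hypothetical and not ozz's actual API, and the SSE2 and scalar branches exist only so the sketch also compiles off-ARM. In ozz itself the detection would live in simd_math_config.h and the function bodies in simd_math_neon-inl.h.

```cpp
#include <cassert>

// Backend detection, as simd_math_config.h would do it.
// SKETCH_NEON / SKETCH_SSE2 are hypothetical macro names.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#define SKETCH_NEON 1
#include <arm_neon.h>
typedef float32x4_t SimdFloat4;
typedef int32x4_t SimdInt4;
#elif defined(__SSE2__)
#define SKETCH_SSE2 1
#include <emmintrin.h>
typedef __m128 SimdFloat4;
typedef __m128i SimdInt4;
#else
// Scalar fallback, in the spirit of simd_math_ref-inl.h.
struct SimdFloat4 { float v[4]; };
struct SimdInt4 { int v[4]; };
#endif

// A few ported primitives (hypothetical names), one body per backend.
inline SimdFloat4 Load4(const float* p) {
#if defined(SKETCH_NEON)
  return vld1q_f32(p);
#elif defined(SKETCH_SSE2)
  return _mm_loadu_ps(p);
#else
  SimdFloat4 r;
  for (int i = 0; i < 4; ++i) r.v[i] = p[i];
  return r;
#endif
}

inline SimdFloat4 Add(SimdFloat4 a, SimdFloat4 b) {
#if defined(SKETCH_NEON)
  return vaddq_f32(a, b);
#elif defined(SKETCH_SSE2)
  return _mm_add_ps(a, b);
#else
  for (int i = 0; i < 4; ++i) a.v[i] += b.v[i];
  return a;
#endif
}

inline SimdFloat4 Mul(SimdFloat4 a, SimdFloat4 b) {
#if defined(SKETCH_NEON)
  return vmulq_f32(a, b);
#elif defined(SKETCH_SSE2)
  return _mm_mul_ps(a, b);
#else
  for (int i = 0; i < 4; ++i) a.v[i] *= b.v[i];
  return a;
#endif
}

inline void Store4(float* p, SimdFloat4 v) {
#if defined(SKETCH_NEON)
  vst1q_f32(p, v);
#elif defined(SKETCH_SSE2)
  _mm_storeu_ps(p, v);
#else
  for (int i = 0; i < 4; ++i) p[i] = v.v[i];
#endif
}
```

On AArch64, GCC and Clang define __ARM_NEON by default, so the NEON branch is picked without extra flags; 32-bit ARM builds typically need -mfpu=neon.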

The whole library, including the SoA implementation, is built on the functions from simd_math_*-inl.h, so nothing else is needed.
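To illustrate why porting the primitives is enough, here is a sketch of SoA-style operations written purely against a handful of primitives. Assumptions: the scalar SimdFloat4 below stands in for whichever backend is active, and SoaFloat3, Dot and Lerp are hypothetical stand-ins rather than ozz's own declarations.

```cpp
#include <cassert>

// Stand-in for the active simd_math_*-inl.h backend (scalar here for portability).
struct SimdFloat4 { float v[4]; };

inline SimdFloat4 Load4(float x, float y, float z, float w) {
  return SimdFloat4{{x, y, z, w}};
}
inline SimdFloat4 Add(SimdFloat4 a, SimdFloat4 b) {
  for (int i = 0; i < 4; ++i) a.v[i] += b.v[i];
  return a;
}
inline SimdFloat4 Sub(SimdFloat4 a, SimdFloat4 b) {
  for (int i = 0; i < 4; ++i) a.v[i] -= b.v[i];
  return a;
}
inline SimdFloat4 Mul(SimdFloat4 a, SimdFloat4 b) {
  for (int i = 0; i < 4; ++i) a.v[i] *= b.v[i];
  return a;
}

// Hypothetical SoA type: four 3D vectors, one per lane.
struct SoaFloat3 {
  SimdFloat4 x, y, z;
};

// Four dot products at once, using only Add and Mul.
inline SimdFloat4 Dot(const SoaFloat3& a, const SoaFloat3& b) {
  return Add(Add(Mul(a.x, b.x), Mul(a.y, b.y)), Mul(a.z, b.z));
}

// Four lerps at once: a + (b - a) * t, with one interpolation factor per lane.
inline SoaFloat3 Lerp(const SoaFloat3& a, const SoaFloat3& b, SimdFloat4 t) {
  return SoaFloat3{Add(a.x, Mul(Sub(b.x, a.x), t)),
                   Add(a.y, Mul(Sub(b.y, a.y), t)),
                   Add(a.z, Mul(Sub(b.z, a.z), t))};
}
```

Swapping the backend (scalar, SSE or NEON) leaves Dot and Lerp untouched, since they only call the primitives.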

guillaumeblanc · 2016-09-28T20:19:32Z

I'm reopening the request, as I think it does make a lot of sense to implement it.

jazzbre · 2016-09-29T12:28:11Z

https://github.com/scoopr/vectorial, or even https://github.com/jratcliff63367/sse2neon, are good references for an SSE/NEON implementation.

kylawl · 2019-08-01T19:37:05Z

We're going to be starting on Switch soon. Expect a PR early next year, but if someone wants to do it before us, that would be nice!

guillaumeblanc · 2019-08-06T19:47:01Z

Awesome news @kylawl. Don't hesitate to reach out to me if you want to discuss this or need help/support.

kylawl · 2021-05-26T01:25:11Z

So it's been a while, and I'm back looking at this again. As a first step, I thought I'd just try sse2neon to see if there's any benefit from simply aliasing all the SSE instructions to NEON like that. Performance is actually surprisingly poor going this route on Switch: the sse reference implementation takes about 1.2ms for our whole animation phase, while sse2neon takes 2.7ms! Not exactly what I was expecting or hoping for.

I've seen some discussion that we could be throttled by memory access overhead rather than computation; it's going to need some more investigation.

ColinGilbert · 2021-05-27T04:52:24Z

If I remember correctly, Bullet physics had code contributed by Apple that made it very performant on ARM/iOS. Maybe that would be worth looking at?

guillaumeblanc · 2021-05-27T06:55:37Z

Welcome back!

You say 1.2ms for "sse reference implementation". Do you mean the float/scalar reference implementation? If so, it could be worth checking the generated code to see how much the compiler auto-vectorizes it. All the SoA usages of the math library in ozz are very easy for the compiler to auto-vectorize, so maybe NEON is already in use. That doesn't mean 1.2ms cannot be optimized, but expectations for further optimization would be lower.
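As an illustration of the kind of loop meant here, the sketch below blends SoA data with plain scalar code: unit-stride access and no dependencies between iterations, which is the shape GCC and Clang can typically turn into SSE or NEON instructions on their own. Whether they actually do can be checked with `-O3 -fopt-info-vec-optimized` (GCC) or `-O2 -Rpass=loop-vectorize` (Clang), or by reading the disassembly. The SoaTranslations type and the function names are hypothetical, and `__restrict` is a widely supported compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>

// out[i] = a[i] + (b[i] - a[i]) * t over n contiguous floats.
// __restrict promises the three buffers don't overlap, which removes
// the main obstacle to auto-vectorizing this loop.
void LerpSoaLane(const float* __restrict a, const float* __restrict b,
                 float* __restrict out, std::size_t n, float t) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + (b[i] - a[i]) * t;
  }
}

// Hypothetical SoA block of translations: x, y and z in separate arrays.
struct SoaTranslations {
  static constexpr std::size_t kCount = 8;
  float x[kCount], y[kCount], z[kCount];
};

// Blends every component array independently with the same factor.
void BlendTranslations(const SoaTranslations& a, const SoaTranslations& b,
                       SoaTranslations& out, float t) {
  LerpSoaLane(a.x, b.x, out.x, SoaTranslations::kCount, t);
  LerpSoaLane(a.y, b.y, out.y, SoaTranslations::kCount, t);
  LerpSoaLane(a.z, b.z, out.z, SoaTranslations::kCount, t);
}
```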

Are the memory access overhead issues you mentioned specific to neon?

kylawl · 2021-06-06T04:21:23Z

You're probably right that the auto-vectorization is doing a decent job. One thing sse2neon misses is the common shuffle operation we use to splat the same value into all 4 components. For that particular shuffle, it uses a multi-instruction "generic" path even though ARM has a dedicated instruction for that operation. After spending some more time on Switch optimizations, I don't think this is a memory access issue. It needs further investigation, for sure.
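The splat in question corresponds to an SSE shuffle with the same lane index in every position, e.g. `_mm_shuffle_ps(v, v, _MM_SHUFFLE(i, i, i, i))`. On AArch64, NEON exposes this directly as `vdupq_laneq_f32` (a single DUP by element). The sketch below, using the hypothetical names SplatLane, Load and Store, shows how a hand-written port could map the splat per backend instead of going through a generic shuffle emulation; it is not taken from sse2neon or ozz.

```cpp
#include <cassert>

#if defined(__aarch64__) && defined(__ARM_NEON)
#define SKETCH_NEON64 1
#include <arm_neon.h>
typedef float32x4_t SimdFloat4;
#elif defined(__SSE__)
#define SKETCH_SSE 1
#include <xmmintrin.h>
typedef __m128 SimdFloat4;
#else
struct SimdFloat4 { float v[4]; };  // scalar fallback
#endif

inline SimdFloat4 Load(const float* p) {
#if defined(SKETCH_NEON64)
  return vld1q_f32(p);
#elif defined(SKETCH_SSE)
  return _mm_loadu_ps(p);
#else
  SimdFloat4 r;
  for (int i = 0; i < 4; ++i) r.v[i] = p[i];
  return r;
#endif
}

inline void Store(float* p, SimdFloat4 v) {
#if defined(SKETCH_NEON64)
  vst1q_f32(p, v);
#elif defined(SKETCH_SSE)
  _mm_storeu_ps(p, v);
#else
  for (int i = 0; i < 4; ++i) p[i] = v.v[i];
#endif
}

// Broadcasts lane `Lane` of v to all four lanes. Lane is a template parameter
// because both the NEON and SSE intrinsics require a compile-time constant.
template <int Lane>
inline SimdFloat4 SplatLane(SimdFloat4 v) {
  static_assert(Lane >= 0 && Lane < 4, "Lane must be in [0, 3]");
#if defined(SKETCH_NEON64)
  return vdupq_laneq_f32(v, Lane);  // one DUP (element) instruction
#elif defined(SKETCH_SSE)
  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(Lane, Lane, Lane, Lane));
#else
  SimdFloat4 r;
  for (int i = 0; i < 4; ++i) r.v[i] = v.v[Lane];
  return r;
#endif
}
```

`vdupq_laneq_f32` is AArch64-only; a 32-bit NEON path would use `vdupq_lane_f32` on the low or high half of the vector instead.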

guillaumeblanc · 2024-03-04T20:33:55Z

Hi,

what did you end up doing on Switch? Did you need/implement NEON optimizations?

Cheers,
Guillaume

kylawl · 2024-03-04T21:49:41Z

It's been a while, but if I remember correctly, the compiler was able to optimize the output sufficiently for our use. We tried one of those SSE-to-NEON headers and it was significantly slower than just using the vanilla one. Bear in mind that our skeletons only had a small number of bones, averaging maybe 30, on at most 5 characters at a time. On Switch we were CPU bound, but the minimal animation time was outstripped by the "open worldness" of the game. Sorry we never got around to completing that.


guillaumeblanc · 2024-03-06T18:01:26Z

No worries, thanks for the feedback.
I think it's good to know that the reference implementation provides good results as a cross-platform fallback.

GCCFeli closed this as completed Sep 28, 2016

guillaumeblanc added the enhancement label Sep 28, 2016

guillaumeblanc reopened this Sep 28, 2016

guillaumeblanc mentioned this issue Aug 1, 2019 in "Is SIMD support avaliable on ARM platform? #72" (Closed)
