SHA-3 Thumb2, ARM32 ASM: Add assembly implementation #7667
891e3e6 to 3f06a50
retest this please
STM32H7A Cortex-M7 @ 240MHz:
Could not build with WOLFSSL_ARMASM_INLINE because BlockSha3 is defined as static and called from sha3.c.
This is with "-Os".
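The build failure above comes down to C linkage: a function defined as `static` has internal linkage, so code in another source file cannot call it. Below is a minimal single-file sketch of one common remedy, a wrapper with external linkage. The names `block_impl` and `block_public` are hypothetical and not taken from wolfSSL, and this is not necessarily how the PR resolved the problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an assembly-backed block function.
 * 'static' gives it internal linkage: only this .c file can call it. */
static void block_impl(uint64_t *state)
{
    state[0] ^= UINT64_C(1); /* placeholder work, not real SHA-3 */
}

/* Wrapper with external linkage: other .c files can declare and call it. */
void block_public(uint64_t *state)
{
    assert(state != NULL);
    block_impl(state);
}
```

Another source file would then declare `void block_public(uint64_t *state);` (normally via a shared header) and call it as usual.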
SHA-3 Before:
SHA3-224 1 MiB took 1.020 seconds, 1.125 MiB/s
SHA3-256 1 MiB took 1.008 seconds, 1.066 MiB/s
SHA3-384 850 KiB took 1.007 seconds, 844.091 KiB/s
SHA3-512 600 KiB took 1.016 seconds, 590.551 KiB/s
SHAKE128 1 MiB took 1.012 seconds, 1.303 MiB/s
SHAKE256 1 MiB took 1.008 seconds, 1.066 MiB/s
SHA-3 With PR 7667:
SHA3-224 1 MiB took 1.012 seconds, 1.254 MiB/s
SHA3-256 1 MiB took 1.008 seconds, 1.187 MiB/s
SHA3-384 950 KiB took 1.008 seconds, 942.460 KiB/s
SHA3-512 675 KiB took 1.024 seconds, 659.180 KiB/s
SHAKE128 1 MiB took 1.012 seconds, 1.447 MiB/s
SHAKE256 1 MiB took 1.008 seconds, 1.187 MiB/s
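The gain between the two SHA-3 runs above can be read as a ratio of reported throughputs. A small sketch of that arithmetic follows; the helper name `speedup_percent` is illustrative and not part of the wolfSSL benchmark.

```c
#include <assert.h>

/* Relative throughput change in percent.
 * Both arguments must use the same unit (for example MiB/s). */
double speedup_percent(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

For example, SHA3-256 goes from 1.066 to 1.187 MiB/s, which this helper reports as a gain of roughly 11%. The SHA3-384 and SHA3-512 rows are in KiB/s, so compare them only against each other's KiB/s figures.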
With this PR, Dilithium (ML-DSA) performance improved to:
ML-DSA 44 key gen 56 ops took 1.028 sec, avg 18.357 ms, 54.475 ops/sec
ML-DSA 44 sign 16 ops took 1.133 sec, avg 70.812 ms, 14.122 ops/sec
ML-DSA 44 verify 52 ops took 1.031 sec, avg 19.827 ms, 50.436 ops/sec
ML-DSA 65 key gen 32 ops took 1.004 sec, avg 31.375 ms, 31.873 ops/sec
ML-DSA 65 sign 8 ops took 1.106 sec, avg 138.250 ms, 7.233 ops/sec
ML-DSA 65 verify 32 ops took 1.039 sec, avg 32.469 ms, 30.799 ops/sec
ML-DSA 87 key gen 20 ops took 1.051 sec, avg 52.550 ms, 19.029 ops/sec
ML-DSA 87 sign 10 ops took 1.039 sec, avg 103.900 ms, 9.625 ops/sec
ML-DSA 87 verify 20 ops took 1.082 sec, avg 54.100 ms, 18.484 ops/sec
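The `avg` and `ops/sec` columns in the ML-DSA lines follow from the operation count and the elapsed time. A short sketch of that relationship is below; the function names are illustrative, and because the logged elapsed time is rounded to three decimals, some rows may differ from the recomputed values in the last digit.

```c
#include <assert.h>

/* Average time per operation in milliseconds. */
double avg_ms(unsigned ops, double seconds)
{
    assert(ops > 0);
    return seconds * 1000.0 / (double)ops;
}

/* Operations completed per second. */
double ops_per_sec(unsigned ops, double seconds)
{
    assert(seconds > 0.0);
    return (double)ops / seconds;
}
```

Using the ML-DSA 44 key gen row (56 ops in 1.028 sec), these give about 18.357 ms per operation and about 54.475 ops/sec, matching the log.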
4b6669b to 4f7321c
Fixed inline assembly code.
Add SHA-3 assembly implementation for Thumb2 and ARM32.
4f7321c to 8734f12
Nice work!
STM32H7A Cortex-M7 at 240MHz:
SHA3-224 1 MiB took 1.004 seconds, 1.410 MiB/s
SHA3-256 1 MiB took 1.008 seconds, 1.332 MiB/s
SHA3-384 1 MiB took 1.016 seconds, 1.033 MiB/s
SHA3-512 750 KiB took 1.011 seconds, 741.840 KiB/s
SHAKE128 2 MiB took 1.012 seconds, 1.616 MiB/s
SHAKE256 1 MiB took 1.008 seconds, 1.332 MiB/s
ML-DSA 44 key gen 60 ops took 1.031 sec, avg 17.183 ms, 58.196 ops/sec
ML-DSA 44 sign 18 ops took 1.078 sec, avg 59.889 ms, 16.698 ops/sec
ML-DSA 44 verify 54 ops took 1.008 sec, avg 18.667 ms, 53.571 ops/sec
ML-DSA 65 key gen 36 ops took 1.051 sec, avg 29.194 ms, 34.253 ops/sec
ML-DSA 65 sign 12 ops took 1.094 sec, avg 91.167 ms, 10.969 ops/sec
ML-DSA 65 verify 34 ops took 1.035 sec, avg 30.441 ms, 32.850 ops/sec
ML-DSA 87 key gen 22 ops took 1.070 sec, avg 48.636 ms, 20.561 ops/sec
ML-DSA 87 sign 8 ops took 1.259 sec, avg 157.375 ms, 6.354 ops/sec
ML-DSA 87 verify 20 ops took 1.008 sec, avg 50.400 ms, 19.841 ops/sec
Description
Add SHA-3 assembly implementation for Thumb2 and ARM32.
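For context, the hot loop that a SHA-3 assembly implementation accelerates is the Keccak-f[1600] permutation defined in FIPS 202. The portable C sketch below follows the widely used compact reference layout; it is not the wolfSSL code or the assembly from this PR, and the name `keccak_f1600_ref` is hypothetical. It can serve as a cross-check when validating an optimized block function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round constants for the iota step (24 rounds). */
static const uint64_t kRoundConst[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
    0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
    0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};

/* Rotation offsets and lane order for the combined rho/pi step. */
static const int kRotc[24] = {
    1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
    27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44
};
static const int kPiln[24] = {
    10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
    15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1
};

/* Rotate left; n is always in 1..63 here. */
static uint64_t rotl64(uint64_t x, int n)
{
    return (x << n) | (x >> (64 - n));
}

/* Apply Keccak-f[1600] in place to a 25-lane state. */
void keccak_f1600_ref(uint64_t st[25])
{
    uint64_t bc[5], t;
    int i, j, r;

    assert(st != NULL);
    for (r = 0; r < 24; r++) {
        /* theta: mix column parities into every lane */
        for (i = 0; i < 5; i++)
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        for (i = 0; i < 5; i++) {
            t = bc[(i + 4) % 5] ^ rotl64(bc[(i + 1) % 5], 1);
            for (j = 0; j < 25; j += 5)
                st[j + i] ^= t;
        }
        /* rho and pi: rotate each lane and move it to its new position */
        t = st[1];
        for (i = 0; i < 24; i++) {
            j = kPiln[i];
            bc[0] = st[j];
            st[j] = rotl64(t, kRotc[i]);
            t = bc[0];
        }
        /* chi: non-linear step applied row by row */
        for (j = 0; j < 25; j += 5) {
            for (i = 0; i < 5; i++)
                bc[i] = st[j + i];
            for (i = 0; i < 5; i++)
                st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
        }
        /* iota: inject the round constant */
        st[0] ^= kRoundConst[r];
    }
}
```

Applied once to an all-zero state, the first lane should come out as 0xF1258F7940E1DDE7, the published Keccak-f[1600] test value.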
Testing
QEMU with hosts: armv7m, armv8