RISC-V ChaCha20: assembly implementations #7818

Merged
merged 2 commits into from
Aug 2, 2024


SparkiDev
Contributor

@SparkiDev SparkiDev commented Aug 1, 2024

Description

ChaCha20:
scalar and vector implementations
vector implementations doing 6, 4, 2 or 1 block(s) at a time.
scalar implementations using roriw and pack
vector implementations using VROR_VI and roriw.
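
ChaCha20's quarter round is specified with left rotations by 16, 12, 8 and 7 bits, while roriw rotates right, so each rotation is presumably expressed as a right rotation by 32 minus that amount. The sketch below is plain C written for illustration only, not the assembly from this PR; the names `ror32` and `chacha_quarter_round` are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). On RISC-V with Zbb or
 * Zbkb this maps onto a single roriw instruction. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* ChaCha20 quarter round (RFC 8439, section 2.1). Each left rotation by k
 * is written as the equivalent right rotation by 32 - k. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = ror32(*d, 16); /* rotate left by 16 */
    *c += *d; *b ^= *c; *b = ror32(*b, 20); /* rotate left by 12 */
    *a += *b; *d ^= *a; *d = ror32(*d, 24); /* rotate left by 8  */
    *c += *d; *b ^= *c; *b = ror32(*b, 25); /* rotate left by 7  */
}
```

Without a rotate instruction, each rotation costs two shifts and an OR, which is why the availability of roriw (scalar) or VROR_VI (vector) matters for this cipher.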

RISC-V SHA-256: avoid using s0 if it can be helped.

Testing

./configure '--disable-shared' '--enable-chacha' 'LDFLAGS=--static' '--host=riscv64' 'CC=riscv64-linux-gnu-gcc' '--enable-riscv-asm'
./configure '--disable-shared' '--enable-chacha' 'LDFLAGS=--static' '--host=riscv64' 'CC=riscv64-linux-gnu-gcc' '--enable-riscv-asm=zbb'
./configure '--disable-shared' '--enable-chacha' 'LDFLAGS=--static' '--host=riscv64' 'CC=riscv64-linux-gnu-gcc' '--enable-riscv-asm=zbkb'
./configure '--disable-shared' '--enable-chacha' 'LDFLAGS=--static' '--host=riscv64' 'CC=riscv64-linux-gnu-gcc' '--enable-riscv-asm=zv'
./configure '--disable-shared' '--enable-chacha' 'LDFLAGS=--static' '--host=riscv64' 'CC=riscv64-linux-gnu-gcc' '--enable-riscv-asm=zv,zvbb'
./configure '--disable-shared' '--enable-chacha' 'LDFLAGS=--static' '--host=riscv64' 'CC=riscv64-linux-gnu-gcc' '--enable-riscv-asm=zv,zvbb,zbb'
./configure '--disable-shared' '--enable-chacha' 'LDFLAGS=--static' '--host=riscv64' 'CC=riscv64-linux-gnu-gcc' '--enable-riscv-asm=zv,zvbb,zbkb'

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

@SparkiDev SparkiDev self-assigned this Aug 1, 2024
@SparkiDev
Contributor Author

retest this please

@SparkiDev SparkiDev assigned wolfSSL-Bot and unassigned SparkiDev Aug 1, 2024
@dgarske dgarske requested review from dgarske and removed request for wolfSSL-Bot August 1, 2024 16:06
@dgarske dgarske self-assigned this Aug 1, 2024
@dgarske
Contributor

dgarske commented Aug 1, 2024

HiFive Unleashed at 1.4GHz:

root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.358 seconds,    7.363 MiB/s
AES-128-CBC-enc             20 MiB took 1.089 seconds,   18.362 MiB/s
AES-128-CBC-dec             20 MiB took 1.091 seconds,   18.334 MiB/s
AES-192-CBC-enc             20 MiB took 1.249 seconds,   16.011 MiB/s
AES-192-CBC-dec             20 MiB took 1.262 seconds,   15.851 MiB/s
AES-256-CBC-enc             15 MiB took 1.068 seconds,   14.039 MiB/s
AES-256-CBC-dec             15 MiB took 1.067 seconds,   14.064 MiB/s
AES-128-GCM-enc             15 MiB took 1.315 seconds,   11.409 MiB/s
AES-128-GCM-dec             15 MiB took 1.313 seconds,   11.423 MiB/s
AES-192-GCM-enc             15 MiB took 1.441 seconds,   10.409 MiB/s
AES-192-GCM-dec             15 MiB took 1.440 seconds,   10.414 MiB/s
AES-256-GCM-enc             10 MiB took 1.045 seconds,    9.572 MiB/s
AES-256-GCM-dec             10 MiB took 1.041 seconds,    9.609 MiB/s
GMAC Table 4-bit            31 MiB took 1.003 seconds,   30.903 MiB/s
CHACHA                      35 MiB took 1.165 seconds,   30.041 MiB/s
CHA-POLY                    25 MiB took 1.122 seconds,   22.291 MiB/s
MD5                         75 MiB took 1.010 seconds,   74.257 MiB/s
POLY1305                    90 MiB took 1.041 seconds,   86.491 MiB/s
SHA                         35 MiB took 1.063 seconds,   32.941 MiB/s
SHA-256                     20 MiB took 1.095 seconds,   18.263 MiB/s
SHA-384                     25 MiB took 1.153 seconds,   21.687 MiB/s
SHA-512                     25 MiB took 1.151 seconds,   21.725 MiB/s
SHA-512/224                 25 MiB took 1.151 seconds,   21.723 MiB/s
SHA-512/256                 25 MiB took 1.149 seconds,   21.750 MiB/s
HMAC-MD5                    75 MiB took 1.010 seconds,   74.267 MiB/s
HMAC-SHA                    35 MiB took 1.062 seconds,   32.943 MiB/s
HMAC-SHA256                 20 MiB took 1.094 seconds,   18.280 MiB/s
HMAC-SHA384                 25 MiB took 1.150 seconds,   21.740 MiB/s
HMAC-SHA512                 25 MiB took 1.151 seconds,   21.723 MiB/s
PBKDF2                       2 KiB took 1.002 seconds,    2.276 KiB/s
RSA     2048   public      1500 ops took 1.063 sec, avg 0.709 ms, 1411.106 ops/sec
RSA     2048  private       100 ops took 4.401 sec, avg 44.008 ms, 22.723 ops/sec
DH      2048  key gen       115 ops took 1.002 sec, avg 8.712 ms, 114.789 ops/sec
DH      2048    agree       100 ops took 1.849 sec, avg 18.494 ms, 54.071 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.282 sec, avg 6.408 ms, 156.044 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.276 sec, avg 6.380 ms, 156.750 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.314 sec, avg 6.571 ms, 152.173 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.313 sec, avg 4.378 ms, 228.426 ops/sec
Benchmark complete

For reference here are the numbers without riscv-asm:

root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.359 seconds,    7.356 MiB/s
AES-128-CBC-enc              5 MiB took 12.714 seconds,    0.393 MiB/s
AES-128-CBC-dec              5 MiB took 12.659 seconds,    0.395 MiB/s
AES-192-CBC-enc              5 MiB took 15.219 seconds,    0.329 MiB/s
AES-192-CBC-dec              5 MiB took 15.162 seconds,    0.330 MiB/s
AES-256-CBC-enc              5 MiB took 17.748 seconds,    0.282 MiB/s
AES-256-CBC-dec              5 MiB took 17.674 seconds,    0.283 MiB/s
AES-128-GCM-enc              5 MiB took 12.812 seconds,    0.390 MiB/s
AES-128-GCM-dec              5 MiB took 12.810 seconds,    0.390 MiB/s
AES-192-GCM-enc              5 MiB took 15.329 seconds,    0.326 MiB/s
AES-192-GCM-dec              5 MiB took 15.326 seconds,    0.326 MiB/s
AES-256-GCM-enc              5 MiB took 17.832 seconds,    0.280 MiB/s
AES-256-GCM-dec              5 MiB took 17.831 seconds,    0.280 MiB/s
GMAC Table 4-bit            31 MiB took 1.018 seconds,   30.461 MiB/s
CHACHA                      35 MiB took 1.146 seconds,   30.548 MiB/s
CHA-POLY                    25 MiB took 1.108 seconds,   22.571 MiB/s
MD5                         75 MiB took 1.009 seconds,   74.349 MiB/s
POLY1305                    90 MiB took 1.040 seconds,   86.501 MiB/s
SHA                         35 MiB took 1.062 seconds,   32.961 MiB/s
SHA-256                     20 MiB took 1.091 seconds,   18.327 MiB/s
SHA-384                     25 MiB took 1.127 seconds,   22.187 MiB/s
SHA-512                     25 MiB took 1.127 seconds,   22.183 MiB/s
SHA-512/224                 25 MiB took 1.126 seconds,   22.202 MiB/s
SHA-512/256                 25 MiB took 1.126 seconds,   22.200 MiB/s
HMAC-MD5                    75 MiB took 1.009 seconds,   74.361 MiB/s
HMAC-SHA                    35 MiB took 1.062 seconds,   32.963 MiB/s
HMAC-SHA256                 20 MiB took 1.091 seconds,   18.327 MiB/s
HMAC-SHA384                 25 MiB took 1.126 seconds,   22.202 MiB/s
HMAC-SHA512                 25 MiB took 1.126 seconds,   22.198 MiB/s
PBKDF2                       2 KiB took 1.001 seconds,    2.280 KiB/s
RSA     2048   public      1500 ops took 1.062 sec, avg 0.708 ms, 1411.948 ops/sec
RSA     2048  private       100 ops took 4.402 sec, avg 44.019 ms, 22.718 ops/sec
DH      2048  key gen       115 ops took 1.002 sec, avg 8.712 ms, 114.788 ops/sec
DH      2048    agree       100 ops took 1.846 sec, avg 18.457 ms, 54.179 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.286 sec, avg 6.431 ms, 155.503 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.282 sec, avg 6.411 ms, 155.976 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.316 sec, avg 6.579 ms, 152.010 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.356 sec, avg 4.519 ms, 221.297 ops/sec
Benchmark complete

dgarske
dgarske previously approved these changes Aug 1, 2024
@SparkiDev
Contributor Author

Annoying. All that work and a slight regression!
QEMU says it is a lot faster but I guess it doesn't reflect real performance at all!
I'll work on the PR some more and hopefully get some extra performance.

@dgarske dgarske assigned SparkiDev and unassigned dgarske and wolfSSL-Bot Aug 1, 2024
@SparkiDev
Contributor Author

SparkiDev commented Aug 2, 2024

If this works and isn't slower than before, then please merge.
Let us know what the performance is. Thanks!

@SparkiDev SparkiDev assigned dgarske and wolfSSL-Bot and unassigned SparkiDev Aug 2, 2024
@dgarske
Contributor

dgarske commented Aug 2, 2024

Updated benchmarks:

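SHA-256 about 17% faster (18.327 to 21.427 MiB/s against the run without riscv-asm).
ChaCha about 30% faster (30.548 to 39.567 MiB/s against the run without riscv-asm).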

root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math: 	Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                         10 MiB took 1.057 seconds,    9.463 MiB/s
AES-128-CBC-enc             20 MiB took 1.067 seconds,   18.738 MiB/s
AES-128-CBC-dec             20 MiB took 1.070 seconds,   18.691 MiB/s
AES-192-CBC-enc             20 MiB took 1.238 seconds,   16.151 MiB/s
AES-192-CBC-dec             20 MiB took 1.246 seconds,   16.054 MiB/s
AES-256-CBC-enc             15 MiB took 1.050 seconds,   14.289 MiB/s
AES-256-CBC-dec             15 MiB took 1.053 seconds,   14.243 MiB/s
AES-128-GCM-enc             15 MiB took 1.302 seconds,   11.519 MiB/s
AES-128-GCM-dec             15 MiB took 1.302 seconds,   11.525 MiB/s
AES-192-GCM-enc             15 MiB took 1.432 seconds,   10.472 MiB/s
AES-192-GCM-dec             15 MiB took 1.429 seconds,   10.494 MiB/s
AES-256-GCM-enc             10 MiB took 1.033 seconds,    9.679 MiB/s
AES-256-GCM-dec             10 MiB took 1.033 seconds,    9.680 MiB/s
GMAC Table 4-bit            31 MiB took 1.003 seconds,   30.893 MiB/s
CHACHA                      40 MiB took 1.011 seconds,   39.567 MiB/s
CHA-POLY                    30 MiB took 1.105 seconds,   27.161 MiB/s
MD5                         75 MiB took 1.010 seconds,   74.231 MiB/s
POLY1305                    90 MiB took 1.041 seconds,   86.421 MiB/s
SHA                         35 MiB took 1.062 seconds,   32.952 MiB/s
SHA-256                     25 MiB took 1.167 seconds,   21.427 MiB/s
SHA-384                     25 MiB took 1.146 seconds,   21.813 MiB/s
SHA-512                     25 MiB took 1.146 seconds,   21.816 MiB/s
SHA-512/224                 25 MiB took 1.147 seconds,   21.790 MiB/s
SHA-512/256                 25 MiB took 1.148 seconds,   21.779 MiB/s
HMAC-MD5                    75 MiB took 1.008 seconds,   74.381 MiB/s
HMAC-SHA                    35 MiB took 1.062 seconds,   32.955 MiB/s
HMAC-SHA256                 25 MiB took 1.166 seconds,   21.437 MiB/s
HMAC-SHA384                 25 MiB took 1.148 seconds,   21.773 MiB/s
HMAC-SHA512                 25 MiB took 1.147 seconds,   21.801 MiB/s
PBKDF2                       3 KiB took 1.002 seconds,    2.651 KiB/s
RSA     2048   public      1500 ops took 1.050 sec, avg 0.700 ms, 1428.257 ops/sec
RSA     2048  private       100 ops took 4.399 sec, avg 43.992 ms, 22.731 ops/sec
DH      2048  key gen       115 ops took 1.005 sec, avg 8.741 ms, 114.405 ops/sec
DH      2048    agree       100 ops took 1.848 sec, avg 18.477 ms, 54.122 ops/sec
ECC   [      SECP256R1]   256  key gen       200 ops took 1.281 sec, avg 6.406 ms, 156.103 ops/sec
ECDHE [      SECP256R1]   256    agree       200 ops took 1.276 sec, avg 6.378 ms, 156.792 ops/sec
ECDSA [      SECP256R1]   256     sign       200 ops took 1.309 sec, avg 6.544 ms, 152.815 ops/sec
ECDSA [      SECP256R1]   256   verify       300 ops took 1.313 sec, avg 4.377 ms, 228.485 ops/sec
Benchmark complete
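
Relative gains can be read straight off the MiB/s column by dividing the new throughput by the old one. A minimal sketch in C, assuming the run without riscv-asm is the baseline; the helper name `speedup_percent` is hypothetical and not part of the wolfCrypt benchmark tool.

```c
/* Percentage throughput gain going from `before` to `after`.
 * Both values must be in the same unit (here MiB/s). */
static double speedup_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

For example, `speedup_percent(30.548, 39.567)` for CHACHA and `speedup_percent(18.327, 21.427)` for SHA-256 compare this run with the non-asm figures posted earlier in the thread.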

@dgarske dgarske merged commit b12a773 into wolfSSL:master Aug 2, 2024
125 checks passed
3 participants