mbedTLS AES code, intrinsics vs. assembly, alignment #5593

Open
magnumripper opened this issue Nov 30, 2024 · 39 comments

Comments

@magnumripper
Copy link
Member

I believe it builds asm for now. Brief tests with intrinsics show a slight performance drop. From PR comments:

@solardiz said:

I notice there are pieces of inline asm code in mbedTLS, which use non-VEX SSE instructions. Hopefully this works OK, but there's risk of it being slow (or of VEX-encoded code being slow afterwards) without vzeroupper on transitions (which would also be slow, just without the risk of being an order of magnitude slower). I don't suggest changing this yet, just writing down this note.

@magnumripper said:

 * \note AESNI is only supported with certain compilers and target options:
 * - Visual Studio: supported
 * - GCC, x86-64, target not explicitly supporting AESNI:
 *   requires MBEDTLS_HAVE_ASM.
 * - GCC, x86-32, target not explicitly supporting AESNI:
 *   not supported.
 * - GCC, x86-64 or x86-32, target supporting AESNI: supported.
 *   For this assembly-less implementation, you must currently compile
 *   `library/aesni.c` and `library/aes.c` with machine options to enable
 *   SSE2 and AESNI instructions: `gcc -msse2 -maes -mpclmul` or
 *   `clang -maes -mpclmul`.
 * - Non-x86 targets: this option is silently ignored.
 * - Other compilers: this option is silently ignored.
 *
 * \note
 * Above, "GCC" includes compatible compilers such as Clang.
 * The limitations on target support are likely to be relaxed in the future.

Perhaps we do need some tweak to ensure intrinsics and not asm, but I just now manually built with -mavx2 -maes -mpclmul per the above, and that resulted in a 62% larger aes.a and definitely worse performance (not a lot, but worse).

Disregarding the performance drop, we do have @CC_CPU@ from configure.ac to put in Makefile.in (it will add -mavx2 for my laptop), but -maes -mpclmul would need to be added too. I assume those two can be added even for machines not supporting them (because of the cpuid checking), but it would need testing, and they obviously can't be added blindly - the machine could be a SPARC and/or the compiler could be one that hasn't got a clue what those options are.
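For what it's worth, the intrinsics path could also be gated at compile time on the predefined macros that gcc/clang set when those options are in effect - a sketch only; HAVE_AESNI_INTRINSICS is a made-up name, not something in the tree:

/* Sketch: gcc/clang define __AES__ and __PCLMUL__ when -maes and -mpclmul
 * (or an -march that implies them) are in effect, so the intrinsics
 * implementation can be selected only where the compiler actually knows
 * the options, and skipped on e.g. SPARC or with older compilers. */
#if (defined(__x86_64__) || defined(__i386__)) && \
    defined(__AES__) && defined(__PCLMUL__)
#define HAVE_AESNI_INTRINSICS 1 /* machine options allow AES-NI intrinsics */
#else
#define HAVE_AESNI_INTRINSICS 0 /* fall back to asm or plain C */
#endif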

@solardiz
Copy link
Member

Upstream is also considering dropping the assembly in favor of intrinsics: Mbed-TLS/mbedtls#8231

@solardiz
Copy link
Member

solardiz commented Dec 1, 2024

Experiment 1:

I tried adding -maes -mpclmul to CFLAGS inside the mbedtls directory. The aesni.o text section became about twice as large (from 1445 to 3168 bytes). Speeds have changed: o5logon became 5% slower, but keepass 25% faster with intrinsics.

This is without any use of VEX yet, even though most of the rest of john is built with AVX512BW. Apparently, we're not propagating the main SIMD flags from the main CFLAGS to this sub-make - maybe a bug on its own.

Experiment 2:

On top of the above, added also -mavx512bw to CFLAGS inside the mbedtls directory. The aesni.o text section's size is now in between asm and the above, at 2709 bytes, and there are indeed VEX-encoded instructions in there (including vaes*). o5logon speed is also in between asm and the experiment above (so it's partially recovered), keepass is as good as the above.

@solardiz
Copy link
Member

solardiz commented Dec 1, 2024

Apparently, we're not propagating the main SIMD flags from the main CFLAGS to this sub-make - maybe a bug on its own.

Looks the same for other subdirectories we have with third-party crypto code that we build into .a files.
Separately but related, we also have non-VEX inline asm SIMD code in ed25519-donna/ed25519-donna-64bit-x86.h.
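As an aside, the concern here is the classic AVX/SSE transition penalty. A minimal illustration (not code from either tree; the function name is made up):

#include <stddef.h>
#include <immintrin.h>

/* Hypothetical caller: does some AVX2 work, then control passes to code
 * that may execute legacy (non-VEX) SSE instructions, such as the inline
 * asm mentioned above. */
static void avx_work_then_legacy_sse(float *dst, const float *src, size_t n)
{
    size_t i;

    for (i = 0; i + 8 <= n; i += 8)
        _mm256_storeu_ps(dst + i,
                         _mm256_add_ps(_mm256_loadu_ps(dst + i),
                                       _mm256_loadu_ps(src + i)));

    /* Clear the upper YMM halves so that subsequent non-VEX SSE code does
     * not trigger the (potentially order-of-magnitude) transition penalty. */
    _mm256_zeroupper();
}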

@solardiz solardiz added this to the Potentially 2.0.0 milestone Dec 1, 2024
@magnumripper magnumripper self-assigned this Dec 1, 2024
magnumripper added a commit to magnumripper/john that referenced this issue Dec 2, 2024
This is for x86 AES-NI, intrinsics version.

Closes openwall#5593
@magnumripper
Copy link
Member Author

magnumripper commented Dec 2, 2024

For legacy makes, perhaps we could just add -maes -mpclmul for AVX and beyond? Or just ignore it and rely on the asm. The -native targets will get it automagically of course.

@solardiz
Copy link
Member

solardiz commented Dec 2, 2024

For legacy makes, perhaps we could just add -maes -mpclmul for AVX and beyond?

I've just checked - Intel Sandy Bridge - the first microarch to introduce AVX - already had both of these as well. So I was tempted to say yes.

But then it gets interesting - it was also a time of paranoia about possible backdoors in AES-NI, which is probably why Intel added a way to disable AES-NI from the firmware (including in a way that this can't be re-enabled later on a live system), and many systems shipped with it default-disabled (and default-locked)! I still have a server like this where I cannot easily re-enable AES-NI remotely. I wonder what this means not only for our Makefile.legacy, but also for our regular autoconf'ed builds. I may check.

@magnumripper
Copy link
Member Author

Provided the mbedTLS code checks cpuid sufficiently, it's more a matter of the compiler supporting these flags. A compiler supporting AVX should also support -maes.
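For reference, here is what such a runtime check boils down to (a stand-alone sketch; the real check lives in mbedTLS's aesni.c):

#include <cpuid.h>

/* CPUID leaf 1, ECX bit 25 reports AES-NI. On systems where firmware has
 * disabled AES-NI, this bit reads as 0, so a binary built with -maes still
 * falls back correctly as long as this check is made before entering the
 * AES-NI code path. */
static int cpu_has_aesni(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 25) & 1;
}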

magnumripper added a commit to magnumripper/john that referenced this issue Dec 2, 2024
It would already be supported by the -native targets.

See openwall#5593
claudioandre-br pushed a commit to claudioandre-br/JohnTheRipper that referenced this issue Dec 2, 2024
This is for x86 AES-NI, intrinsics version.

Closes openwall#5593
claudioandre-br pushed a commit to claudioandre-br/JohnTheRipper that referenced this issue Dec 2, 2024
It would already be supported by the -native targets.

See openwall#5593
magnumripper added a commit to magnumripper/john that referenced this issue Dec 3, 2024
This is for x86 AES-NI, intrinsics version.

Closes openwall#5593
magnumripper added a commit to magnumripper/john that referenced this issue Dec 3, 2024
It would already be supported by the -native targets.

See openwall#5593
magnumripper added a commit that referenced this issue Dec 3, 2024
It would already be supported by the -native targets.

See #5593
@solardiz
Copy link
Member

solardiz commented Dec 4, 2024

keepass 25% faster with intrinsics.

After merging these changes and doing a clean build, I checked whether this same speedup for keepass appeared - yes, it did. (Along with the expected slight slowdown for o5logon.)

However, here's what I did not expect:

On top of all the merged changes, I went into the mbedtls directory and removed all 3 -m* flags from CFLAGS, and built just this part (like with make clean; make; cd ..; make). And guess what - keepass became faster yet, by another 14% on top of the 25% previously seen from intrinsics. Yet that's with asm code again, so I'd expect it to have become slower like it was with the asm code before. I still have an older john binary using the asm code, and it still runs slower, as expected. So this speedup with re-enabled asm is really puzzling.

It could be that these performance differences we're seeing are not so much asm vs. intrinsics, but e.g. whether something just happens to be well-aligned in a given build/run or not.

@solardiz
Copy link
Member

solardiz commented Dec 4, 2024

Tried this:

+++ b/src/keepass_fmt_plug.c
@@ -176,7 +176,7 @@ static void set_salt(void *salt)
 static int transform_key(char *masterkey, unsigned char *final_key)
 {
        SHA256_CTX ctx;
-       unsigned char hash[32];
+       unsigned char hash[32] JTR_ALIGN(16);
        int ret = 0;
 
        // First, hash the masterkey

The speeds look unchanged - still +25% with intrinsics, +25%+14% with re-enabled asm. So I guess the asm is actually faster, and I guess that previously the alignment just happened to be wrong.

@solardiz
Copy link
Member

solardiz commented Dec 4, 2024

I guess that previously the alignment just happened to be wrong.

I've just tried deliberately misaligning hash (by making it a macro). Got only a slight slowdown. So that's not it. Maybe it's alignment elsewhere?

Anyway, we should probably ensure hash is aligned.
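One way to do that deliberate misalignment, for illustration (a reconstruction of the idea, not the exact edit):

/* Over-allocate an aligned buffer and offset the pointer by one byte, so
 * that `hash` inside transform_key() is guaranteed to be misaligned. */
unsigned char hash_buf[32 + 1] JTR_ALIGN(16);
#define hash (hash_buf + 1)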

@magnumripper
Copy link
Member Author

I didn't run valgrind for a while, perhaps time for that. OTOH, I consistently get 1-7% better speed with intrinsics on my Intel MacBook - could that be a data point? What CPU(s) are you testing? Also, did you try different compilers?

@solardiz
Copy link
Member

solardiz commented Dec 4, 2024

I didn't run valgrind for a while, perhaps time for that.

I'd use perf if I were not running these tests in a VM. Edit: I mean, I can and did use perf even in this VM, but the hardware performance counters are inaccessible, so I only get to see what pieces of code the CPU runs most.

What CPU(s) are you testing? Also, did you try different compilers?

The above is with Tiger Lake, gcc 11, and in a VM. I didn't try anything else yet. I wasn't planning on benchmarking this much. Of course, I'd need a different setup for proper benchmarking, but I was surprised by the relative speed differences here.

@magnumripper
Copy link
Member Author

magnumripper commented Dec 4, 2024

Here are my test results on an i9-13900KF. Speed is for KeePass -cost=50000 using 4 threads, no HT. I ran the lot several times and there were no significant variations.

Original speed with no AES-NI:

Benchmarking: KeePass [AES/Argon2 256/256 AVX2]... (4xOMP) DONE
Speed for cost 1 (t (rounds)) of 50000, cost 2 (m) of 0, cost 3 (p) of 0, cost 4 (KDF [0=Argon2d 2=Argon2id 3=AES]) of 3
Raw:    494 c/s real, 124 c/s virtual

MbedTLS using asm, 10.3x from the above, and baseline for all below

Benchmarking: KeePass [AES/Argon2 256/256 AVX2]... (4xOMP) DONE
Speed for cost 1 (t (rounds)) of 50000, cost 2 (m) of 0, cost 3 (p) of 0, cost 4 (KDF [0=Argon2d 2=Argon2id 3=AES]) of 3
Raw:    5122 c/s real, 1308 c/s virtual

MbedTLS as a lib (trivial patch never committed), +22%

Benchmarking: KeePass [AES/Argon2 256/256 AVX2]... (4xOMP) DONE
Speed for cost 1 (t (rounds)) of 50000, cost 2 (m) of 0, cost 3 (p) of 0, cost 4 (KDF [0=Argon2d 2=Argon2id 3=AES]) of 3
Raw:    6272 c/s real, 1635 c/s virtual

Current bleeding-jumbo (b1d063a), +5% for a total of +29%

Benchmarking: KeePass [AES/Argon2 256/256 AVX2]... (4xOMP) DONE
Speed for cost 1 (t (rounds)) of 50000, cost 2 (m) of 0, cost 3 (p) of 0, cost 4 (KDF [0=Argon2d 2=Argon2id 3=AES]) of 3
Raw:    6606 c/s real, 1676 c/s virtual

MbedTLS intrinsics + @solardiz unroll + alignment -2% (more like ±0 on repeated runs)

Benchmarking: KeePass [AES/Argon2 256/256 AVX2]... (4xOMP) DONE
Speed for cost 1 (t (rounds)) of 50000, cost 2 (m) of 0, cost 3 (p) of 0, cost 4 (KDF [0=Argon2d 2=Argon2id 3=AES]) of 3
Raw:    6456 c/s real, 1638 c/s virtual

MbedTLS intrinsics + @solardiz unroll ±0%

Benchmarking: KeePass [AES/Argon2 256/256 AVX2]... (4xOMP) DONE
Speed for cost 1 (t (rounds)) of 50000, cost 2 (m) of 0, cost 3 (p) of 0, cost 4 (KDF [0=Argon2d 2=Argon2id 3=AES]) of 3
Raw:    6482 c/s real, 1638 c/s virtual

It's puzzling you get better speeds with asm, while I get worse. But here's another random data point:

Using OpenSSL EVP (trivial patch never committed). This is +29% over current bleeding-jumbo 😢 and +67% over baseline

Benchmarking: KeePass [AES/Argon2 256/256 AVX2]... (4xOMP) DONE
Speed for cost 1 (t (rounds)) of 50000, cost 2 (m) of 0, cost 3 (p) of 0, cost 4 (KDF [0=Argon2d 2=Argon2id 3=AES]) of 3
Raw:    8540 c/s real, 2173 c/s virtual 

The EVP test was with as little code as I could get away with (mostly init/update/final). Back when we dropped all/most of EVP from Jumbo, EVP needed lots of silly calls just to enable AES-NI at all, and it also was not thread-safe unless you added callbacks. I'm not sure if the callbacks are still needed, but AES-NI apparently just works now. I haven't looked at the OpenSSL code yet to see why it's so much faster.
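Roughly this kind of thing (a sketch from memory, not the actual patch; the exact cipher/mode used in the format is omitted here):

#include <openssl/evp.h>

/* Minimal one-shot AES-256-ECB encryption via EVP. Modern OpenSSL selects
 * its AES-NI code path internally; no explicit engine setup is needed. */
static int evp_aes256_ecb_encrypt(const unsigned char key[32],
                                  const unsigned char *in, int in_len,
                                  unsigned char *out)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len, ok = 0;

    if (ctx &&
        EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, key, NULL) == 1) {
        EVP_CIPHER_CTX_set_padding(ctx, 0); /* raw blocks, no padding */
        ok = EVP_EncryptUpdate(ctx, out, &len, in, in_len) == 1 &&
             EVP_EncryptFinal_ex(ctx, out + len, &len) == 1;
    }
    EVP_CIPHER_CTX_free(ctx);
    return ok;
}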

@solardiz
Copy link
Member

solardiz commented Dec 4, 2024

It's puzzling why you get better speeds with asm, while I get worse.

The thing is, I was also getting worse speeds with asm (and still am, using the old binary I saved) before the switch to intrinsics. I get better speeds with asm when I remove just the -m* flags from mbedtls/Makefile after the switch to intrinsics (i.e., revert to asm). It doesn't appear you have tried that, have you?

@magnumripper
Copy link
Member Author

Ah, my "MbedTLS using asm" baseline is actually f88688e meaning it did get -mavx2 but not the others. Good call, I'll test using the commit before that tomorrow.

@solardiz
Copy link
Member

solardiz commented Dec 5, 2024

my "MbedTLS using asm" baseline is actually f88688e meaning it did get -mavx2 but not the others. Good call, I'll test using the commit before that

That's not what I meant. Sure please do create a better baseline, but also please literally try the revert-to-asm approach where I observed the puzzling speedup - that is, starting from the latest commit, edit the generated mbedtls/Makefile to remove the -m* flags. I don't know why, but for me this resulted in very different performance than we had with asm before.

@magnumripper
Copy link
Member Author

magnumripper commented Dec 5, 2024

But that must be exactly the same as "the commit before that" (which translates to 40752fe). It is the git history's version of mbedtls with no -m flags but including your subl/decl and clobber patches.

Verifying... yep, while aes.a does differ in md5sum (should just be a timestamp; the size is the same), mbedtls/*.o all end up with the same md5sums, as they should.

@magnumripper
Copy link
Member Author

b40a1a2 could be the baseline though. The only difference from 40752fe is it also excludes your patches.

@solardiz
Copy link
Member

While I still don't know why re-enabling asm is, for me, a lot faster than it was before being disabled, I have now managed to achieve even better speeds with intrinsics by disabling MAY_NEED_TO_ALIGN, which was only enabled for the intrinsics case and was apparently responsible for the worse speed I was getting with intrinsics vs. re-enabled asm. I think this setting should be unneeded with our usage, if this comment in aes.c is true and if I understand it correctly:

/* VIA Padlock and our intrinsics-based implementation of AESNI require
 * the round keys to be aligned on a 16-byte boundary. We take care of this
 * before creating them, but the AES context may have moved (this can happen
 * if the library is called from a language with managed memory), and in later
 * calls it might have a different alignment with respect to 16-byte memory.
 * So we may need to realign.

This says that Mbed TLS takes care of the alignment when creating the round keys (I guess placing them accordingly within the larger context?) and I hope we're not copying them to differently-aligned context structs (if we are, we'll need to fix that).
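Put differently, key setup computes an offset into the context's key buffer so that the round keys start on a 16-byte boundary - roughly like this (a simplified rendering of the existing mbedtls_aes_rk_offset() logic, for illustration only):

#include <stdint.h>

/* Round keys live inside ctx->buf; key setup starts them at an offset (in
 * 32-bit words) chosen so they land on a 16-byte boundary. As long as the
 * context is never copied or moved afterwards, that alignment still holds
 * at encryption time, so the per-call realignment should be unnecessary. */
static unsigned rk_offset(const uint32_t *buf)
{
    unsigned delta = ((uintptr_t) buf & 0x0F) / 4; /* misalignment, in words */

    return delta ? 4 - delta : 0; /* words to skip to reach 16-byte alignment */
}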

BTW, I found --format=@aes useful for quicker testing of just the maybe-affected formats.

solardiz added a commit to solardiz/john that referenced this issue Dec 11, 2024
Mbed TLS takes care of the alignment when creating the round keys (placing
them accordingly within the larger context struct) and we hope we're not
copying them to differently-aligned context structs (if we are, we'll need
to fix that).

See openwall#5593
@solardiz
Copy link
Member

FWIW, completely disabling MAY_NEED_TO_ALIGN actually caused some crashes in our CI bots on my fork of the repo - it turns out this also disables taking care of the alignment when creating the round keys. So I am now trying to disable this setting later in the file, near the comment quoted above, so that the change only affects the encryption function. This retains the speedup.

@solardiz
Copy link
Member

Pushed the above change to this repo now (passed testing in my fork).

I also have the below, not pushed since no measurable speedup (but it probably does speed up the key setup a tiny bit by skipping a call to a function provided by aesni.c):

+++ b/src/mbedtls/aes.c
@@ -535,7 +535,12 @@ void mbedtls_aes_xts_free(mbedtls_aes_xts_context *ctx)
 MBEDTLS_MAYBE_UNUSED static unsigned mbedtls_aes_rk_offset(uint32_t *buf)
 {
 #if defined(MAY_NEED_TO_ALIGN)
-    int align_16_bytes = 0;
+    static int align_16_bytes = 0;
+
+    if (align_16_bytes > 0)
+        goto align_16_bytes;
+    if (align_16_bytes < 0)
+        return 0;
 
 #if defined(MBEDTLS_VIA_PADLOCK_HAVE_CODE)
     if (aes_padlock_ace == -1) {
@@ -553,6 +558,7 @@ MBEDTLS_MAYBE_UNUSED static unsigned mbedtls_aes_rk_offset(uint32_t *buf)
 #endif
 
     if (align_16_bytes) {
+align_16_bytes:
         /* These implementations needs 16-byte alignment
          * for the round key array. */
         unsigned delta = ((uintptr_t) buf & 0x0000000fU) / 4;
@@ -561,6 +567,8 @@ MBEDTLS_MAYBE_UNUSED static unsigned mbedtls_aes_rk_offset(uint32_t *buf)
         } else {
             return 4 - delta; // 16 bytes = 4 uint32_t
         }
+    } else {
+        align_16_bytes = -1;
     }
 #else /* MAY_NEED_TO_ALIGN */
     (void) buf;

@magnumripper
Copy link
Member Author

Good stuff. I remember seeing that realigning comment and made a mental note to examine it later, then immediately forgot about it 🙄

BTW, I found --format=@aes useful for quicker testing of just the maybe-affected formats.

Yes, I'm currently revising the OpenCL AES code (using --format=@aes,+opencl for regression testing) and I assumed many formats would lack "AES" in their algorithm name, but to my surprise that is not the case at all.

@solardiz
Copy link
Member

not pushed since no measurable speedup (but it probably does speed up the key setup a tiny bit

@magnumripper Maybe you can benchmark this? My "no measurable speedup" may be simply because my system isn't idle.

@magnumripper
Copy link
Member Author

Will do

@magnumripper magnumripper reopened this Dec 11, 2024
@magnumripper magnumripper changed the title mbedTLS AES code, intrinsics vs. assembly mbedTLS AES code, intrinsics vs. assembly, alignment Dec 11, 2024
@solardiz
Copy link
Member

Tried benchmarking the align_16_bytes caching on our "well", which is otherwise idle, using o5logon. The results are inconclusive. It'd take lots of benchmarks, perhaps of different builds, to get conclusive results.

@solardiz
Copy link
Member

not pushed since no measurable speedup (but it probably does speed up the key setup a tiny bit

And here's maybe why not: for me, mbedtls_aes_rk_offset was getting inlined before, but with these changes it's not anymore.

@solardiz
Copy link
Member

Indeed, with inline added this provides some speedup. So I'll probably push now.

@solardiz
Copy link
Member

Loop unrolling also helps. Now doing it in the same way as aesce.c does it.

@solardiz
Copy link
Member

solardiz commented Dec 11, 2024

Pushed the unrolling. Another way to do it could be:

+++ b/src/mbedtls/aesni.c
@@ -96,16 +96,6 @@ int mbedtls_aesni_crypt_ecb(mbedtls_aes_context *ctx,
 
 #if !defined(MBEDTLS_BLOCK_CIPHER_NO_DECRYPT)
     if (mode == MBEDTLS_AES_DECRYPT) {
-        if (nr == 10)
-            goto rounds_10_dec;
-        if (nr == 12)
-            goto rounds_12_dec;
-        state = _mm_aesdec_si128(state, *++rk);
-        state = _mm_aesdec_si128(state, *++rk);
-rounds_12_dec:
-        state = _mm_aesdec_si128(state, *++rk);
-        state = _mm_aesdec_si128(state, *++rk);
-rounds_10_dec:
         state = _mm_aesdec_si128(state, *++rk);
         state = _mm_aesdec_si128(state, *++rk);
         state = _mm_aesdec_si128(state, *++rk);
@@ -115,22 +105,20 @@ rounds_10_dec:
         state = _mm_aesdec_si128(state, *++rk);
         state = _mm_aesdec_si128(state, *++rk);
         state = _mm_aesdec_si128(state, *++rk);
+        if (__builtin_expect(nr > 10, 1)) {
+            state = _mm_aesdec_si128(state, *++rk);
+            state = _mm_aesdec_si128(state, *++rk);
+            if (__builtin_expect(nr > 12, 1)) {
+                state = _mm_aesdec_si128(state, *++rk);
+                state = _mm_aesdec_si128(state, *++rk);
+            }
+        }
         state = _mm_aesdeclast_si128(state, *++rk);
     } else
 #else
     (void) mode;
 #endif
     {
-        if (nr == 10)
-            goto rounds_10_enc;
-        if (nr == 12)
-            goto rounds_12_enc;
-        state = _mm_aesenc_si128(state, *++rk);
-        state = _mm_aesenc_si128(state, *++rk);
-rounds_12_enc:
-        state = _mm_aesenc_si128(state, *++rk);
-        state = _mm_aesenc_si128(state, *++rk);
-rounds_10_enc:
         state = _mm_aesenc_si128(state, *++rk);
         state = _mm_aesenc_si128(state, *++rk);
         state = _mm_aesenc_si128(state, *++rk);
@@ -140,6 +128,14 @@ rounds_10_enc:
         state = _mm_aesenc_si128(state, *++rk);
         state = _mm_aesenc_si128(state, *++rk);
         state = _mm_aesenc_si128(state, *++rk);
+        if (__builtin_expect(nr > 10, 1)) {
+            state = _mm_aesenc_si128(state, *++rk);
+            state = _mm_aesenc_si128(state, *++rk);
+            if (__builtin_expect(nr > 12, 1)) {
+                state = _mm_aesenc_si128(state, *++rk);
+                state = _mm_aesenc_si128(state, *++rk);
+            }
+        }
         state = _mm_aesenclast_si128(state, *++rk);
     }
 

For me, this other way results in pre-loading of round keys into registers, which is probably unneeded, and it hurts a little bit.

@magnumripper magnumripper removed their assignment Dec 11, 2024
@solardiz
Copy link
Member

I observed that on older gcc (such as gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)), the memcpy used by the intrinsics code is not "free" (is not compiled to a simple [v]movdqu instruction, which it is on newer gcc). Apparently, this causes a significant slowdown (up to 2x on keepass vs. the asm code on that same old system).

We might want to come up with, and switch to, an API that accepts a guaranteed-aligned buffer, but then we'd deviate from upstream Mbed-TLS much more.

@magnumripper
Copy link
Member Author

the memcpy used by the intrinsics code is not "free" (is not compiled to a simple [v]movdqu instruction, which it is on newer gcc)

Could we add an intrinsic for [v]movdqu (if applicable, at compile time) with no overhead?

@solardiz
Copy link
Member

Could we add an intrinsic for [v]movdqu (if applicable, at compile time) with no overhead?

Yes, I think that's _mm_loadu_si128 and _mm_storeu_si128 as appropriate. This could even be accepted upstream.
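Roughly what that would look like for the encrypt path (a simplified sketch, not upstream code; the real function also handles decryption and selects the round count dynamically):

#include <immintrin.h>

/* Load/store the data blocks with explicit unaligned SSE2 intrinsics so
 * that even older compilers emit single [v]movdqu instructions instead of
 * calling memcpy. The round keys (rk) are assumed 16-byte aligned, as
 * arranged by key setup. Build with -maes (or an -march implying it). */
static void aesni_encrypt_block(const __m128i *rk, unsigned nr,
                                const unsigned char input[16],
                                unsigned char output[16])
{
    __m128i state = _mm_loadu_si128((const __m128i *) input);
    unsigned i;

    state = _mm_xor_si128(state, rk[0]);         /* initial AddRoundKey */
    for (i = 1; i < nr; i++)
        state = _mm_aesenc_si128(state, rk[i]);  /* rounds 1 .. nr-1 */
    state = _mm_aesenclast_si128(state, rk[nr]); /* final round */
    _mm_storeu_si128((__m128i *) output, state);
}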

@solardiz
Copy link
Member

Could we add an intrinsic for [v]movdqu (if applicable, at compile time) with no overhead?

I've just implemented this. Testing.

@solardiz
Copy link
Member

I pushed the load/store intrinsics usage and it seems to work well so far.

@solardiz
Copy link
Member

Some other observations are:

  1. We waste about 2% on runtime cached cpuid checks - we could save this by moving them out of the loop (an API change?); see the sketch after this list.
  2. We waste some other little bit of performance on passing the always-zero return value from the lowest-level AES encryption code to higher levels, where we eventually drop it. We could optimize this out, maybe submit such change upstream.
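A sketch of item 1 (hypothetical wrapper, not existing mbedTLS or JtR API; the include paths are an assumption): resolve the capability once per buffer rather than once per block.

#include <stddef.h>
#include "aes.h"
#include "aesni.h"

/* Hypothetical multi-block helper: the (cached) AES-NI support check is
 * consulted once for the whole buffer instead of inside every per-block
 * encryption call. */
static void aes_ecb_encrypt_blocks(mbedtls_aes_context *ctx,
                                   const unsigned char *in,
                                   unsigned char *out, size_t nblocks)
{
    size_t i;

    if (mbedtls_aesni_has_support(MBEDTLS_AESNI_AES)) {
        for (i = 0; i < nblocks; i++, in += 16, out += 16)
            mbedtls_aesni_crypt_ecb(ctx, MBEDTLS_AES_ENCRYPT, in, out);
    } else {
        for (i = 0; i < nblocks; i++, in += 16, out += 16)
            mbedtls_internal_aes_encrypt(ctx, in, out);
    }
}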

@magnumripper
Copy link
Member Author

For future reference: many relevant comments are also in #5591.
