AES-GCM x86_64 MSVC ASM: XMM6-15 are non-volatile #6617
A common mistake is using the wrong offset for parameters that are passed on the stack.
These parameters are loaded before stack space is reserved for local use, so their offsets should be the same as they were before the stack usage increased.
wolfcrypt/src/aes_gcm_asm.asm
Outdated
mov r15, QWORD PTR [rsp+136]
mov r10d, DWORD PTR [rsp+144]
sub rsp, 160
mov r8, QWORD PTR [rsp+256]
Why +256?
It should be +96.
I use 20 64-bit words of stack in the function for temporary storage.
Sean,
This code is loading parameters from the stack.
Let's calculate the offset for the 5th parameter:
- first four parameters = 32 bytes
- return address = 8 bytes
- seven saved non-volatile registers = 56 bytes
That is 96 bytes in total,
so the 5th parameter is at offset 96 from RSP.
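The calculation above can be written out as a small helper. This is only a sketch of the arithmetic, assuming the Windows x64 convention described in the comment (32 bytes of home space for the first four parameters, an 8-byte return address, 8 bytes per pushed register); the function name `stack_param_offset` is hypothetical and does not come from the wolfSSL sources.

```c
#include <assert.h>
#include <stddef.h>

/* Offset from RSP of stack-passed parameter number `param_index` (5 or
 * higher), measured right after the prologue has pushed `pushed_regs`
 * 8-byte registers and before any further stack reservation. */
size_t stack_param_offset(unsigned param_index, unsigned pushed_regs)
{
    assert(param_index >= 5);                       /* 1-4 arrive in registers */
    size_t saved_regs = 8u * pushed_regs;           /* pushed non-volatile GPRs */
    size_t return_address = 8u;                     /* pushed by the call */
    size_t earlier_slots = 8u * (param_index - 1u); /* home space + earlier stack params */
    return saved_regs + return_address + earlier_slots;
}
```

With seven pushed registers, `stack_param_offset(5, 7)` gives 96, matching the figure in the comment. The same arithmetic gives 136 and 144 for a tenth and eleventh parameter.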
Looks like I confused myself and treated the XMM registers to be saved as parameters.
Fixed up now.
wolfcrypt/src/aes_gcm_asm.asm
Outdated
mov r10d, DWORD PTR [rsp+152]
mov rbp, QWORD PTR [rsp+160]
sub rsp, 168
mov r8, QWORD PTR [rsp+264]
again why +264 ?
As above, stack used for temporary storage.
wolfcrypt/src/aes_gcm_asm.asm
Outdated
mov r14d, DWORD PTR [rsp+288]
mov r15, QWORD PTR [rsp+296]
mov r10d, DWORD PTR [rsp+304]
vmovdqu OWORD PTR [rsp+160], xmm6
why AVX version instead of SSE (movdqu) ?
Good point.
The code is generated, and the generator didn't know to produce SSE2-only code.
Fixed this.
wolfcrypt/src/aes_gcm_asm.asm
Outdated
mov r12, QWORD PTR [rsp+104]
mov r14, QWORD PTR [rsp+112]
vmovdqu OWORD PTR [rsp+16], xmm6
vmovdqu OWORD PTR [rsp+32], xmm7
what about xmm8-14 ?
Fixed generating code.
74398d5
to
a2aafe0
Compare
At the least, you need to:
- change the instructions for all XMM saves from vmovdqu to movdqu
- check the parameter offsets: they should not change unless there are new stack modifications between the start of the function and the loading of the parameters.
Be careful when addressing parameters from the middle of the code: the offset must be increased if such addressing occurs after new stack modifications.
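The rule about offsets moving with later stack modifications can be modelled as a running count of bytes placed below the return address. The sketch below is illustrative only: the `frame_t` type and the `frame_*` helpers are hypothetical names, and it assumes the Windows x64 layout used in the earlier calculation (return address, then 8-byte slots for parameters 1 onward).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the function has placed below the return address so far. */
typedef struct {
    size_t bytes_below_return;
} frame_t;

/* Model a `push reg` of one 8-byte register. */
void frame_push(frame_t *f) { f->bytes_below_return += 8u; }

/* Model a `sub rsp, n` reservation for locals or XMM saves. */
void frame_sub(frame_t *f, size_t n) { f->bytes_below_return += n; }

/* Offset from the current RSP of stack-passed parameter `param_index`. */
size_t frame_param_offset(const frame_t *f, unsigned param_index)
{
    assert(param_index >= 5);   /* parameters 1-4 are passed in registers */
    return f->bytes_below_return + 8u + 8u * (param_index - 1u);
}
```

Starting from an empty `frame_t`, seven calls to `frame_push` put the fifth parameter at offset 96. A following `frame_sub(&f, 160)` moves the same parameter to offset 256, so whether a given offset is right depends on whether the load sits before or after the reservation.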
a2aafe0
to
4dc3924
Compare
Fixed generation to put on stack and take off stack in the right place.
OK, the last version works with my friend's tests. But the ASM code is not perfect:
Hi @ilka1999
Thanks,
IMHO:
I did not check the stack alignment of the code. With an aligned stack you can use ALIGNED loads/stores instead of UNALIGNED ones.
Put XMM6-15, when used, on the stack at start of function and restore at end of function.
4dc3924
to
cfac603
Compare
Hi @ilka1999, Thanks for the feedback!
Note that the stack is not always aligned as I would like. On newer processors, unaligned moves are the same speed as the aligned moves. I won't be making changes for this, but thank you for bringing it up.
Sean
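As background to the alignment point: under the Windows x64 convention RSP is 16-byte aligned just before a call, so it sits 8 bytes off a 16-byte boundary on function entry. Whether RSP is aligned again after the prologue depends on how many bytes were pushed and reserved. The helper below is a sketch of that check; the name `rsp_is_16_aligned` is hypothetical, and it says nothing about which layouts the generated code actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if RSP is 16-byte aligned after `pushed_regs` 8-byte pushes
 * and a `sub rsp, local_bytes`, assuming RSP was 16-byte aligned just
 * before the call that entered the function. */
int rsp_is_16_aligned(unsigned pushed_regs, size_t local_bytes)
{
    assert(local_bytes % 8u == 0u);  /* model 8-byte granular reservations only */
    /* 8 bytes of return address + pushes + locals must be a multiple of 16. */
    return (8u + 8u * pushed_regs + local_bytes) % 16u == 0u;
}
```

For example, seven pushes followed by a 160-byte reservation leave RSP aligned, while seven pushes followed by 168 bytes do not. Aligned instructions such as movdqa fault on a misaligned address, which is why unaligned moves are the safe choice when alignment is not guaranteed.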
Tested successfully
Description
Put XMM6-15, when used, on the stack at start of function and restore at end of function.
Fixes #6608
Testing
Standard
Checklist