Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AES-GCM x86_64 MSVC ASM: XMM6-15 are non-volatile #6617

Merged
merged 1 commit into from
Jul 31, 2023

Conversation

SparkiDev
Copy link
Contributor

Description

Put XMM6-15, when used, on the stack at start of function and restore at end of function.

Fixes #6608

Testing

Standard

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

Copy link

@ilka1999 ilka1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Common mistake is the wrong offset for parameters which passed by stack.
These parameters are loaded before stack reservation for local purposes, so their offset should be the same, as it was before stack usage increase.

mov r15, QWORD PTR [rsp+136]
mov r10d, DWORD PTR [rsp+144]
sub rsp, 160
mov r8, QWORD PTR [rsp+256]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why +256 ?
it should be +96

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use 20 64-bit words of stack in the function for temporary storage.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean,

It is loading parameters from the stack

Let's calculate offset for 5-th parameter:

  1. first four parameters = 32 bytes
  2. return address = 8 bytes
  3. seven non-volatile registers saving = 56 bytes

totally 96 bytes
so, 5-th parameter has offset 96 from RSP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I confused myself and treated the xmm registers to save as parameters.
Fix up now.

mov r10d, DWORD PTR [rsp+152]
mov rbp, QWORD PTR [rsp+160]
sub rsp, 168
mov r8, QWORD PTR [rsp+264]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again why +264 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, stack used for temporary storage.

mov r14d, DWORD PTR [rsp+288]
mov r15, QWORD PTR [rsp+296]
mov r10d, DWORD PTR [rsp+304]
vmovdqu OWORD PTR [rsp+160], xmm6

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why AVX version instead of SSE (movdqu) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
Generated code and it didn't know to produce SSE2 only code.
Fixed this.

mov r12, QWORD PTR [rsp+104]
mov r14, QWORD PTR [rsp+112]
vmovdqu OWORD PTR [rsp+16], xmm6
vmovdqu OWORD PTR [rsp+32], xmm7

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about xmm8-14 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed generating code.

Copy link

@ilka1999 ilka1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least you need

  1. change commands for all XMM saves from vmovdqu to movdqu
  2. check the parameter offsets, they should not be changed unless there are no new stack modifications between the start of the function and the loading of the parameters.

be careful about addressing parameters from the middle of the code, it should be increased if such addressing occurs after new stack modifications

mov r15, QWORD PTR [rsp+136]
mov r10d, DWORD PTR [rsp+144]
sub rsp, 160
mov r8, QWORD PTR [rsp+256]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean,

It is loading parameters from the stack

Let's calculate offset for 5-th parameter:

  1. first four parameters = 32 bytes
  2. return address = 8 bytes
  3. seven non-volatile registers saving = 56 bytes

totally 96 bytes
so, 5-th parameter has offset 96 from RSP

@SparkiDev
Copy link
Contributor Author

Fixed generation to put on stack and take off stack in the right place.

@ilka1999
Copy link

Ok

Last version is working with my friend's tests

But, ASM code is not perfect:

  1. there are some oddish lines
  2. no proc frames

@SparkiDev
Copy link
Contributor Author

Hi @ilka1999

  1. Let me know what is odds in the lines. Specifics or generally.
  2. Are you suggesting adding 'FRAMES' to the PROC lines?

Thanks,
Sean

@ilka1999
Copy link

IMHO:

  1. the code still uses vmovdqu instead of movdqu
  2. savings of XMM registers which does not changed
  3. allocates stack for temporary local variables in the middle of body

I did not check the stack alignment of the code, with alignment you can use ALIGNED loading/store instead of UNALIGNED

Put XMM6-15, when used, on the stack at start of function and restore at
end of function.
@SparkiDev
Copy link
Contributor Author

SparkiDev commented Jul 25, 2023

Hi @ilka1999,

Thanks for the feedback!

  1. I've pushed an update to fix this.
  2. This has also been fixed.
  3. The temporary stack allocations are not something I intend to change right now.

Note that the stack is not always aligned as I would like. On newer processors, unaligned moves are the same speed as the aligned moves. I won't be making changes for this but thank you for bringing it up.

Sean

@ilka1999
Copy link

Tested successfully

@JacobBarthelmeh JacobBarthelmeh merged commit 4b80dcf into wolfSSL:master Jul 31, 2023
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: aes_gcm_asm.asm doesn't support Windows x86-64 calling convention
3 participants