Add option to change SVE vector length for current and children processes #101295
Why is this restricted to DEBUG only?
I also expect that we need a path for Windows, and we should likely treat SVE as unsupported if the vector length is larger and restricting the size (via `prctl` or the equivalent on other OSes) fails.
This option is for development purposes only. It was added primarily to aid implementation and testing of the API on a 256-bit SVE-enabled V1 system that we have access to. Once Vector uses the entire vector length, the option may become redundant and be removed.
I don't think it being for development purposes only is a good thing long term, and it probably doesn't match up with the intent of `DOTNET_MaxVectorTBitWidth`.
We don't really want to disable SVE on hardware with 256-bit vectors just because a user has said they want 128-bit vectors. Using an OS feature like `prctl` should be a much better option so that they can still get SVE usage in that scenario.
I will check with the Windows team on what the equivalent API is.
Kunal asked me to share how this works in Windows (I developed the SVE support in Windows for the next release). We felt that being able to dynamically change the vector length at runtime would be messy, since there's no save/restore mechanism for these vector length changes, hence you could get into situations where for example some code calls into a library, the library decreases the VL for some reason, but then has a bug where under certain conditions it fails to restore the vector length, so after the caller gains control again it would be running with a decreased vector length indefinitely, etc.
We do recognize the need to change the vector length for various purposes (testing, perf, compat, etc.), so we did add a CreateProcess parameter that allows the vector length to be specified during process creation, but after a process has been created, its vector length cannot be changed. There are also registry keys that can be set to change the vector length on a per-process basis (IFEO settings), or for all processes in the system, but these registry-based methods are typically used for development and testing only.
By default, we'll use the highest VL supported by the system and the underlying hypervisor. So the above mechanisms will only be necessary when you want to decrease the VL for processes.
Happy to explore this topic more with you.
Thanks for the context @JasonLinMS.
I definitely agree that changing this during the lifetime of the process is messy/problematic and the intent in general is to not allow that for .NET either.
I think it's probably worth us having a short meeting (.NET, Windows, and Arm) to see if we can discuss things here and see if we can find something that generally works for everyone. CC. @jkotas
In the case of .NET, we would really only need the capability to set this once as part of our own startup before any user code has executed. We don't have any intent to change this dynamically (although there is the SME feature that makes this a little bit more complicated) and the ideal for user code is to write size agnostic algorithms so that it doesn't matter what size the hardware actually supports.
However, the official Arm64 SVE/Vector ABIs (https://github.com/ARM-software/abi-aa?tab=readme-ov-file#abi-for-the-arm-64-bit-architecture-with-sve-support) do define the ability to say a given API expects a particular size and there may be cases where a user needs to fix it themselves (potentially just for testing purposes or giving users a workaround for a bug). As such, the ability for a process to set the size for itself is beneficial, especially if that can be shared across all Arm64 capable operating systems.
Yes, a meeting to discuss this sounds good.
scheduled.
Once other feedback is addressed, this change can go in without us having to wait for Windows support.
I think this should also check the other requirements, such as that the length must be a multiple of 128 bits and must not exceed 2048 bits.
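A minimal sketch of the range check suggested above, assuming the value is given in bits. The function name `is_valid_sve_vector_bits` and the macros are hypothetical and not taken from the PR.

```c
#include <stdbool.h>

/* Assumed limits, as stated in the review comment: lengths are
 * multiples of 128 bits and never exceed 2048 bits. */
#define SVE_STEP_BITS 128u
#define SVE_MAX_BITS  2048u

/* Hypothetical helper: returns true when `bits` is a plausible
 * SVE vector length according to the limits above. */
bool is_valid_sve_vector_bits(unsigned bits)
{
    return bits >= SVE_STEP_BITS
        && bits <= SVE_MAX_BITS
        && bits % SVE_STEP_BITS == 0u;
}
```

A check like this only rejects values that can never be valid. The hardware or OS may still support only a subset of the lengths that pass it.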
This is a very sensible choice. In Linux, changing the vector length inside a library and not restoring it before returning is generally seen as undefined behaviour. OpenJDK doesn't want to trust that, and after calling external routines it inserts checks to confirm the vector length hasn't changed. Neither is ideal.
The PCS isn't clear on whether the vector length must remain fixed. I've raised a bug against it here. It would be good to have a clear statement.
For coreclr, if changing the vector length is only for debugging, then a wrapper script/binary that just launches coreclr with the correct vector length might be good enough. That would work for Windows and Linux. Ideally I'd still like to keep this PR's mechanism for use in Linux.
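On Linux, the wrapper idea could look roughly like the sketch below. It assumes the `PR_SVE_SET_VL` prctl with the `PR_SVE_SET_VL_ONEXEC` flag, which defers the change until the next `execve()` so the wrapper's own vector length is untouched. The function names are hypothetical, and adding `PR_SVE_VL_INHERIT` (so that the launched process keeps the length across its own later `execve()` calls) is an assumption about what a launcher would want.

```c
#include <stdio.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

/* Hypothetical helper: ask the kernel to apply `vl_bytes` at the next
 * execve() only. Returns 0 on success, -1 if the request is invalid,
 * unsupported, or would be rounded to a different length. */
int request_sve_vl_on_exec(int vl_bytes)
{
    /* SVE lengths are multiples of 16 bytes, from 16 up to 256. */
    if (vl_bytes < 16 || vl_bytes > 256 || vl_bytes % 16 != 0)
        return -1;
#if defined(__linux__) && defined(PR_SVE_SET_VL) && defined(PR_SVE_SET_VL_ONEXEC)
    int ret = prctl(PR_SVE_SET_VL,
                    (unsigned long)vl_bytes | PR_SVE_SET_VL_ONEXEC | PR_SVE_VL_INHERIT);
    if (ret < 0)
        return -1; /* no SVE, or the kernel rejected the request */
    /* The kernel may pick a smaller supported length; treat that as failure. */
    if ((ret & PR_SVE_VL_LEN_MASK) != vl_bytes)
        return -1;
    return 0;
#else
    return -1; /* this platform has no SVE prctl interface */
#endif
}

/* Hypothetical launcher: only returns (with -1) when something failed. */
int launch_with_sve_vl(int vl_bytes, char *const argv[])
{
    if (request_sve_vl_on_exec(vl_bytes) != 0) {
        fprintf(stderr, "cannot request an SVE vector length of %d bytes\n", vl_bytes);
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

Because the length is fixed before the target process starts, no thread in that process ever observes a different vector length, which avoids the mid-process problems discussed in this thread.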
This will set the SVE vector length for the current thread only. It won't set it for threads that have been started already. Is that correct?
How does it work for new threads? Do new threads inherit the SVE vector length of the parent thread, or do they inherit the SVE vector length of the process?
Are we sure that none of the libraries that have been initialized by this point have cached the vector length?
https://www.man7.org/linux/man-pages/man2/prctl.2.html has an explicit warning to use this API only if you know what you are doing.
It seems that allowing the SVE vector length to be set only before process start is the only reliable option. Setting it here may break all sorts of random things. I am not convinced that we know what we are doing by setting it here.
Do you mean in the case of something like the C runtime or in the case of something else hosting the CLR?
Are you thinking the only valid thing for us to do here is to fail to launch for an SVE mismatch (AOT) and to just disable SVE usage otherwise (JIT)?
Right, this is obviously not compatible with hosting. (We have no reliable way to tell whether we are hosted.)
Even without hosting and external libraries outside our control in the picture, there are a number of our own threads created in the process by this point (PAL, EventPipe) that will have the wrong size configured. It is hard to guarantee that none of these threads is going to wander into managed code.
I do not see a better option. It suggests that the design we are working with is questionable since it does not work well for AOT.
Yes, if the JIT is not able to accommodate the SVE length that the process was started with.
Correct. There shouldn't be any other threads running at this point? (it's possible this code is called much later than I expected or there are hosting scenarios I'm not familiar with)
It will use the parent thread vector length.
All other SVE state of a thread, including the currently configured vector length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector length (if any), is preserved across all syscalls, subject to the specific exceptions for execve() described in section 6.
In particular, on return from a fork() or clone(), the parent and new child process or thread share identical SVE configuration, matching that of the parent before the call.
sve.rst
Assuming this is happening early in the process, there should be no use of SVE at all yet. Vector length can be easily read using an instruction (e.g. `CNT`). But we can't guarantee what a library might do.
If this is only used for debugging/testing and never used in production, then I lean towards it being fine. Anything more, then maybe not.
Or should `DOTNET_MaxVectorTBitWidth` be X86 only?
In general these switches are primarily there for debugging/testing purposes. However, they also generally exist as a way to disable or limit intrinsic support if a library is found to have a blocking bug.
It's not great that we can't help set up the process to achieve success, but it's also not the end of the world and is something we can ideally give user guidance around.
I think it's fine for us to respect it still, that's functionally what AOT compiled for a particular SVE size would have to do after all. It's just a different way to disable SVE support.
The documented switches have to be reliable. `DOTNET_MaxVectorTBitWidth` is a documented switch.
It does not sound like this switch can be reliable. That means it should have a different name, and ideally be a debug-only switch. We do not want to be dealing with inscrutable crashes caused by different parts of the process being configured to different vector sizes.
It's still reliable and working as documented, even with this change in direction. The switch was intentionally named `MaxVectorTBitWidth` because such complications could exist. All that's changed is that instead of us setting `sizeof(Vector<T>)` based on `min(SveLength, DOTNET_MaxVectorTBitWidth)`, we simply set it based on `(SveLength > DOTNET_MaxVectorTBitWidth) ? 16 : SveLength`.
So, this minor change in direction is really no different than us limiting the maximum bit width to 256 by default on x64 hardware or not considering 512-bits on certain first gen AVX512 hardware unless the users also opt into a hidden undocumented switch.
Which is to say, it still simply represents the maximum size a user wants to support (defaulting to `0`, which means the system can decide). It can be smaller if the system doesn't support the size specified.
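The difference between the two selection rules can be sketched as follows. This is only an illustration under stated assumptions: both inputs are taken in bits, the result is returned in bytes, `0` means no limit was requested, and the function names are hypothetical rather than anything in the runtime.

```c
/* Earlier direction: clamp Vector<T> to the smaller of the hardware length
 * and the user's maximum. This only works if the hardware vector length
 * can actually be lowered to match. */
unsigned vector_t_bytes_clamped(unsigned sve_bits, unsigned max_bits)
{
    unsigned bits = (max_bits != 0u && max_bits < sve_bits) ? max_bits : sve_bits;
    return bits / 8u;
}

/* Proposed direction: if the hardware length exceeds the user's maximum,
 * do not use SVE for Vector<T> and fall back to 16 bytes (128 bits);
 * otherwise use the hardware length unchanged. */
unsigned vector_t_bytes_fallback(unsigned sve_bits, unsigned max_bits)
{
    if (max_bits != 0u && sve_bits > max_bits)
        return 16u;
    return sve_bits / 8u;
}
```

The two rules agree whenever no limit is set or the hardware length is already within it. They differ on, for example, 512-bit hardware with a 256-bit limit, where the clamped rule gives 32 bytes and the fallback rule gives 16.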
As proposed in this PR, it does more than just set `sizeof(Vector<T>)`. The `PR_SVE_SET_VL` call makes it unreliable.
It impacts the global state of the thread and process in a way that may be incompatible with other components in the process. That is what makes it unreliable. It is guaranteed to be broken for CoreCLR hosting scenarios, and it may have issues without hosting too, based on the documentation. It is very hard to audit what is loaded in the process.
I agree that it would be ok if the switch set `sizeof(Vector<T>)` only, without calling `PR_SVE_SET_VL`.
Right. I should have clarified that, given your input that we shouldn't take this PR to call `prctl` because it's unreliable, the alternative where we simply don't use SVE if it's larger than `DOTNET_MaxVectorTBitWidth` is fine and still in line with the currently documented behavior for that config switch.
If we provided anything around `prctl` (and it sounds like we're leaning towards `no`), it would need to be a separate undocumented switch, potentially debug only. I don't think we have the need to add that given our current testing needs and the known sizes (128 and 256) we'll want to support for existing SVE capable hardware (both consumer and server/cloud).
Thanks @tannergooding for clarifying some of the things offline, so it eventually boils down to:
To implement `Get_System_VL()`, we can use `prctl` with `PR_SVE_GET_VL`, but it would be better to use `CNTB` or an equivalent instruction because that way it will be OS agnostic. We will need that anyway for `getVectorTByteLength()` when `DOTNET_MaxVectorTBitWidth` is not set.
With that said, there is no need to introduce a different environment variable that is `DEBUG` only to support the downgrade scenario.
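A rough sketch of what reading the system vector length might look like, not the runtime's actual implementation. It assumes the ACLE intrinsic `svcntb()` (which compiles to `CNTB`) when the file is built with SVE enabled, and falls back to `prctl(PR_SVE_GET_VL)` on Linux otherwise. The function name `system_sve_vector_bytes` is hypothetical.

```c
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#elif defined(__linux__)
#include <sys/prctl.h>
#endif

/* Hypothetical helper: returns the current SVE vector length in bytes,
 * or 0 when it cannot be determined (no SVE, or no usable interface). */
unsigned system_sve_vector_bytes(void)
{
#if defined(__ARM_FEATURE_SVE)
    /* OS-agnostic path: CNTB reports the vector length in bytes.
     * The caller must already know that SVE is available, since the
     * instruction traps on hardware without it. */
    return (unsigned)svcntb();
#elif defined(__linux__) && defined(PR_SVE_GET_VL)
    /* Linux-only path: the low bits of the result hold the length in bytes. */
    int ret = prctl(PR_SVE_GET_VL);
    return ret < 0 ? 0u : (unsigned)(ret & PR_SVE_VL_LEN_MASK);
#else
    return 0u;
#endif
}
```

The instruction-based path needs no OS support, which is the property called out above. The `prctl` path is shown only as the Linux-specific alternative.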