-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure the scalar Reciprocal*Estimate APIs use AVX512 where possible #101800
Conversation
Some of the failures linked in those issues were for other than the reciprocal estimates, e.g.
so this might not fully resolve it. |
…ate to the JIT so R2R works
// isMagnitude - true if the intrinsic compares using the absolute value of the inputs | ||
// isNumber - true if the intrinsic propagates the number; false for NaN | ||
// | ||
GenTree* Compiler::impEstimateIntrinsic(CORINFO_METHOD_HANDLE method, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC. @dotnet/jit-contrib, this did end up being a runtime issue of sorts.
There's a general assumption made by R2R that the managed implementation (for Corelib in particular) will be consistent across all paths. That was strictly not the case for the *Estimate
APIs where they are non-deterministic across hardware by design. -- Put another way, R2R
essentially assumes that Isa.IsSupported
is equivalent to compOpportunisticallyDependsOn(Isa)
for corelib, but it assumes its equivalent to compExactlyDependsOn(Isa)
for other libraries.
As such, the only fix we have today is to move it into the JIT where we can explicitly use compExactlyDependsOn(Isa)
instead and force things to be annotated correctly. This has a minor added bonus in that it improves codegen around the method slightly for more complex scenarios and makes it more consistent with the handling of other math APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be solved via ExactlyDependsOn attributes on C# side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not from what I saw. The analyzer will actually warn in the case that you try to do something like that, because of the expectation that the paths are semantically equivalent.
It's detailed more in depth here with some explanations of what is safe vs not safe for corelib:
- Any use of a platform intrinsic in the codebase MUST be wrapped with a call to an associated IsSupported property. This wrapping MUST be done within the same function that uses the hardware intrinsic, OR the function which uses the platform intrinsic must have the `CompExactlyDependsOn` attribute used to indicate that this function will unconditionally call platform intrinsics of from some type. |
Doing it in the JIT, which is how all the other APIs that map down to 1 instruction are handled, ended up being the simpler approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding so does this PR solve the non-determinism? R2R still may prejit functions with these math functions, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so does this PR solve the non-determinism
No, the non-determinism is intentional for these APIs. They are explicitly estimating
APIs that are allowed to differ in result based on the hardware they're running on. What this does fix is the non-determinism within a single process.
R2R still may prejit functions with these math functions, right?
The use of compExactlyDependsOn
basically tells R2R that this method requires Avx512F
and if it isn't present the functionality may differ. That will mark methods that call it as requiring themselves to be jitted.
If we made these recursive
and therefore mustExpand
, then R2R could leave a call to the method and only have the method jitted instead. But, that's a more complicated change since ReciprocalSqrtEstimate
requires us to emit a call to System.Math.Sqrt
in the worst case and requires adding handling for that to platforms that don't currently have hardware intrinsic support (like Arm32).
This is a relatively uncommon API (basically just used in specialized high-perf scenarios), so fixing the failing CI legs seemed more important than worrying about the handful of APIs that would be disqualified from R2R (none of which are in corelib or other core libraries, System.Numerics.Tensors
is the only place in our own code).
Improvements on arm64: |
…otnet#101800) * Ensure the scalar Reciprocal*Estimate APIs use AVX512 where possible * Ensure that TensorPrimitives.Reciprocal APIs consistently use AVX512 * Move the implementation of ReciprocalEstimate and ReciprocalSqrtEstimate to the JIT so R2R works * Ensure the relevant math intrinsics are marked betterToExpand * Apply formatting patch * Fixing the method header for impEstimateIntrinsic * Don't use the AVX512 paths for TensorPrimitives.Reciprocal*Estimate APIs on .NET 8 * Workaround an issue where the TensorPrimitives net8 tests are not running on net8
…otnet#101800) * Ensure the scalar Reciprocal*Estimate APIs use AVX512 where possible * Ensure that TensorPrimitives.Reciprocal APIs consistently use AVX512 * Move the implementation of ReciprocalEstimate and ReciprocalSqrtEstimate to the JIT so R2R works * Ensure the relevant math intrinsics are marked betterToExpand * Apply formatting patch * Fixing the method header for impEstimateIntrinsic * Don't use the AVX512 paths for TensorPrimitives.Reciprocal*Estimate APIs on .NET 8 * Workaround an issue where the TensorPrimitives net8 tests are not running on net8
This resolves #98732, resolves #101721, and resolves #101731
The general issue was that the
TensorPrimitives
APIs used AVX512 or SSE, depending on availability while the scalar APIs only used SSE. -- This was not reproducing everywhere as it only reproduces on AVX512 capable hardware, which not all CI machines are.There is a subtle difference between the handling of denormalized values between the AVX512 and SSE instructions where AVX512 treats them negatives while SSE treats them as flushed to zero, which in turn causes a difference in result. This difference is fine and expected since this is explicitly an
Estimate
API and doesn't impact calls within a single process.