-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major performance regression of the array initialization pattern #62864
Comments
The IL for the method in question looks like this - there's a GC::AllocateUninitializedArray instead of newarr. This is breaking pattern recognition in RyuJIT. In NativeAOT this is a Cc @jaredpar to make sure this is intentional new codegen for this. RyuJIT will need to be adapted to recognize the new pattern.
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsIn dotnet/performance#2534 I've bumped BDN version to see if @MichalStrehovsky fix (dotnet/BenchmarkDotNet#2046) helped. It did, but new issue popped out. 86 out of 4334 benchmarks that were working fine in the past (2 months ago was the last time I run them), are now failing with
for code that seems to be very simple: I have moved this code to a standalone app and was not able to repro the issue. Repro steps: git clone https://github.com/dotnet/performance.git
cd performance
py .\scripts\benchmarks_ci.py -f nativeaot7.0 --filter System.Net.Primitives.Tests.IPAddressPerformanceTests.Ctor_Span --bdn-arguments "--keepFiles true" BDN reports the path where to project is stored (the last path in the line below):
cc @jkotas
|
Specifically in this function: |
Yep, looks like intentional: #62392 The perf repo uses a more bleeding edge Roslyn than we do. |
I remember trying to use I wonder if it's going to cause perf regressions as well. |
Mono also does the pattern match optimization for it. I don't know if there are any runtimes that would benefit from this for Maybe a better fix would be for Roslyn not to do this? |
@adamperlin @333fred reverting #62392 will fix this issue and performance penalty. it is a deoptimization. |
cc @VSadov |
/cc @lambdageek @vargaz |
Please revert #62392 ASAP. It breaks the pattern matching done by the JIT for array initialization. It will result in major performance regression on most runtimes and it breaks NativeAOT that does not support non-standard array initialization patterns currently. |
Quick standalone performance test: using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 100_000_000; i++)
CreateInitializedArray();
Console.WriteLine(sw.ElapsedMilliseconds);
[MethodImpl(MethodImplOptions.NoInlining)]
static int[] CreateInitializedArray()
{
return new int[] { 1, 2, 3, 4 };
} Run using:
.NET 7 SDK Preview 6: 390ms As you can see, this change introduced 27x regression of the simple array initialization pattern used in this micro-benchmark. BTW: I do not see any performance numbers in #62392 . In dotnet/runtime, we require performance numbers demonstrating the improvement for all performance optimization PRs. It is not unusual for performance optimizations to not have the intended effect. (EDIT: Corrected the perf numbers.) |
Even with Roslyn reverting the change, we should likely have an issue in the dotnet/runtime repo tracking fixing this on our end. There is no reason that Someone just needs to do the work to ensure the relevant handling exists and to ensure that the JIT can properly optimize the code. The actual method is "large" today and isn't inline friendly, even though the checks being done are effectively all constant. For a non-constant length, this function should really be getting inlined as: |
Yes, we can do all that, but the benefits are vanishingly small.
|
the original api proposal issue is still active. AllocateUninitializedArray is only profitable for larger arrays dotnet/runtime#27146 (comment). we should have an analyzer to clarify it, overall its name is very confusing and it can easily lead developers to think it's pro performance. |
Part of this is the existing overhead. For large allocations, the costs can be fairly visible and there are cases where those are frequent enough or where they are happening in contexts where the costs do matter.
This is no more a trap than
I don't think that a performance oriented API should be a performance trap. If we want people to be aware of the potential pitfalls of it, that's something that I'm all for telling people "this is dangerous, be mindful of the holes". However, I don't think we should be intentionally making it more problematic to use correctly where there is a decently simple fix that can be made. |
Submitted doc update at dotnet/dotnet-api-docs#8249 |
I disagree. We are using it correctly in our own CoreLib in StringBuilder, ArrayPool and number of other places. Number of these places were heavily micro-benchmarked when the uses of this API were introduced to make sure that there is no regression, and none of them have the checks duplicated.
I do not think that the fix is simple. The GC is heavily optimized for giving you zero-initialized memory. If we were to build uninitialized memory path that is as optimized as the existing zero-initialized path, it would be fairly complex change, and it would be hard to make it pay-for-play. |
In dotnet/performance#2534 I've bumped BDN version to see if @MichalStrehovsky fix (dotnet/BenchmarkDotNet#2046) helped. It did, but new issue popped out.
86 out of 4334 benchmarks that were working fine in the past (2 months ago was the last time I run them), are now failing with
PlatformNotSupportedException
:for code that seems to be very simple:
https://github.com/dotnet/performance/blob/062e33e11c917ce408ce4b447552185d6565b035/src/benchmarks/micro/libraries/System.Net.Primitives/IPAddressPerformanceTests.cs#L17-L26
I have moved this code to a standalone app and was not able to repro the issue.
Repro steps:
BDN reports the path where to project is stored (the last path in the line below):
cc @jkotas
The text was updated successfully, but these errors were encountered: