Major performance regression of the array initialization pattern #62864

adamsitnik · 2022-07-22T08:29:59Z

In dotnet/performance#2534 I've bumped BDN version to see if @MichalStrehovsky fix (dotnet/BenchmarkDotNet#2046) helped. It did, but new issue popped out.

86 out of 4334 benchmarks that were working fine in the past (2 months ago was the last time I run them), are now failing with PlatformNotSupportedException:

 System.PlatformNotSupportedException: Operation is not supported on this platform.
    at Internal.Runtime.CompilerHelpers.ThrowHelpers.ThrowPlatformNotSupportedException() + 0x24
    at System.Net.Primitives.Tests.IPAddressPerformanceTests.<ByteAddresses>d__0.MoveNext() + 0x45
    at System.Linq.Enumerable.<OfTypeIterator>d__65`1.MoveNext() + 0x6b
    at System.Linq.Enumerable.MaxInteger[TSource, TResult](IEnumerable`1, Func`2) + 0x48
    at System.Net.Primitives.Tests.IPAddressPerformanceTests..ctor() + 0x9d
    at BenchmarkDotNet.Autogenerated.Runnable_1..ctor() + 0x2a
    at BenchmarkDotNet.Autogenerated.Runnable_1.Run(IHost, String) + 0x6e
    at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[]) + 0x1a2

for code that seems to be very simple:

https://github.com/dotnet/performance/blob/062e33e11c917ce408ce4b447552185d6565b035/src/benchmarks/micro/libraries/System.Net.Primitives/IPAddressPerformanceTests.cs#L17-L26

I have moved this code to a standalone app and was not able to repro the issue.

Repro steps:

git clone https://github.com/dotnet/performance.git
cd performance
py .\scripts\benchmarks_ci.py -f nativeaot7.0 --filter System.Net.Primitives.Tests.IPAddressPerformanceTests.Ctor_Span --bdn-arguments "--keepFiles true"

BDN reports the path where to project is stored (the last path in the line below):

// start dotnet  build -c Release -r win-x64 --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:NuGetPackageRoot="D:\projects\performance\artifacts\packages" in D:\projects\performance\artifacts\bin\MicroBenchmarks\Release\net7.0\Job-NYBBDR

cc @jkotas

The text was updated successfully, but these errors were encountered:

MichalStrehovsky · 2022-07-22T08:59:51Z

The IL for the method in question looks like this - there's a GC::AllocateUninitializedArray instead of newarr. This is breaking pattern recognition in RyuJIT. In NativeAOT this is a PlatformNotSupported because RuntimeHelpers::InitializeArray needs to be expanded as an intrinsic. In CoreCLR, breaking the pattern recognition is a deoptimization. Good thing we found it.

Cc @jaredpar to make sure this is intentional new codegen for this.

RyuJIT will need to be adapted to recognize the new pattern.

.method private final hidebysig newslot virtual 
	instance bool MoveNext () cil managed 
{
	.override method instance bool [System.Runtime]System.Collections.IEnumerator::MoveNext()
	// Method begins at RVA 0x701e8
	// Header size: 12
	// Code size: 117 (0x75)
	.maxstack 4
	.locals init (
		[0] int32
	)

	IL_0000: ldarg.0
	IL_0001: ldfld int32 System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>1__state'
	IL_0006: stloc.0
	IL_0007: ldloc.0
	IL_0008: switch (IL_001b, IL_0043, IL_006c)

	IL_0019: ldc.i4.0
	IL_001a: ret

	IL_001b: ldarg.0
	IL_001c: ldc.i4.m1
	IL_001d: stfld int32 System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>1__state'
	IL_0022: ldarg.0
	IL_0023: ldc.i4.4
	IL_0024: ldc.i4.0
	IL_0025: call !!0[] [System.Runtime]System.GC::AllocateUninitializedArray<uint8>(int32, bool)
	IL_002a: dup
	IL_002b: ldtoken field int32 '<PrivateImplementationDetails>'::A58DB492CDCA7F4880696F0FD6743AE3D8F89D85DF07A95D10C4DD82D3C5881A
	IL_0030: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
	IL_0035: stfld object System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>2__current'
	IL_003a: ldarg.0
	IL_003b: ldc.i4.1
	IL_003c: stfld int32 System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>1__state'
	IL_0041: ldc.i4.1
	IL_0042: ret

	IL_0043: ldarg.0
	IL_0044: ldc.i4.m1
	IL_0045: stfld int32 System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>1__state'
	IL_004a: ldarg.0
	IL_004b: ldc.i4.s 16
	IL_004d: ldc.i4.0
	IL_004e: call !!0[] [System.Runtime]System.GC::AllocateUninitializedArray<uint8>(int32, bool)
	IL_0053: dup
	IL_0054: ldtoken field valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=16' '<PrivateImplementationDetails>'::E531DA00E6A1B6D170859CF4F00DF857B1338B4950DB57E2B81B6D7A2ACD8981
	IL_0059: call void [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [System.Runtime]System.Array, valuetype [System.Runtime]System.RuntimeFieldHandle)
	IL_005e: stfld object System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>2__current'
	IL_0063: ldarg.0
	IL_0064: ldc.i4.2
	IL_0065: stfld int32 System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>1__state'
	IL_006a: ldc.i4.1
	IL_006b: ret

	IL_006c: ldarg.0
	IL_006d: ldc.i4.m1
	IL_006e: stfld int32 System.Net.Primitives.Tests.IPAddressPerformanceTests/'<ByteAddresses>d__0'::'<>1__state'
	IL_0073: ldc.i4.0
	IL_0074: ret
} // end of method '<ByteAddresses>d__0'::MoveNext

ghost · 2022-07-22T09:00:14Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

In dotnet/performance#2534 I've bumped BDN version to see if @MichalStrehovsky fix (dotnet/BenchmarkDotNet#2046) helped. It did, but new issue popped out.

86 out of 4334 benchmarks that were working fine in the past (2 months ago was the last time I run them), are now failing with PlatformNotSupportedException:

 System.PlatformNotSupportedException: Operation is not supported on this platform.
    at Internal.Runtime.CompilerHelpers.ThrowHelpers.ThrowPlatformNotSupportedException() + 0x24
    at System.Net.Primitives.Tests.IPAddressPerformanceTests.<ByteAddresses>d__0.MoveNext() + 0x45
    at System.Linq.Enumerable.<OfTypeIterator>d__65`1.MoveNext() + 0x6b
    at System.Linq.Enumerable.MaxInteger[TSource, TResult](IEnumerable`1, Func`2) + 0x48
    at System.Net.Primitives.Tests.IPAddressPerformanceTests..ctor() + 0x9d
    at BenchmarkDotNet.Autogenerated.Runnable_1..ctor() + 0x2a
    at BenchmarkDotNet.Autogenerated.Runnable_1.Run(IHost, String) + 0x6e
    at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[]) + 0x1a2

for code that seems to be very simple:

https://github.com/dotnet/performance/blob/062e33e11c917ce408ce4b447552185d6565b035/src/benchmarks/micro/libraries/System.Net.Primitives/IPAddressPerformanceTests.cs#L17-L26

I have moved this code to a standalone app and was not able to repro the issue.

Repro steps:

git clone https://github.com/dotnet/performance.git
cd performance
py .\scripts\benchmarks_ci.py -f nativeaot7.0 --filter System.Net.Primitives.Tests.IPAddressPerformanceTests.Ctor_Span --bdn-arguments "--keepFiles true"

BDN reports the path where to project is stored (the last path in the line below):

// start dotnet  build -c Release -r win-x64 --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:NuGetPackageRoot="D:\projects\performance\artifacts\packages" in D:\projects\performance\artifacts\bin\MicroBenchmarks\Release\net7.0\Job-NYBBDR

cc @jkotas

Author:	adamsitnik
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

jakobbotsch · 2022-07-22T09:06:16Z

This is breaking pattern recognition in RyuJIT.

Specifically in this function:
https://github.com/dotnet/runtime/blob/a5e678dd58c365764d0889e137f5953533dbb451/src/coreclr/jit/importer.cpp#L3123-L3133

MichalStrehovsky · 2022-07-22T09:06:32Z

Yep, looks like intentional: #62392

The perf repo uses a more bleeding edge Roslyn than we do.

adamsitnik · 2022-07-22T09:14:24Z

there's a GC::AllocateUninitializedArray instead of newarr.

I remember trying to use GC::AllocateUninitializedArray in a few places and realizing that it was often slower than newarr. It even has a fallback depending on size:

https://github.com/dotnet/runtime/blob/5391db5fc3a0282ae30c40929895fa61a4a70976/src/coreclr/System.Private.CoreLib/src/System/GC.CoreCLR.cs#L683-L687

I wonder if it's going to cause perf regressions as well.

MichalStrehovsky · 2022-07-22T09:28:31Z

Mono also does the pattern match optimization for it. I don't know if there are any runtimes that would benefit from this for InitializeArray since they just rewrite the whole sequence into a more efficient thing anyway.

Maybe a better fix would be for Roslyn not to do this?

kasperk81 · 2022-07-22T09:34:25Z

@adamperlin @333fred reverting #62392 will fix this issue and performance penalty. it is a deoptimization. AllocateUninitializedArray has worse codegen compared to new[] dotnet/runtime#65999 (comment).

adamsitnik · 2022-07-22T12:38:27Z

cc @VSadov

marek-safar · 2022-07-22T12:44:20Z

/cc @lambdageek @vargaz

jkotas · 2022-07-22T13:39:13Z

Please revert #62392 ASAP.

It breaks the pattern matching done by the JIT for array initialization. It will result in major performance regression on most runtimes and it breaks NativeAOT that does not support non-standard array initialization patterns currently.

jkotas · 2022-07-22T13:49:08Z

Quick standalone performance test:

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 100_000_000; i++)
    CreateInitializedArray();
Console.WriteLine(sw.ElapsedMilliseconds);

[MethodImpl(MethodImplOptions.NoInlining)]
static int[] CreateInitializedArray()
{
    return new int[] { 1, 2, 3, 4 };
}

Run using:

set DOTNET_TieredCompilation=0
dotnet run -c Release`

.NET 7 SDK Preview 6: 390ms
Bleeding edge Roslyn (4.4.0-1.22371.1): 10899ms

As you can see, this change introduced 27x regression of the simple array initialization pattern used in this micro-benchmark.

BTW: I do not see any performance numbers in #62392 . In dotnet/runtime, we require performance numbers demonstrating the improvement for all performance optimization PRs. It is not unusual for performance optimizations to not have the intended effect.

(EDIT: Corrected the perf numbers.)

tannergooding · 2022-07-22T14:43:29Z

As you can see, this change introduced 27x regression of the simple array initialization pattern used in this micro-benchmark.

Even with Roslyn reverting the change, we should likely have an issue in the dotnet/runtime repo tracking fixing this on our end. There is no reason that GC.AllocateUninitializedArray for constant inputs can't be replaced with a call to newarr to ensure there is no regression for "small inputs". That would remove an existing "pit of failure" from an API that looks performance oriented. However, additionally, there is really no reason why we couldn't also optimize GC.AllocateUninitializedArray for non-constant length but constant pinned = false to also have the correct fast paths so that there is no real perf difference.

Someone just needs to do the work to ensure the relevant handling exists and to ensure that the JIT can properly optimize the code. The actual method is "large" today and isn't inline friendly, even though the checks being done are effectively all constant. For a non-constant length, this function should really be getting inlined as: if (length < 2048 / sizeof(T)) { return new T[length]; } else { return AllocateNewArray(...); } -or- even better, have a CORINFO_HELP_NEWARR_* API to ensure that this is really just newarr with a "should I zero or not" flags check (particularly since the "failure" cases are really just dependent on whether T is a reference type or contains references, which is always known at JIT time).

jkotas · 2022-07-22T14:50:46Z

Yes, we can do all that, but the benefits are vanishingly small.

GC.AllocateUninitializedArray is also a security trap due to uninitialized data. I kind of like that the API is a performance trap since it forces people to use it only in places where it is actually needed.

kasperk81 · 2022-07-22T14:52:23Z

the original api proposal issue is still active. AllocateUninitializedArray is only profitable for larger arrays dotnet/runtime#27146 (comment). we should have an analyzer to clarify it, overall its name is very confusing and it can easily lead developers to think it's pro performance.

tannergooding · 2022-07-22T15:28:07Z

Yes, we can do all that, but the benefits are vanishingly small.

Part of this is the existing overhead. For large allocations, the costs can be fairly visible and there are cases where those are frequent enough or where they are happening in contexts where the costs do matter.

GC.AllocateUninitializedArray is also a security trap due to uninitialized data.

This is no more a trap than NativeMemory.Alloc and while NativeMemory.Alloc is often a suitable choice, there are many scenarios where it is not.

I kind of like that the API is a performance trap since it forces people to use it only in places where it is actually needed.

I don't think that a performance oriented API should be a performance trap. If we want people to be aware of the potential pitfalls of it, that's something that [UnsafeCallersOnly] or similar should be for. As the API stands today, it has paths in it to handle the small case but to use it "correctly" (even in cases where it is actually needed), the devs realistically need to duplicate those same checks.

I'm all for telling people "this is dangerous, be mindful of the holes". However, I don't think we should be intentionally making it more problematic to use correctly where there is a decently simple fix that can be made.

jkotas · 2022-07-22T15:30:19Z

Submitted doc update at dotnet/dotnet-api-docs#8249

jkotas · 2022-07-22T15:38:49Z

use it "correctly" (even in cases where it is actually needed), the devs realistically need to duplicate those same checks.

I disagree. We are using it correctly in our own CoreLib in StringBuilder, ArrayPool and number of other places. Number of these places were heavily micro-benchmarked when the uses of this API were introduced to make sure that there is no regression, and none of them have the checks duplicated.

there is a decently simple fix that can be made.

I do not think that the fix is simple. The GC is heavily optimized for giving you zero-initialized memory. If we were to build uninitialized memory path that is as optimized as the existing zero-initialized path, it would be fairly complex change, and it would be hard to make it pay-for-play.

ghost added the untriaged Issues and PRs which have not yet been triaged by a lead label Jul 22, 2022

adamsitnik mentioned this issue Jul 22, 2022

Performance-ci centos tests failing in BDN dotnet/performance#2496

Closed

MichalStrehovsky added the area-CodeGen-coreclr label Jul 22, 2022

ghost removed the untriaged Issues and PRs which have not yet been triaged by a lead label Jul 22, 2022

dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Jul 22, 2022

jkotas transferred this issue from dotnet/runtime Jul 22, 2022

jkotas removed untriaged Issues and PRs which have not yet been triaged by a lead area-CodeGen-coreclr labels Jul 22, 2022

jkotas changed the title ~~[NativeAOT] Unexpected PlatformNotSupportedException thrown~~ Major performance regression of the array initialization pattern Jul 22, 2022

This was referenced Jul 22, 2022

Revert "Add array initialization optimization" #62868

Merged

Emit GC.AllocateUninitializedArray for an initialized array creation expression #61011

Closed

jkotas mentioned this issue Jul 22, 2022

Document AllocateUninitializedArray security and performance traps dotnet/dotnet-api-docs#8249

Merged

333fred closed this as completed in #62868 Jul 23, 2022

adamsitnik mentioned this issue Jul 25, 2022

Update the BDN version to use 0.13.1.1819. dotnet/performance#2532

Merged

jkotas mentioned this issue Jul 25, 2022

NativeAOT: type initialization crashing at runtime dotnet/runtime#72782

Closed

jakobbotsch mentioned this issue Aug 11, 2022

Update runtime repo to use Preview 7 SDK dotnet/runtime#72145

Merged

teo-tsirpanis mentioned this issue Aug 20, 2022

[NativeAOT] PlatformNotSupportedException when creating an array dotnet/runtime#74299

Closed

jjonescz mentioned this issue Jul 8, 2024

CodeGen for defaulted collection expression elements (Span via InlineArray) #74184

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major performance regression of the array initialization pattern #62864

Major performance regression of the array initialization pattern #62864

adamsitnik commented Jul 22, 2022

MichalStrehovsky commented Jul 22, 2022

ghost commented Jul 22, 2022

jakobbotsch commented Jul 22, 2022

MichalStrehovsky commented Jul 22, 2022

adamsitnik commented Jul 22, 2022

MichalStrehovsky commented Jul 22, 2022

kasperk81 commented Jul 22, 2022

adamsitnik commented Jul 22, 2022

marek-safar commented Jul 22, 2022

jkotas commented Jul 22, 2022 •

edited

Loading

jkotas commented Jul 22, 2022 •

edited

Loading

tannergooding commented Jul 22, 2022

jkotas commented Jul 22, 2022

kasperk81 commented Jul 22, 2022 •

edited

Loading

tannergooding commented Jul 22, 2022

jkotas commented Jul 22, 2022

jkotas commented Jul 22, 2022 •

edited

Loading

Major performance regression of the array initialization pattern #62864

Major performance regression of the array initialization pattern #62864

Comments

adamsitnik commented Jul 22, 2022

MichalStrehovsky commented Jul 22, 2022

ghost commented Jul 22, 2022

jakobbotsch commented Jul 22, 2022

MichalStrehovsky commented Jul 22, 2022

adamsitnik commented Jul 22, 2022

MichalStrehovsky commented Jul 22, 2022

kasperk81 commented Jul 22, 2022

adamsitnik commented Jul 22, 2022

marek-safar commented Jul 22, 2022

jkotas commented Jul 22, 2022 • edited Loading

jkotas commented Jul 22, 2022 • edited Loading

tannergooding commented Jul 22, 2022

jkotas commented Jul 22, 2022

kasperk81 commented Jul 22, 2022 • edited Loading

tannergooding commented Jul 22, 2022

jkotas commented Jul 22, 2022

jkotas commented Jul 22, 2022 • edited Loading

jkotas commented Jul 22, 2022 •

edited

Loading

jkotas commented Jul 22, 2022 •

edited

Loading

kasperk81 commented Jul 22, 2022 •

edited

Loading

jkotas commented Jul 22, 2022 •

edited

Loading