[Perf] Windows/arm64: 16 Regressions on 3/14/2024 6:24:18 PM #100085

Closed
performanceautofiler bot opened this issue Mar 21, 2024 · 6 comments
Labels
arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), os-windows, Priority:2 (Work that is important, but not critical for the release), runtime-coreclr (specific to the CoreCLR runtime)

performanceautofiler bot commented Mar 21, 2024

Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline d36beb7b0a3936d18062ca89572d49590fdd445a
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Tests.Perf_Version

Benchmark   Baseline  Test      Test/Base  Test Quality  Edge Detector
TryFormatL  48.24 ns  54.01 ns  1.12       0.02          False
ToStringL   70.58 ns  75.17 ns  1.07       0.01          False
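The Test/Base column appears to be the Test time divided by the Baseline time, so a value above 1.0 means the compare build ran slower. As a quick sanity check, here is a minimal Python sketch that recomputes the two Perf_Version ratios from the reported timings (the ratio() helper and the rows dictionary are illustrative names, not part of the dotnet/performance tooling):

```python
# Illustrative sketch: recompute Test/Base from the Baseline and Test timings.
# ratio() and rows are hypothetical names, not part of dotnet/performance.

def ratio(baseline: float, test: float) -> float:
    """Return Test/Base rounded to two decimals; > 1.0 means the test run is slower."""
    return round(test / baseline, 2)


# Timings in ns, copied from the Perf_Version table in this issue.
rows = {
    "TryFormatL": (48.24, 54.01),
    "ToStringL": (70.58, 75.17),
}

for name, (base, test) in rows.items():
    r = ratio(base, test)
    print(f"{name}: {r:.2f} ({(r - 1) * 100:.0f}% slower)")
```

The same arithmetic applies to the other tables, for example 22.29 / 15.93 ≈ 1.40 for Struct.GSeq.FilterSkipMapSum.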

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Version*'

System.Tests.Perf_Version.TryFormatL

System.Tests.Perf_Version.ToStringL

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository


Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline d36beb7b0a3936d18062ca89572d49590fdd445a
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Perf_Frozen<ReferenceType>

Benchmark                       Baseline  Test      Test/Base  Test Quality  Edge Detector
ToFrozenSet(Count: 64)          5.42 μs   5.86 μs   1.08       0.02          False
ToFrozenDictionary(Count: 512)  38.20 μs  42.58 μs  1.11       0.10          False
ToFrozenDictionary(Count: 64)   4.28 μs   4.64 μs   1.08       0.11          False
ToFrozenSet(Count: 512)         47.99 μs  52.64 μs  1.10       0.02          False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Perf_Frozen<ReferenceType>*'

System.Collections.Perf_Frozen<ReferenceType>.ToFrozenSet(Count: 64)

System.Collections.Perf_Frozen<ReferenceType>.ToFrozenDictionary(Count: 512)

System.Collections.Perf_Frozen<ReferenceType>.ToFrozenDictionary(Count: 64)

System.Collections.Perf_Frozen<ReferenceType>.ToFrozenSet(Count: 512)


Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline e658fdf554b9f259b5d1012698aa5387e2ff7c31
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.CtorFromCollection<Int32>

Benchmark                             Baseline  Test      Test/Base  Test Quality  Edge Detector
FrozenSet(Size: 512)                  36.45 μs  41.86 μs  1.15       0.02          False
FrozenDictionaryOptimized(Size: 512)  25.16 μs  29.91 μs  1.19       0.01          False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.CtorFromCollection<Int32>*'

System.Collections.CtorFromCollection<Int32>.FrozenSet(Size: 512)

System.Collections.CtorFromCollection<Int32>.FrozenDictionaryOptimized(Size: 512)


Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline d36beb7b0a3936d18062ca89572d49590fdd445a
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Text.Perf_Ascii

Benchmark                                       Baseline  Test      Test/Base  Test Quality  Edge Detector
EqualsIgnoreCase_DifferentCase_Bytes(Size: 6)   9.39 ns   11.05 ns  1.18       0.13          False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Text.Perf_Ascii*'

System.Text.Perf_Ascii.EqualsIgnoreCase_DifferentCase_Bytes(Size: 6)


Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline d36beb7b0a3936d18062ca89572d49590fdd445a
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Perf_Frozen<Int16>

Benchmark                       Baseline  Test      Test/Base  Test Quality  Edge Detector
ToFrozenDictionary(Count: 512)  27.33 μs  32.82 μs  1.20       0.01          False
ToFrozenDictionary(Count: 64)   2.88 μs   3.24 μs   1.12       0.03          False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Perf_Frozen<Int16>*'

System.Collections.Perf_Frozen<Int16>.ToFrozenDictionary(Count: 512)

System.Collections.Perf_Frozen<Int16>.ToFrozenDictionary(Count: 64)


Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline e658fdf554b9f259b5d1012698aa5387e2ff7c31
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Perf_LengthBucketsFrozenDictionary

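Benchmark                                                           Baseline   Test       Test/Base  Test Quality  Edge Detector
TryGetValue_True_FrozenDictionary(Count: 1000, ItemsPerBucket: 5)   14.52 μs   16.20 μs   1.12       0.01          False
TryGetValue_True_FrozenDictionary(Count: 10000, ItemsPerBucket: 5)  228.11 μs  246.37 μs  1.08       0.07          False
TryGetValue_True_FrozenDictionary(Count: 100, ItemsPerBucket: 5)    1.35 μs    1.51 μs    1.11       0.02          False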

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Perf_LengthBucketsFrozenDictionary*'

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000, ItemsPerBucket: 5)

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10000, ItemsPerBucket: 5)

System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 100, ItemsPerBucket: 5)


Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline d36beb7b0a3936d18062ca89572d49590fdd445a
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in Struct.GSeq

Benchmark         Baseline  Test      Test/Base  Test Quality  Edge Detector
FilterSkipMapSum  15.93 μs  22.29 μs  1.40       0.01          False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Struct.GSeq*'

Struct.GSeq.FilterSkipMapSum


Run Information

Name Value
Architecture arm64
OS Windows 10.0.19041
Queue SurfaceWindows
Baseline d36beb7b0a3936d18062ca89572d49590fdd445a
Compare 5c40bb5636b939fb548492fdeb9d501b599ac5f5
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Tests.Perf_Enum

Benchmark                                              Baseline   Test       Test/Base  Test Quality  Edge Detector
Parse_Flags(text: "Red, Orange, Yellow, Green, Blue")  176.52 ns  188.31 ns  1.07       0.07          False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Enum*'

System.Tests.Perf_Enum.Parse_Flags(text: "Red, Orange, Yellow, Green, Blue")


performanceautofiler bot added the arch-arm64, os-windows, runtime-coreclr, and untriaged labels Mar 21, 2024
LoopedBard3 removed the untriaged label Mar 21, 2024
LoopedBard3 transferred this issue from dotnet/perf-autofiling-issues Mar 21, 2024
dotnet-issue-labeler bot added the needs-area-label label Mar 21, 2024
LoopedBard3 (Member) commented:

Likely caused by #99634.

dotnet-policy-service bot added the untriaged label Mar 21, 2024

LoopedBard3 (Member) commented Mar 21, 2024

Other regressions:
Linux arm64: dotnet/perf-autofiling-issues#31603
Windows arm64: dotnet/perf-autofiling-issues#31610, dotnet/perf-autofiling-issues#31996 (Mostly a repeat of this issue)
Windows x64: dotnet/perf-autofiling-issues#31921

jeffschwMSFT added the area-CodeGen-coreclr label Mar 22, 2024
JulieLeeMSFT added this to the 9.0.0 milestone Mar 22, 2024
dotnet-policy-service bot removed the untriaged label Mar 22, 2024
vcsjones removed the needs-area-label label Mar 25, 2024
amanasifkhalid added the Priority:2 label May 3, 2024
amanasifkhalid (Member) commented:

Platforms: Ubuntu 22.04 arm64, Windows 10.0.19041 arm64, Windows 10.0.22621 arm64, Windows 10.0.18362 x64

Recent  Orig  Per-platform scores            Benchmark
1.23    1.23  1.23 1.23                      System.Numerics.Tests.Perf_BigInteger.Remainder(arguments: 1024,512 bits)
1.18    1.18  1.22 1.22 1.14 1.14            System.Linq.Tests.Perf_Enumerable.Zip(input: IEnumerable)
1.17    1.17  1.17 1.17                      System.IO.Tests.Perf_Path.GetDirectoryName
1.16    1.18  1.13 1.17 1.20 1.20            System.Collections.Perf_Frozen(Int16).ToFrozenDictionary(Count: 512)
1.16    1.13  1.17 1.11 1.15 1.15            System.Collections.CtorFromCollection(Int32).FrozenSet(Size: 512)
1.16    1.54  1.00 1.40 1.40 1.40 1.11 1.85  Struct.GSeq.FilterSkipMapSum
1.16    1.16  1.16 1.16                      System.Tests.Perf_Int64.ToString(value: 9223372036854775807)
1.15    1.17  1.12 1.15 1.19 1.19            System.Collections.CtorFromCollection(Int32).FrozenDictionaryOptimized(Size: 512)
1.15    1.15  1.12 1.12 1.12 1.12 1.21 1.21  System.Tests.Perf_Version.TryFormatL
1.14    1.14  1.14 1.14                      System.Text.Perf_Ascii.EqualsIgnoreCase_DifferentCase_Bytes_Chars(Size: 6)
1.13    1.13  1.13 1.13                      System.Tests.Perf_UInt64.TryFormat(value: 18446744073709551615)
1.12    1.11  1.12 1.11                      System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 100, ItemsPerBucket: 5)
1.12    1.12  1.12 1.12                      System.Tests.Perf_Int64.TryFormat(value: 9223372036854775807)
1.11    1.11  1.11 1.11                      System.Collections.Perf_Frozen(ReferenceType).ToFrozenDictionary(Count: 512)
1.11    1.11  1.12 1.12 1.11 1.11            System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 1000, ItemsPerBucket: 5)
1.11    1.11  1.11 1.11                      System.Buffers.Text.Tests.Utf8FormatterTests.FormatterUInt64(value: 18446744073709551615)
1.11    1.09  1.12 1.08 1.10 1.10            System.Collections.Perf_Frozen(ReferenceType).ToFrozenSet(Count: 512)
1.10    1.11  1.08 1.11 1.12 1.12            System.Collections.Perf_Frozen(Int16).ToFrozenDictionary(Count: 64)
1.10    1.09  1.11 1.09 1.09 1.09            System.Collections.Perf_Frozen(NotKnownComparable).ToFrozenDictionary(Count: 512)
1.08    1.08  1.08 1.08                      System.Collections.Perf_Frozen(ReferenceType).ToFrozenDictionary(Count: 64)
1.08    1.08  1.08 1.08                      System.Collections.Perf_DefaultFrozenDictionary.ToFrozenDictionary(Count: 100)
1.08    1.10  1.08 1.10                      System.Buffers.Text.Tests.Utf8FormatterTests.FormatterUInt32(value: 4294967295)
1.08    1.08  1.08 1.09 1.08 1.08            System.Collections.Perf_Frozen(ReferenceType).ToFrozenSet(Count: 64)
1.08    1.08  1.08 1.08                      System.Collections.Perf_LengthBucketsFrozenDictionary.TryGetValue_True_FrozenDictionary(Count: 10000, ItemsPerBucket: 5)
1.08    1.06  1.08 1.06                      System.Linq.Tests.Perf_Enumerable.WhereSelect(input: IEnumerable)
1.07    1.07  1.07 1.07                      System.Tests.Perf_Uri.GetComponents
1.07    1.09  1.07 1.11 1.07 1.07            System.Tests.Perf_Version.ToStringL
1.07    1.08  1.06 1.09 1.08 1.08            System.Collections.Perf_Frozen(NotKnownComparable).ToFrozenDictionary(Count: 64)
1.07    1.07  1.07 1.07                      System.Tests.Perf_Enum.Parse_Flags(text: "Red, Orange, Yellow, Green, Blue")
1.04    1.10  1.04 1.10                      System.Linq.Tests.Perf_Enumerable.WhereSelect(input: List)
1.02    1.16  0.89 1.14 1.18 1.18            System.Text.Perf_Ascii.EqualsIgnoreCase_DifferentCase_Bytes(Size: 6)
1.00    1.13  1.00 1.13                      System.Text.Perf_Ascii.EqualsIgnoreCase_DifferentCase_Chars(Size: 6)
0.81    1.07  0.81 1.07                      System.Collections.Sort(IntStruct).Array_ComparerClass(Size: 512)
0.78    1.11  0.78 1.11                      System.Collections.Sort(IntStruct).Array_Comparison(Size: 512)
0.25    1.07  0.25 1.07                      System.Buffers.Tests.SearchValuesCharTests.LastIndexOfAny(Values: "ßäöüÄÖÜ")
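The thread does not say how Recent Score and Orig Score are derived. They appear consistent with a geometric mean of the per-platform values, if each platform's values are read as a recent score followed by an original score. Both the formula and that pairing are assumptions, not something stated in the issue. A small Python check for Struct.GSeq.FilterSkipMapSum, whose per-platform values are 1.00, 1.40, 1.40, 1.40, 1.11, 1.85:

```python
import math


def geomean(values):
    """Geometric mean: exp of the average natural log of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Assumption: the six per-platform values for Struct.GSeq.FilterSkipMapSum are
# (recent, orig) pairs, one pair per platform that reported a result.
per_platform = [1.00, 1.40, 1.40, 1.40, 1.11, 1.85]
recent = per_platform[0::2]  # [1.00, 1.40, 1.11]
orig = per_platform[1::2]    # [1.40, 1.40, 1.85]

print(round(geomean(recent), 2))  # compare with the listed Recent Score (1.16)
print(round(geomean(orig), 2))    # compare with the listed Orig Score (1.54)
```

Under the same reading, the values 1.22, 1.22, 1.14, 1.14 listed for System.Linq.Tests.Perf_Enumerable.Zip(input: IEnumerable) give 1.18 for both scores, which also matches the table.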

amanasifkhalid (Member) commented:

I've looked at the Linux arm64 benchmarks that currently show regressions of 10% or more, since that data is readily available. The original regressions seem to have been fixed; the newer regressions instead come from block layout and/or block compaction changes. For example:

[Four benchmark history charts, not preserved]
(Not sure why this one is so noisy after the block layout changes)
[One more benchmark history chart, not preserved]

I'll need to manually query the Windows data to get newer results, but I suspect we'll see a similar theme.

amanasifkhalid (Member) commented:

Windows x64 (purple: Windows 10, blue: Windows 11):

System.Numerics.Tests.Perf_BigInteger.Remainder(arguments: 1024,512 bits)
[chart]

System.Linq.Tests.Perf_Enumerable.Zip(input: IEnumerable)
[chart]

System.IO.Tests.Perf_Path.GetDirectoryName
[chart]

System.Tests.Perf_Int64.ToString(value: 9223372036854775807)
[chart]

System.Tests.Perf_Version.TryFormatL
[chart]

Windows arm64 (turquoise: Windows 10, blue: Windows 11; baseline performance looks worse on Windows 11 for some of these):

System.Collections.Perf_Frozen<Int16>.ToFrozenDictionary(Count: 512)
[chart]

System.Collections.CtorFromCollection<Int32>.FrozenSet(Size: 512)
[chart]

Struct.GSeq.FilterSkipMapSum
[chart]

System.Collections.CtorFromCollection<Int32>.FrozenDictionaryOptimized(Size: 512)
[chart]

System.Tests.Perf_Version.TryFormatL
[chart]

amanasifkhalid (Member) commented:

I'm going to close this issue in favor of #102763 and #103972. The above regressions seem best addressed by iterating on the new block layout.

Projects
None yet
Development

No branches or pull requests

5 participants