Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CodeGen for defaulted collection expression elements (Span via InlineArray) #74184

Closed
bernd5 opened this issue Jun 27, 2024 · 15 comments
Closed
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead

Comments

@bernd5
Copy link
Contributor

bernd5 commented Jun 27, 2024

Version Used:

Steps to Reproduce:

Compile the following code:

using System;

Span<int> i = [0, 1, 0];

Expected Behavior:
I would assume the assignment of elements with a default value could be omitted.

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray3<int> buffer = default(<>y__InlineArray3<int>);
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 1) = 1;
        <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray3<int>, int>(ref buffer, 3);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(3)]
internal struct <>y__InlineArray3<T>
{
    [CompilerGenerated]
    private T _element0;
}

Actual Behavior:

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray3<int> buffer = default(<>y__InlineArray3<int>);
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 0) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 1) = 1;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 2) = 0;
        <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray3<int>, int>(ref buffer, 3);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(3)]
internal struct <>y__InlineArray3<T>
{
    [CompilerGenerated]
    private T _element0;
}
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Jun 27, 2024
@jaredpar
Copy link
Member

I would assume the assignment of elements with a default value could be omitted.

Why would you assume that? The user has explicitly asked the compiler to assign a value here, why should the compiler omit that?

Consider a case where the user is say getting an array back from a pool and wants ensure parts of it are initialized. In that case the default value certainly cannot be omitted.

@bernd5
Copy link
Contributor Author

bernd5 commented Jun 28, 2024

In general you are absolutely right. I mean the spefific Span lowering via inline array. Inline Array structs are already default initialized after construction. So these assignments are noops which can be removed / optimized away.

@bernd5 bernd5 changed the title CodeGen for defaulted collection expression elements CodeGen for defaulted collection expression elements (Span via InlineArray) Jun 28, 2024
@jaredpar
Copy link
Member

What if the user has disabled locals init behavior and are using the defaults to force initialization?

@bernd5
Copy link
Contributor Author

bernd5 commented Jun 28, 2024

Right, then it can't be optimized out.
The result should be the same.

@jcouv jcouv added the Resolution-By Design The behavior reported in the issue matches the current design label Jul 2, 2024
@bernd5
Copy link
Contributor Author

bernd5 commented Jul 2, 2024

Right, then it can't be optimized out. The result should be the same.

@jaredpar could you explain the situation when it can't be optimized out? The inline array is constructed by the compiler - not by the user, and the compiler emits the initobj instruction which default initializes the struct.

In my eyes the assignment of constant default values can be omitted always for inline array to span construction.

This code can be

using System;

foreach (var item in (Span<int>)[0, 1, 0]){
  Console.WriteLine(item);
}

lowered to:

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

internal class Program
{
    [SkipLocalsInit]
    private static void Main(string[] args)
    {
        y__InlineArray3<int> buffer = default(y__InlineArray3<int>);
        //not needed
        //PrivateImplementationDetails.InlineArrayElementRef<y__InlineArray3<int>, int>(ref buffer, 0) = 0;
        PrivateImplementationDetails.InlineArrayElementRef<y__InlineArray3<int>, int>(ref buffer, 1) = 1;
        //not needed
        //PrivateImplementationDetails.InlineArrayElementRef<y__InlineArray3<int>, int>(ref buffer, 2) = 0;
        Span<int> s = PrivateImplementationDetails.InlineArrayAsSpan<y__InlineArray3<int>, int>(ref buffer, 3);
        foreach (var item in s){
            Console.WriteLine(item);
        }
    }
}

[CompilerGenerated]
internal sealed class PrivateImplementationDetails
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(3)]
internal struct y__InlineArray3<T>
{
    [CompilerGenerated]
    private T _element0;
}

@jaredpar
Copy link
Member

jaredpar commented Jul 3, 2024

could you explain the situation when it can't be optimized out?

I've listed a few already in the issue. Which one is unclear?

In my eyes the assignment of constant default values can be omitted always for inline array to span construction.

Optimizations need to be treated like features. The burden is to demonstrate that it is safe, not intuit that it's safe. For example explicitly listing the scenarios in which the optimization will occur and why specifically it's legal in that scenario.

For example in the Span scenario you have to demonstrate that the memory which is not written will be guaranteed to have the default value already.

Also: optimizations are not free. They are a significant source of regressions. That means there is also the burden of demonstrating this optimization is valuable / worth the risk trade off. I've seen no evidence of that in this issue.

@bernd5
Copy link
Contributor Author

bernd5 commented Jul 3, 2024

What if the user has disabled locals init behavior and are using the defaults to force initialization?

That doesn't matter because the inline-array-struct is explicitly defaulted. If it wouldn't I agree.

Why would you assume that? The user has explicitly asked the compiler to assign a value here, why should the compiler omit that?
Consider a case where the user is say getting an array back from a pool and wants ensure parts of it are initialized. In that case the default value certainly cannot be omitted.

That is a very different case. I talk only about span-collection-expression lowering via inline-array, nothing else.

I added a draft-PR already which seems to fail only for some code gen builder tests, where a temporary inline array is created for the collection builder.

For heap / regular arrays this is already implemented, see: EmitArrayInitializer. I think we should do that for inline array, too. Especially because they are used only because of performance.

@jaredpar
Copy link
Member

jaredpar commented Jul 3, 2024

I feel like you haven't fully responded to my last comment

  1. What are the very specific cases this would apply to and why is it demonstrably safe in all those cases?
  2. Why should we do this optimization? Every change, particularly optimizations, carry risk. For optimizations we need real data to motivate us to take these changes. I'm very skeptical on this particular point.

Lacking these two items this issue and accompanying PRs cannot move forward.

@bernd5
Copy link
Contributor Author

bernd5 commented Jul 3, 2024

What are the very specific cases this would apply to and why is it demonstrably safe in all those cases?

I mean this use case:

using System;
using System.IO;

Stream someStream = null;
Span<byte> smallBuffer = [0,0,0,0,0,0,0,0,0,0];
someStream.Read(smallBuffer);

As a user I expect that an inline array is constructed without any assigments. But as you see, today the compiler emits:

using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray9<byte> buffer = default(<>y__InlineArray9<byte>);
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 0) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 1) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 2) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 3) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 4) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 5) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 6) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 7) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 8) = 0;
        Span<byte> buffer2 = <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray9<byte>, byte>(ref buffer, 9);
        ((Stream)null).Read(buffer2);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(9)]
internal struct <>y__InlineArray9<T>
{
    [CompilerGenerated]
    private T _element0;
}

And I want:

using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray9<byte> buffer = default(<>y__InlineArray9<byte>);
        Span<byte> buffer2 = <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray9<byte>, byte>(ref buffer, 9);
        ((Stream)null).Read(buffer2);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(9)]
internal struct <>y__InlineArray9<T>
{
    [CompilerGenerated]
    private T _element0;
}

Why should we do this optimization? Every change, particularly optimizations, carry risk. For optimizations we need real data to motivate us to take these changes. I'm very skeptical on this particular point.

In my eyes it is obvious that those assignments are unnecessary. The inline array is already defaulted (though in this case I don't care about initialization at all).

@bernd5
Copy link
Contributor Author

bernd5 commented Jul 3, 2024

why is it demonstrably safe in all those cases

Because we do:

default(<>y__InlineArray9<byte>)

@bernd5
Copy link
Contributor Author

bernd5 commented Jul 3, 2024

If I would use stackalloc the generated code looks much better, but it coulde issues with inlining or in loops...

using System;
using System.IO;

Stream someStream = null;
Span<byte> smallBuffer = stackalloc byte[9];
someStream.Read(smallBuffer);

using an inline array under the hood seems to be the better solution because it is safe.

@jaredpar
Copy link
Member

jaredpar commented Jul 8, 2024

In my eyes it is obvious that those assignments are unnecessary

That is not real data. Data is profiles showing that the optimization actually improves performance. Depending on intuition like this can, and does, lead to real world regressions. Further it really helps to see data around impact. For example how often does this pattern happen in the real world? An optimization might be demonstrably better but if no real world code hits it then what is the point in doing it?

I mean this use case:

There needs to be more specifics than a single case. It needs to be much more detailed about when an optimization happens. For example this optimization cannot happen when init locals is disable.

Further it needs to touch on what other implications this has. Consider for example #73269 where we want to be smarter about temp re-use in the compiler (leads to stack size savings in real world scenarios). This optimization directly conflicts with that one because this optimization is brittle in the face of temporary re-use as it would allow uninitialized data to leak through. Items like that need to be discussed.

@bernd5
Copy link
Contributor Author

bernd5 commented Jul 8, 2024

I actually did a benchmark for this specific use case and observed that such unnecessary assignments to defaulted struct-fields are optimized away by the runtime (for Release builds) as long as the struct construction is in the same method / stack-frame (without inlining - which is the case here).

See for example:

using System;

var test = new Test();
test.A = null;
test.B = default;
test.B = default;
Console.WriteLine(test.A);
Console.WriteLine(test.B);

struct Test{
  public string A;   
  public int B;
  public int C;
}

This is compiled to:

Program..ctor()
    L0000: ret

Program.<Main>$(System.String[])
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: xor ecx, ecx
    L0005: call dword ptr [0x13756f58]
    L000b: xor ecx, ecx
    L000d: call dword ptr [0x13756ee0]
    L0013: pop ebp
    L0014: ret

Which is quite perfect!

So the real benefit woule be (only) less work for the runtime optimization and binary-size reduction.

Maybe strange is that such an optimization is not implemented for locally constructed arrays or classes.

E.g.:

using System;

var test = new Test();
test.A = null;
test.B = default;
test.B = default;
Console.WriteLine(test.A);
Console.WriteLine(test.B);

class Test{
  public string A;   
  public int B;
  public int C;
}

is compiled to:

; Core CLR 8.0.624.26715 on x86

Program..ctor()
    L0000: ret

Program.<Main>$(System.String[])
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push esi
    L0004: mov ecx, 0x277aca2c
    L0009: call 0x06ec300c
    L000e: mov esi, eax
    L0010: xor ecx, ecx
    L0012: mov [esi+4], ecx
    L0015: mov [esi+8], ecx
    L0018: mov [esi+8], ecx
    L001b: mov ecx, [esi+4]
    L001e: call dword ptr [0x13756f58]
    L0024: mov ecx, [esi+8]
    L0027: call dword ptr [0x13756ee0]
    L002d: pop esi
    L002e: pop ebp
    L002f: ret

Test..ctor()
    L0000: ret

I guess the runtime assumes always other references to the heap storage.
And because such optimizations are not handled in the runtime I think the IL CodeGen was adjusted (for arrays).

So I would understand if you say - well it is optimized enough by the runtime.

@bernd5
Copy link
Contributor Author

bernd5 commented Jul 8, 2024

The benchmark code I used was:

using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

var config = DefaultConfig.Instance;
var summary = BenchmarkRunner.Run <TestInlineArray>(config, args);

[DisassemblyDiagnoser]
//[RPlotExporter]
[SimpleJob(runtimeMoniker: BenchmarkDotNet.Jobs.RuntimeMoniker.Net80)]
//[SimpleJob(runtimeMoniker: BenchmarkDotNet.Jobs.RuntimeMoniker.Mono)]
public class TestInlineArray
{
    [Benchmark]
    public void CreateEmptySpanWithCollectionExpression()
    {
        Span<byte> bytes = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0];

        UseSpan(bytes);
    }

    [Benchmark]
    public void CreateEmptySpanWithInlineArrayLikeTheCompilerDoesDirectAndWithDisabledGitOptimization()
    {
        var array = CreateInlineArraySoTheJitDoesNotSeeItIsDefaulted();
        array[0] = 0;
        array[1] = 0;
        array[2] = 0;
        array[3] = 0;
        array[4] = 0;
        array[5] = 0;
        array[6] = 0;
        array[7] = 0;
        array[8] = 0;
        array[9] = 0;

        array[10] = 0;
        array[11] = 0;
        array[12] = 0;
        array[13] = 0;
        array[14] = 0;
        array[15] = 0;
        array[16] = 0;
        array[17] = 0;
        array[18] = 0;
        array[19] = 0;

        array[20] = 0;
        array[21] = 0;
        array[22] = 0;
        array[23] = 0;
        array[24] = 0;
        array[25] = 0;
        array[26] = 0;
        array[27] = 0;
        array[28] = 0;
        array[29] = 0;

        UseSpan(array);

        [MethodImpl(MethodImplOptions.NoInlining)]
        InlineArrayX CreateInlineArraySoTheJitDoesNotSeeItIsDefaulted()
        {
            return new();
        }
    }

    [Benchmark]
    public void CreateEmptySpanWithInlineArrayDirect()
    {
        var array = new InlineArrayX();
        Span<byte> bytes = array;
        UseSpan(bytes);
    }

    [Benchmark]
    public void CreateEmptySpanWithStackAlloc()
    {
        Span<byte> bytes = stackalloc byte[30];
        UseSpan(bytes);
    }

    private void UseSpan(Span<byte> span)
    {
        foreach (var s in span)
        {
            if (s != 0)
            {
                throw new Exception();
            }
        }
    }

    [InlineArray(30)]
    private struct InlineArrayX
    {
        public byte Data;
    }
}

CreateEmptySpanWithInlineArrayLikeTheCompilerDoesDirectAndWithDisabledGitOptimization shows what happens with disabled runtime optimization.

The result was:

| Method                                                                                | Mean     | Error    | StdDev   | Code Size |
|-------------------------------------------------------------------------------------- |---------:|---------:|---------:|----------:|
| CreateEmptySpanWithCollectionExpression                                               | 16.01 ns | 0.355 ns | 0.332 ns |     158 B |
| CreateEmptySpanWithInlineArrayLikeTheCompilerDoesDirectAndWithDisabledGitOptimization | 24.14 ns | 0.313 ns | 0.261 ns |     340 B |
| CreateEmptySpanWithInlineArrayDirect                                                  | 15.31 ns | 0.163 ns | 0.136 ns |     157 B |
| CreateEmptySpanWithStackAlloc                                                         | 16.22 ns | 0.168 ns | 0.157 ns |     188 B |

@jaredpar jaredpar removed the Resolution-By Design The behavior reported in the issue matches the current design label Jul 11, 2024
@jaredpar
Copy link
Member

Thansk fro the benchmark. Closing for now because it seems like the runtime is handling these cases fairly good today.

Can re-open if we find this is not the case or find real world cases where this is a problem.

@jaredpar jaredpar closed this as not planned Won't fix, can't repro, duplicate, stale Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants