Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Generalize strategy for finding addrecs #99048

Merged
merged 6 commits into from
Mar 1, 2024

Conversation

jakobbotsch
Copy link
Member

When looking at a PHI, utilize a strategy where a symbolic node representing the recurrence is inserted in the cache, after which the analysis is invoked recursively. The created SCEV can afterwards be turned into a recurrence by considering the symbolic node to be a recursive occurrence.

When looking at a PHI, utilize a strategy where a symbolic node
representing the recurrence is inserted in the cache, after which the
analysis is invoked recursively. The created SCEV can afterwards be
turned into a recurrence by considering the symbolic node to be a
recursive occurrence.
@ghost ghost assigned jakobbotsch Feb 28, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 28, 2024
@ghost
Copy link

ghost commented Feb 28, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

When looking at a PHI, utilize a strategy where a symbolic node representing the recurrence is inserted in the cache, after which the analysis is invoked recursively. The created SCEV can afterwards be turned into a recurrence by considering the symbolic node to be a recursive occurrence.

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Feb 28, 2024

FYI @dotnet/jit-contrib... From the diffs I'm not sure the complexity is worth it. Seems we recognize and widen around 1400 more primary IVs across the collections. I might try to compare this with a simpler approach that only follows SSA defs but otherwise sticks with the "simple" add recs only.

@AndyAyersMS
Copy link
Member

How hard is it to back-map these to the C# for the methods? I am curious if these cases can arise from vanilla C# or if they're from various bespoke loops.

Failing to recognize primary IVs may eventually lock us out of other optimizations.

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Feb 29, 2024

How hard is it to back-map these to the C# for the methods? I am curious if these cases can arise from vanilla C# or if they're from various bespoke loops.

Failing to recognize primary IVs may eventually lock us out of other optimizations.

Some of the cases I looked at were due to CSE -- for example, if the loop body has an occurrence of i + 1, then CSE may kick in and since the existing simplistic thing does not follow SSA defs it cannot analyze them. So those cases are definitely naturally occurring.

There are also cases we still do not get even with this PR, and those arise even from standard looking loops. I left a comment about one of them in the code, but that case arises from this loop:

private void CopyKeys(Array array, int arrayIndex)
{
Debug.Assert(array != null);
Debug.Assert(array.Rank == 1);
Bucket[] lbuckets = _buckets;
for (int i = lbuckets.Length; --i >= 0;)
{
object? keyv = lbuckets[i].key;
if ((keyv != null) && (keyv != _buckets))
{
array.SetValue(keyv, arrayIndex++);
}
}
}

--i >= 0 results in IL sub; dup; stloc, and we end up with IR like:

STMT00005 ( 0x04A[E-] ... ??? )
               [000020] DA---------                           STORE_LCL_VAR int    V07 tmp1         
               [000019] -----------                         └──▌  SUB       int   
               [000017] -----------                            ├──▌  LCL_VAR   int    V04 loc1         
               [000018] -----------                            └──▌  CNS_INT   int    1
Marked V07 as a single def local

    [ 2]  78 (0x04e) stloc.1

STMT00006 ( ??? ... ??? )
               [000023] DA---------                           STORE_LCL_VAR int    V04 loc1         
               [000022] -----------                         └──▌  LCL_VAR   int    V07 tmp1         

    [ 1]  79 (0x04f) ldc.i4.0 0
    [ 2]  80 (0x050) bge.s

STMT00007 ( ??? ... ??? )
               [000026] -----------                           JTRUE     void  
               [000025] -----------                         └──▌  GE        int   
               [000021] -----------                            ├──▌  LCL_VAR   int    V07 tmp1         
               [000024] -----------                            └──▌  CNS_INT   int    0

later on local copy prop comes along and creates more uses of V07 (which does not seem like a good idea since it extends the lifetime of V07 -- would be interesting to try to utilize liveness to avoid this...). Loop inversion then also comes along and makes V07 into a fully fledged induction variable. We essentially end up with a loop that looks like

Bucket[] lbuckets = _buckets;
int i = lbuckets.Length;
int j = i - 1;
i = j;
if (j >= 0)
{
  while (true)
  {
    object? keyv = lbuckets[j].key; // local copy prop changed i to j, making j and i both into IVs because of loop inversion
    if ((keyv != null) && (keyv != _buckets))
    {
      array.SetValue(keyv, arrayIndex++);
    }
    j = i - 1;
    i = j;
    if (j < 0)
      break;
  } 
}

and we aren't able to analyze that j is an add recurrence <L, [initial value of j], -1> (we do manage to analyze i). We somehow need to reconcile that the i - 1 when we come around the backedge while analyzing j matches <L, [initial value of j], -1>. But what we see at that point is <L, [initial value of i], -1> + (-1), so I'm not quite sure how to reconcile that.

I collected some more stats: over ASP.NET we can analyze 1202 header PHIs into add recs that we couldn't analyze before. There are 30970 loops found in the loop finding phase, so that's IVs in around 3.8% of all loops.
For benchmarks.run_pgo those numbers are 1368 and 35081 respectively (again 3.8%), while for libraries_tests.run they are 8296, 112094, 7.4%.
So it definitely seems worth it to have this more powerful recognition, and the TP cost looks small as well, so I think I'm just going to say the complexity is warranted.

@jakobbotsch jakobbotsch marked this pull request as ready for review February 29, 2024 09:56
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @BruceForstall @AndyAyersMS

With this we are able to analyze both the examples that the comment I removed covered. For example:

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Foo(int[] arr)
{
    int sum = 0;
    int i = 0;
    int j = 0;
    while (i < arr.Length)
    {
        sum += arr[i];
        i += j;
        j++;
    }

    return sum;
}
STMT00009 ( ??? ... ??? )
N004 (  0,  0) [000049] DA---------                           STORE_LCL_VAR int    V02 loc1         d:3 $VN.Void
N003 (  0,  0) [000048] -----------                         └──▌  PHI       int    $242
N001 (  0,  0) [000062] ----------- pred BB03                  ├──▌  PHI_ARG   int    V02 loc1         u:4
N002 (  0,  0) [000059] ----------- pred BB02                  └──▌  PHI_ARG   int    V02 loc1         u:2 $c0
  => <L00, V02.2 (0), <L00, V03.2 (0), 1>>

(a chain of recurrences, where the step value itself evolves). Another case:

int sum = 0;
int i = 0;
while (i < arr.Length)
{
    sum += arr[i++];
    sum += arr[i++];
    sum += arr[i++];
    sum += arr[i++];
}

return sum;
STMT00017 ( ??? ... ??? )
N004 (  0,  0) [000123] DA---------                           STORE_LCL_VAR int    V02 loc1         d:3 $VN.Void
N003 (  0,  0) [000122] -----------                         └──▌  PHI       int    $241
N001 (  0,  0) [000132] ----------- pred BB03                  ├──▌  PHI_ARG   int    V02 loc1         u:7
N002 (  0,  0) [000130] ----------- pred BB02                  └──▌  PHI_ARG   int    V02 loc1         u:2 $c0
  => <L00, V02.2 (0), 4>

@AndyAyersMS
Copy link
Member

But what we see at that point is <L, [initial value of i], -1> + (-1), so I'm not quite sure how to reconcile that.

Since you have the VNs for the initial values and steps, can't you pose this as a VN equality check? Basically ask if the VN for the init and step of one is the same as the corresponding VNs for the other?

// and turning the recursive result into an addrec.
//
return CreateSimpleAddRec(store, enterScev, ssaDsc->GetBlock(), ssaDsc->GetDefNode()->Data());
Scev* simpleAddRec = CreateSimpleAddRec(store, enterScev, ssaDsc->GetBlock(), ssaDsc->GetDefNode()->Data());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to retain the prior comment here and augment so there's as a worked-out example of how these symbolic addrecs end up getting resolved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense... let me add it as part of a future PR.

@jakobbotsch
Copy link
Member Author

Since you have the VNs for the initial values and steps, can't you pose this as a VN equality check? Basically ask if the VN for the init and step of one is the same as the corresponding VNs for the other?

Something like that may be possible.. I need to think about what kind of manipulation is legal on these "temporary" scevs that may or may not have recursive occurrences. A priori it's not so clear to me -- not seeing any recursive appearance of the SCEV would typically mean the IV is of some unrepresentable shape (e.g. a flip-flop).

@jakobbotsch
Copy link
Member Author

superpmi-diffs failure should have been fixed by #99116. libraries tests failure looks like #99074.

@jakobbotsch jakobbotsch merged commit 1bebdb0 into dotnet:main Mar 1, 2024
124 of 129 checks passed
@jakobbotsch jakobbotsch deleted the scev-more-addrecs branch March 1, 2024 10:16
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants