-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding new ReadyToRun helper for static cctor #76898
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsAs described in issue #69635, when accessing a static field, the static constructor of the class is called. Then the helper CORINFO_HELP_READYTORUN_STATIC_BASE was printed twice during the same block, the first for the constructor and the second for the getter, as if they were two calls for the same method. This PR introduced a new helper name to differentiate both scenarios.
|
Is it possible to unite them to have just a single helper call in that case to access a field and run cctor if needed? |
@trylek had some ideas in #69635 (comment) but it didn't look too straightforward. |
I don't mind taking a change like this the next time we are taking a JIT-EE GUID change anyway, but I don't think we need to churn things to take this alone. In fact I think taking it would be nice, it is very confusing right now that the |
@BrianBohe while you're there could you also remove |
Why is that hard? It looks like a pretty straightforward JIT optimization to me. |
Do you mean eliding If the I guess it would also need a new JIT-EE function since these helper functions do not carry any information except in the entry points about the class they are accessing. One thing I'm confused about from looking at |
I meant folding CCTOR_TRIGGER followed by STATIC_BASE call. CCTOR_TRIGGER is unnecessary in this case. The STATIC_BASE calls always trigger the cctor if necessary.
You should be able to track the information in the call node, similar to how class or method handle is tracked for calls. We do not have JIT/EE interface method to give you method handle from resolved entrypoint for these cases either. |
Yes, the cctor trigger is optimized to just return once the cctor runs. An alternative option would be to emit conditional check in the JITed code for cctor triggers. It would have better perf, but also produce larger code size. |
Hopefully we could find a place to put it. Increasing the size of
Ah ok, I understand now. For some reason I was under the impression that one stage of these fixups was running eagerly (during external fixup), but it is not the case. |
Related prototype: #47901 (for jit) - we decided to give up on it since larger code size wasn't justified for JIT where we usually get rid of the checks in tier1 anyway. |
src/coreclr/jit/flowgraph.cpp
Outdated
@@ -917,7 +917,7 @@ GenTreeCall* Compiler::fgGetSharedCCtor(CORINFO_CLASS_HANDLE cls) | |||
memset(&resolvedToken, 0, sizeof(resolvedToken)); | |||
resolvedToken.hClass = cls; | |||
|
|||
return impReadyToRunHelperToTree(&resolvedToken, CORINFO_HELP_READYTORUN_STATIC_BASE, TYP_BYREF); | |||
return impReadyToRunHelperToTree(&resolvedToken, CORINFO_HELP_READYTORUN_CCTOR_TRIGGER, TYP_BYREF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are here, can you change this to return TYP_VOID
? If I'm not mistaken the effect of the TYP_BYREF
here is that we are reporting random garbage value to the GC as a byref.
The builds and tests are failing with:
|
Scratch my previous comment. The cctor helper partially exists already, it is just not used in all the right places. It makes sense to fix that. |
src/coreclr/jit/valuenumfuncs.h
Outdated
@@ -116,6 +116,7 @@ ValueNumFuncDef(GetsharedNongcstaticBase, 2, false, true, true) | |||
ValueNumFuncDef(GetsharedGcstaticBaseNoctor, 1, false, true, true) | |||
ValueNumFuncDef(GetsharedNongcstaticBaseNoctor, 1, false, true, true) | |||
ValueNumFuncDef(ReadyToRunStaticBase, 1, false, true, true) | |||
ValueNumFuncDef(ReadyToRunCctorTrigger, 1, false, true, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should end up needing this. Now that this is TYP_VOID
it is going to get void value number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this (and the ReadyToRunStaticBase
above it) can be removed.
Looking at NAOT it does not implement/use the That probably means we need to fix JIT to keep creating the Another question is if we should just move crossgen2 to be on the same plan as NAOT, i.e. get rid of The optimization might still be worth it even with this CSE'ing. It will allow us to drop the redundant |
Unrelated: are static cctor calls even exist for the snippet in the issue static class Foo
{
public static int Bar { get; } = 1;
static Foo() { } // even the empty cctor will force it to emit the helper
// call twice in each static property's getter.
} ? I expect |
Since we only insert the cctor trigger in morph it would be pretty easy to pick the "best" STATIC_BASE variant as the CCTOR_TRIGGER and let CSE clean up later. That is, if we imported a GC field then insert the GC variant, if we imported a non-GC field, pick the non-GC variant, and otherwise stay with the |
Yes, having as few differences as possible between crossgen2 and NAOT is goodness. |
I guess the reason why somebody introduced CCTOR_TRIGGER for R2R was that it can be just |
ad5b34c
to
452183d
Compare
Static base helper was removed and cctor returns void.
f9ed031
to
abb0e3d
Compare
jit-diff diff --pmi throws no difference:
jit-diff diff throws an overall improvement:
|
Now one of the helpers calls is being repeated at the beginning of the code and its result is being stored in a register. Then one register is being locked since the beginning of the code, triggering a cascade regression in a few examples, and improvements in others. |
If we duplicate the first helper call we may avoid the regression cases (holding a register for a long time). Do we know at import phase if the relative order of the helper calls will persist? |
Can you show the diff? I wouldn't be worried about a few minor regressions, especially if it is just due to slightly worse register allocation. That is unavoidable in many cases when making changes early in the JIT phases, and trying to compensate for it upstream makes things very entangled. |
It is initialized with CORINFO_HELP_UNDEF and set to R2R GC or NON GC Static Base later.
New diff looks like this:
|
Regressions looks to be cases in which CSE didn't took place or took place, but the assignation of registers end up adding a few bytes more, something very similar to what I found previously. Here for example, in diff (right), r14 is used to store the result of the helper call, where previously on base (left) rax was used directly. |
Here is another example, in this scenario base (left) is not storing the return from its first call to CORINFO_HELP_READYTORUN_GCSTATIC_BASE, and diff (right) stores it but not CSE the call with its next appearance. Diff does CSE but with another helper call that is further below in another instruction group. |
If CSE decides to introduce long lived temporaries that aren't profitable it's a CSE problem. I don't think you need to be concerned that this change provokes that for a small number of cases.
It sounds likely that the second |
You are right, they are from different classes:
|
As described in issue #69635, when accessing a static field, the static constructor of the class is called. Then the helper CORINFO_HELP_READYTORUN_STATIC_BASE was printed twice during the same block, the first for the constructor and the second for the getter, as if they were two calls for the same method.
This PR introduced a new helper name to differentiate both scenarios.