Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Set bbJumpKind and bbJumpDest during block initialization #93415

Merged
merged 8 commits into from
Oct 18, 2023

Conversation

amanasifkhalid
Copy link
Member

Followup to #93152. This refactor enforces new invariants on BasicBlock's bbJumpKind and bbJumpDest. In particular, whenever bbJumpKind is a kind that must have a jump target, bbJumpDest must be set, else bbJumpDest must be null. This means bbJumpKind and bbJumpDest must be simultaneously initialized/updated when creating/converting a jump block; previously, we initialized blocks with their jump kind specified, and later set their jump targets accordingly.

@ghost ghost assigned amanasifkhalid Oct 12, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 12, 2023
@ghost
Copy link

ghost commented Oct 12, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Followup to #93152. This refactor enforces new invariants on BasicBlock's bbJumpKind and bbJumpDest. In particular, whenever bbJumpKind is a kind that must have a jump target, bbJumpDest must be set, else bbJumpDest must be null. This means bbJumpKind and bbJumpDest must be simultaneously initialized/updated when creating/converting a jump block; previously, we initialized blocks with their jump kind specified, and later set their jump targets accordingly.

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels:

area-CodeGen-coreclr

Milestone: -

@amanasifkhalid
Copy link
Member Author

I'm waiting to rebase on top of #93377. Then I'll open for review.

@amanasifkhalid amanasifkhalid marked this pull request as ready for review October 16, 2023 21:56
@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Oct 16, 2023

No asmdiffs. TP diffs on Linux x64 and Windows x86. The regressions for the latter seem a bit dramatic to be attributed to the additional bbJumpDest = nullptr instructions; any ideas?

@amanasifkhalid
Copy link
Member Author

superpmi-replay failure is #93527

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Oct 17, 2023

cc @dotnet/jit-contrib , @AndyAyersMS PTAL.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good.

For throughput, it would be good to do the next level of investigation. I can't remember if we do these lab TP runs with PGO-enhanced release builds, but if we do it's also possible the PGO data we have becomes stale with your changes. You can try and build non-pgo baseline and diff locally and run your own TP diffs perhaps.

There are some pin based tools you can try and use to pin down where the extra time is going. Let me see if I can dig up a pointer to this for you (or maybe someone else on @dotnet/jit-contrib can do this quicker than I can).

Thought for possible follow-up: I wonder if there is some better way to set a "temporary" jump target that ensures that later on we actually have remembered to change it.

callBlock->SetJumpKind(BBJ_CALLFINALLY DEBUG_ARG(this)); // convert the BBJ_LEAVE to BBJ_CALLFINALLY

assert(callBlock->HasJump());
fgRemoveRefPred(callBlock->GetJumpDest(), callBlock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to impImportLeave always make me nervous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise -- this was the trickiest of my changes to get right. Most of my debug time was spent on one failing method in the CoreCLR tests supermi collection that exercised an unusual path (the bug in that case was caused by initializing new blocks with jumps as BBJ_NONE while they don't have a jump target, thus triggering a different path in fgNewBBInRegion; changing this approach to use the actual jump kind and a temporary jump target fixed that).

@EgorBo
Copy link
Member

EgorBo commented Oct 17, 2023

I can't remember if we do these lab TP runs with PGO-enhanced release builds,

We disable native PGO for these runs so it should be ok, It's interesting that TP is actually improved for MSVC ones.

@AndyAyersMS
Copy link
Member

For throughput, it would be good to do the next level of investigation.

I would look at the windows x86 results in particular, those should be the easiest to investigate. Note that the "coreclr tests" collection includes a lot of unusual code, so it may just be one particular code shape that's causing this.

@amanasifkhalid
Copy link
Member Author

Thought for possible follow-up: I wonder if there is some better way to set a "temporary" jump target that ensures that later on we actually have remembered to change it.

How about some public static BasicBlock member for this that we can pass to SetJumpKindAndTarget, like static BasicBlock* tempJumpDest = 0xDEADBEEF? Then we can assert the jump target isn't equal to this before reading it, like in GetJumpDest(), HasJumpTo, etc. Though I'm not sure if a hard-coded pointer value is the best approach here...

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Oct 17, 2023

Thought for possible follow-up: I wonder if there is some better way to set a "temporary" jump target that ensures that later on we actually have remembered to change it.

How about some public static BasicBlock member for this that we can pass to SetJumpKindAndTarget, like static BasicBlock* tempJumpDest = 0xDEADBEEF? Then we can assert the jump target isn't equal to this before reading it, like in GetJumpDest(), HasJumpTo, etc. Though I'm not sure if a hard-coded pointer value is the best approach here...

One idea to explore is to just allocate some special block (call it BB0) that can be used as a sentinel and a temp target, and then (say in post-phase diagnostics) we could verify no jump target targets that block. It turns out we already do some things like this, see for example SsaBuilder::SetupBBRoot, and there are other potential uses; for instance, it could serve as an anchor for both fgFirstBB and fgLastBB, and it could be the def block for uninitialized SSA / parms.

However we'd need to be careful that this block doesn't complicate life elsewhere.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bbTempJumpDest seems like a nice option here. Thanks.

Any insight into the TP impact yet?

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Oct 18, 2023

Any insight into the TP impact yet?

Not much yet. I've identified the method contexts in the CoreCLR tests collection with the largest TP diffs when targeting x86, but I don't know of any way to find the actual methods/tests corresponding to those contexts. I see under runtime/artifacts/spmi/pintools/1.0/windows/intel64/bin, there is pinjitprofiling.dll. I don't know what it exactly does -- I can't find any documentation on it, and I cannot run it on my win-x64 box (for some reason, it won't open vcruntime140.dll, regardless of the Visual C++ runtime installed; trying this on another machine to see if it repros) -- but I imagine it might reveal information more useful than just total instructions executed.

@AndyAyersMS do you have any suggestions for what tooling might be useful here? Thank you!

Edit: @jakobbotsch shared a pin tool with me that can profile JIT functions. Trying it now...

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Oct 18, 2023

@AndyAyersMS So it looks like about 30% of the instruction diff on Windows x86 comes from calls to Compiler::bbNewBasicBlock, and another 20% comes from Compiler::impImportLeave. I think the bbNewBasicBlock increase comes from the fact that it now takes two function calls (e.g. we call the version that takes in a jump kind and jump target, which calls the version which takes no arguments and actually creates the block) instead of one, and the no-arg bbNewBasicBlock is probably too big to be inlined. I can try refactoring it locally to see if we can reduce this diff, though that amount of code duplication probably won't be acceptable to check in.

As for impImportLeave, I'm guessing the diff comes from setting a new block's jump target to null (in Release builds) during initialization, and then setting it to its actual jump later (whereas we previously didn't touch the jump target at all during initialization). I can try to refactor the methods that create blocks to not bother setting the jump target at all in release builds if the jump target passed in is null. If we can get rid of the diff this way, would we be okay introducing all the additional conditional compilation, plus giving up the invariant that bbJumpDest == null for blocks without jumps in Release builds? We don't currently rely on that invariant outside of asserts, so I don't think this would break anything in Release.

Also, the diff in total instructions executed is only +0.09% for me when replaying the CoreCLR tests collection locally. Not sure why it's so much bigger on Helix machines...

@AndyAyersMS
Copy link
Member

Thanks for digging in. I'm ok with taking this change as is, since the TP diff is 32 bit and only in coreclr_tests which has some very odd methods.

@amanasifkhalid
Copy link
Member Author

superpmi-replay failure is #93527.

@amanasifkhalid amanasifkhalid merged commit 3a3e1c7 into dotnet:main Oct 18, 2023
127 of 129 checks passed
@amanasifkhalid amanasifkhalid deleted the bbjump branch October 18, 2023 23:26
@jakobbotsch
Copy link
Member

I just came across bbTempJumpDest while merging up some stuff, and I really do not think that is a good idea. It introduces a semantic difference between debug/checked and release builds, where some blocks now have successors only in debug/checked builds. Even if that's only for a short while that makes me nervous. In the cases that require this I would much rather see the first BB created with a different kind until they can be linked together (e.g. insert it as BBJ_NONE, switch it to BBJ_COND after).

It does raise another question: if we actively have to fight the checking, then what is the benefit of having it? It seems like it just creates usability problems. I do not recall seeing us having bugs previously around "half baked flow graphs", and I would much rather allow us to build flow graph structures naturally and then only check them once they are completed (e.g. at the end of the phase, as a normal post-phase check).

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Oct 23, 2023

It introduces a semantic difference between debug/checked and release builds

I enabled bbTempJumpDest in Debug/Checked builds only since the asserts that check the validity of the jump target won't run in Release builds anyway, though I share some of your discomfort with the differences between the builds. I can remove the conditional compilation around this.

I would much rather see the first BB created with a different kind until they can be linked together (e.g. insert it as BBJ_NONE, switch it to BBJ_COND after).

I initially took this approach, but this introduced subtle bugs in a few places. For example, in Compiler::impImportLeave, sometimes we would create a BBJ_ALWAYS without its jump target existing yet, so we would initialize it as BBJ_NONE. But before converting it to BBJ_ALWAYS, we would create and insert a new block after it, and the method that inserts the new block would run some conditional logic based on the fact that the previous block "falls through" (even though we intend for it to be a BBJ_ALWAYS). So temporarily setting the block to BBJ_NONE didn't work in a few exceptional cases.

I'm also cognizant of the fact that we plan to either significantly restrict our usage of BBJ_NONE before block reordering or remove it altogether (see here), so I'm hesitant to introduce more usages of BBJ_NONE that may have to be removed in the near future. I guess we could introduce another jump kind (like some sort of BBJ_UNINIT), but that only reaffirms your next point...

It does raise another question: if we actively have to fight the checking, then what is the benefit of having it?

I certainly agree with you that these weird cases aren't elegant with my recent changes. I take solace in the fact that these cases are the exception, and not the norm -- if I recall correctly, we're only using bbTempJumpDest in a few places. As for potential bugs around our flow graph code, the only inconsistency I found was our nulling of bbJumpDest for blocks that are converted from a kind with jumps to a kind without; sometimes we'd set bbJumpDest to null, and other times we wouldn't. This seemed like it could be problematic because we have a few places in the JIT where we check if the jump target is null (rather than if the jump kind is valid) in conditional statements (not just asserts), but this didn't seem to cause issues before. With the upcoming churn from the block reordering changes, perhaps these checks will prove useful. But maybe the benefit we get from these checks during large, infrequent refactors don't justify the long-term usability challenges.

@AndyAyersMS what do you think?

@AndyAyersMS
Copy link
Member

One of the hopes here is that we can eventually automate most of the ref count and pred list maintenance. So it seems preferable to not to create something with implicitly incorrect flow and then fix it up; instead we should have some transient "incomplete" state with obviously incorrect flow that must resolved.

I guess we could introduce another jump kind (like some sort of BBJ_UNINIT), but that only reaffirms your next point...

Jakob would you find this more palatable? It would not be debug-only, would not be allowed at end of phase, and most of the code that checks block kinds should already have default/error cases for unexpected kinds, so perhaps this would be more robust?

@jakobbotsch
Copy link
Member

jakobbotsch commented Oct 23, 2023

I certainly agree with you that these weird cases aren't elegant with my recent changes. I take solace in the fact that these cases are the exception, and not the norm -- if I recall correctly, we're only using bbTempJumpDest in a few places.

Maybe I was just unlucky, but when merging I ended up in one of these cases.

I like the BBJ_UNINIT idea, it definitely seems more robust (and better than utilizing BBJ_NONE). Though would it help in that impImportLeave case you identified @amanasifkhalid?

I do think a more ideal solution would be to separate the notion of "building" from the notion of "adding". Building new blocks should not have any checks, while the checking/state computations can be made when adding the result of the builder to the flow graph. Not sure how easy it is to design such a graph builder class that can support all of our patterns, though.

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Oct 23, 2023

Though would it help in that impImportLeave case you identified @amanasifkhalid?

I think so. I imagine we'd only use BBJ_UNINIT for blocks we intend to have a jump, so we could have BasicBlock::bbFallsThrough return false for it.

I do think a more ideal solution would be to separate the notion of "building" from the notion of "adding".

Maybe we could have Compiler::bbNewBasicBlock set the jump kind and target without any checks (though we might have to refactor bbNewBasicBlock to be a member of BasicBlock so it can access its private members). Then if we just want to modify the jump kind/target, we'll have to use BasicBlock::SetJumpKindAndTarget, which does the checks. How does that sound?

Edit: On second thought, I don't think BBJ_UNINIT will work for the edge cases I ran into. Sometimes we're creating a BBJ_ALWAYS that we don't know the jump target for yet, and other times we're creating a BBJ_COND, so we don't have enough information to determine if BBJ_UNINIT should fall through or not. We could add two versions of BBJ_UNINIT, but I think that starts to get messy. I'm thinking of doing something like Jakob's suggestion by removing the checks in bbNewBasicBlock, and just doing the checks any time we read or write to the jump kind or target after initialization. So if we never initialize the jump target for a block that needs one, we'll assert as soon as we try to use the jump target.

amanasifkhalid added a commit that referenced this pull request Oct 24, 2023
...and remove BasicBlock::bbTempJumpDest, per discussion in #93415. We still assert the jump target is set appropriately whenever it is read/written, and in the majority of cases, we still initialize blocks with their jump kind and target set simultaneously. This change improves usability for the few edge cases (like in Compiler::impImportLeave) where a block's jump target isn't known at initialization.
liveans pushed a commit to liveans/dotnet-runtime that referenced this pull request Nov 9, 2023
...and remove BasicBlock::bbTempJumpDest, per discussion in dotnet#93415. We still assert the jump target is set appropriately whenever it is read/written, and in the majority of cases, we still initialize blocks with their jump kind and target set simultaneously. This change improves usability for the few edge cases (like in Compiler::impImportLeave) where a block's jump target isn't known at initialization.
@ghost ghost locked as resolved and limited conversation to collaborators Nov 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants