Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Enable compaction of all BBJ_ALWAYS blocks #103785

Merged
merged 8 commits into from
Jun 21, 2024

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented Jun 20, 2024

Part of #93020. Removes any reference to bbNext in block compaction, and instead considers compacting each block with its jump target regardless of their relative positions in the blocklist. This can churn the flowgraph significantly, but so long as this churn happens before block layout, we can rely on optOptimizeLayout to create a sensible layout regardless of the initial churn. However, since we can run fgUpdateFlowGraph after we've reordered blocks (such as during lowering), I updated fgUpdateFlowGraph's signature to control whether we enable compaction of non-contiguous blocks to avoid changing the block layout.

The logic for updating profile data during compaction looked like it could be simplified, though I'm not sure if my change is overly simplistic; this change seems to have churned fgComputeBlockWeights, hence the large PerfScore diffs. Diffs as a whole are dramatic, though they're inflated largely by libraries_tests.

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Thanks!

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 20, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid amanasifkhalid marked this pull request as ready for review June 21, 2024 14:50
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting (and impressive) that this has so much impact.

Can you see if this further improves the case in #4324? RBO was blocked there because of un-compacted blocks.

noway_assert((block->bbWeight == BB_ZERO_WEIGHT) || (bNext->bbWeight == BB_ZERO_WEIGHT));
block->bbSetRunRarely();
}
if (hasProfileWeight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ineheritWeight already takes care of setting the flag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was running into cases where we'd compact a block with BBF_PROF_WEIGHT set, with a block without the flag, so inheritWeight would unset the flag. I'm guessing we'd want to keep this flag set in such cases, so I added this workaround.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. If you remember the cases it might be worth seeing how we end up with a mixture of profiled and unprofiled blocks. Would be nice to systematically reduce how often this happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take another look, but I first noticed this when we would compact a block with profile data with an internal block.

@amanasifkhalid
Copy link
Member Author

Can you see if this further improves the case in #4324? RBO was blocked there because of un-compacted blocks.

Codegen is identical with and without this change; not sure how much of this is driven by weird block layout decisions (without PGO data, ordering conditional blocks' successors is pretty arbitrary), but IG07 suggests there's room to improve?

G_M31901_IG01:  ;; offset=0x0000
       push     rbx
       sub      rsp, 32
       mov      ebx, edx
                                                ;; size=7 bbWeight=1 PerfScore 1.50
G_M31901_IG02:  ;; offset=0x0007
       mov      rcx, gword ptr [rcx+0x08]
       mov      rcx, gword ptr [rcx+0x08]
       test     rcx, rcx
       je       SHORT G_M31901_IG04
                                                ;; size=13 bbWeight=1 PerfScore 5.25
G_M31901_IG03:  ;; offset=0x0014
       mov      r9d, dword ptr [rcx+0x10]
       test     r9d, r9d
       je       SHORT G_M31901_IG06
       mov      rcx, gword ptr [rcx+0x08]
       mov      edx, ebx
       xor      r8d, r8d
       call     [System.Array:IndexOf[int](int[],int,int,int):int]
       mov      ecx, eax
       not      ecx
       shr      ecx, 31
       jmp      SHORT G_M31901_IG07
                                                ;; size=33 bbWeight=0.50 PerfScore 5.88
G_M31901_IG04:  ;; offset=0x0035
       mov      eax, ebx
                                                ;; size=2 bbWeight=0.50 PerfScore 0.12
G_M31901_IG05:  ;; offset=0x0037
       add      rsp, 32
       pop      rbx
       ret
                                                ;; size=6 bbWeight=0.50 PerfScore 0.88
G_M31901_IG06:  ;; offset=0x003D
       xor      ecx, ecx
                                                ;; size=2 bbWeight=0.50 PerfScore 0.12
G_M31901_IG07:  ;; offset=0x003F
       test     ecx, ecx
       je       SHORT G_M31901_IG04
       xor      eax, eax
                                                ;; size=6 bbWeight=0.50 PerfScore 0.75
G_M31901_IG08:  ;; offset=0x0045
       add      rsp, 32
       pop      rbx
       ret
                                                ;; size=6 bbWeight=0.50 PerfScore 0.88

; Total bytes of code 75, prolog size 5, PerfScore 15.38, instruction count 29, allocated bytes for code 75 (MethodHash=031d8362) for method Visitor:Test(int):int:this (FullOpts)
; ============================================================

@AndyAyersMS
Copy link
Member

IG07 suggests there's room to improve?

Yeah, IG06/IG07 is a missed case for RBO. No need to retest a value we've just set to zero

@amanasifkhalid amanasifkhalid merged commit 3b14b0c into dotnet:main Jun 21, 2024
107 checks passed
@amanasifkhalid amanasifkhalid deleted the compact-blocks branch June 21, 2024 16:15
rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request Jun 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants