Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Remove fallthrough restrictions in Compiler::fgOptimizeBranch #105625

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

amanasifkhalid
Copy link
Member

While investigating perf regressions from flowgraph changes, I noticed some instances where Compiler::fgOptimizeBranch was no longer kicking in due to conditional blocks not matching the implicit fallthrough constraint we previously removed (example). Removing this constraint is simple enough, and fixes the subpar block layout (and the edge likelihoods of the cloned condition) in the linked example:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight        IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             0.91       91 [000..019)-> BB03(1)                 (always)                     i IBC hascall
BB03 [0014]  2       BB01,BB10             9.09      909 [018..019)-> BB05(1)                 (always)                     i IBC loophead mdidxlen bwd bwd-target mdarr
BB05 [0015]  2       BB03,BB08            90.91     9091 [018..019)-> BB07(1)                 (always)                     i IBC loophead mdidxlen bwd bwd-target mdarr
BB07 [0016]  2       BB05,BB07            1000.   100000 [018..019)-> BB07(0.909),BB08(0.0909)  ( cond )                     i IBC loophead mdidxlen bwd bwd-target mdarr
BB08 [0063]  1       BB07                100.00    10000 [018..019)-> BB05(0.909),BB10(0.0909)  ( cond )                     i IBC bwd
BB10 [0064]  1       BB08                 10.00     1000 [018..019)-> BB03(0.909),BB12(0.0909)  ( cond )                     i IBC bwd
BB12 [0065]  1       BB10                  1.00      100 [018..06D)-> BB14(0.994),BB28(0.00595)   ( cond )                     i IBC bwd
BB14 [0049]  2       BB12,BB26           153.66    15366 [020..024)-> BB17(1)                 (always)                     i IBC loophead bwd
BB17 [0002]  2       BB14,BB24            1741.   174141 [024..029)-> BB19(1)                 (always)                     i IBC loophead bwd bwd-target
BB19 [0003]  2       BB17,BB22           17492.  1749182 [029..02E)-> BB21(1)                 (always)                     i IBC loophead mdidxlen bwd bwd-target mdarr
BB21 [0004]  2       BB19,BB21             194k 19378019 [02E..050)-> BB21(0.91),BB22(0.0903) ( cond )                     i IBC loophead mdidxlen bwd bwd-target mdarr
BB22 [0059]  1       BB21                19227.  1922741 [050..05C)-> BB19(0.909),BB24(0.0906)  ( cond )                     i IBC bwd
BB24 [0060]  1       BB22                 1915.   191483 [05C..065)-> BB17(0.92),BB26(0.0802) ( cond )                     i IBC bwd
BB26 [0061]  1       BB24                167.07    16707 [065..06A)-> BB14(0.994),BB28(0.00595)   ( cond )                     i IBC bwd
BB28 [0062]  2       BB12,BB26             0.91       91 [06D..06E)-> BB31(1)                 (always)                     i IBC
BB31 [0025]  2       BB28,BB39             9.09      909 [06D..06E)-> BB33(1)                 (always)                     i IBC loophead bwd bwd-target
BB33 [0026]  2       BB31,BB37            90.91     9091 [06D..06E)-> BB35(1)                 (always)                     i IBC loophead mdidxlen bwd bwd-target mdarr
BB35 [0027]  2       BB33,BB36            1000.   100000 [06D..06E)-> BB36(1),BB43(0)         ( cond )                     i IBC loophead mdidxlen bwd bwd-target mdarr
BB36 [0029]  1       BB35                 1000.   100000 [06D..06E)-> BB35(0.909),BB37(0.0909)  ( cond )                     i IBC bwd
BB37 [0056]  1       BB36                100.00    10000 [06D..06E)-> BB33(0.909),BB39(0.0909)  ( cond )                     i IBC bwd
BB39 [0057]  1       BB37                 10.00     1000 [06D..06E)-> BB31(0.909),BB41(0.0909)  ( cond )                     i IBC bwd
BB41 [0058]  1       BB39                  1.00      100 [06D..06E)-> BB44(1)                 (always)                     i IBC
BB44 [0036]  2       BB41,BB43             1.00      100 [06D..077)                           (return)                     i IBC
BB43 [0028]  1       BB35                  0           0 [06D..06E)-> BB44(1)                 (always)                     i IBC rare
BB45 [0066]  0                             0             [???..???)                           (throw )                     i rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

I was hoping this would improve the layout of some of the regressed methods in #103972, though we seem to hit the cost limit for cloning quite frequently. For example, from System.Linq.Tests.Perf_Enumerable.Aggregate_Seed(input: IEnumerable):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1      19852 [000..003)-> BB03(1),BB02(0)         ( cond )                     i IBC
BB02 [0001]  1       BB01                  0          0 [003..00A)                           (throw )                     i IBC rare hascall gcsafe
BB03 [0002]  1       BB01                  1      19852 [00A..00D)-> BB05(1),BB04(0)         ( cond )                     i IBC
BB04 [0003]  1       BB03                  0          0 [00D..013)                           (throw )                     i IBC rare hascall gcsafe
BB05 [0004]  1       BB03                  1      19852 [013..015)-> BB07(0),BB06(1)         ( cond )                     i IBC
BB06 [0016]  1       BB05                  1      19852 [???..???)-> BB08(1)                 (always)                     i IBC internal hascall gcsafe
BB07 [0017]  1       BB05                  0          0 [???..???)-> BB08(1)                 (always)                     i IBC rare internal hascall gcsafe
BB08 [0015]  2       BB06,BB07             1      19852 [015..01C)-> BB09(1)                 (always)                     i IBC internal
BB09 [0005]  2  0    BB08,BB17           100.54 1995916 [01C..02E)-> BB21(0),BB18(1)         ( cond ) T0      try {       i IBC keep bwd
BB10 [0006]  2  0    BB19,BB21            99.54 1976064 [01E..01E)-> BB14(0),BB11(1)         ( cond ) T0                  i IBC bwd bwd-target
BB11 [0019]  1  0    BB10                 99.54 1976064 [01E..01F)-> BB13(1),BB12(0)         ( cond ) T0                  i IBC bwd
BB12 [0032]  1  0    BB11                  0          0 [01E..01F)                           (throw ) T0                  i IBC rare hascall gcsafe bwd
BB13 [0033]  1  0    BB11                 99.54 1976064 [01E..025)-> BB16(0),BB15(1)         ( cond ) T0                  i IBC idxlen bwd
BB14 [0020]  1  0    BB10                  0          0 [???..???)-> BB16(1)                 (always) T0                  i IBC rare internal hascall gcsafe bwd
BB15 [0022]  1  0    BB13                 99.54 1976064 [???..???)-> BB17(1)                 (always) T0                  i IBC internal bwd
BB16 [0023]  2  0    BB13,BB14             0          0 [???..???)-> BB17(1)                 (always) T0                  i IBC rare internal hascall gcsafe bwd
BB17 [0021]  2  0    BB15,BB16            99.54 1976064 [025..02E)-> BB09(1)                 (always) T0                  i IBC internal bwd
BB18 [0025]  1  0    BB09                100.54 1995916 [02E..02F)-> BB20(0.00998),BB19(0.99)  ( cond ) T0                  i IBC bwd
BB19 [0037]  1  0    BB18                 99.54 1975990 [02E..02F)-> BB10(1)                 (always) T0                  i IBC bwd
BB20 [0038]  1  0    BB18                  1.00   19926 [02E..02F)-> BB23(1)                 (always) T0                  i IBC bwd
BB21 [0026]  1  0    BB09                  0          0 [02E..036)-> BB10(0.99),BB23(0.00995)  ( cond ) T0      }           i IBC rare internal hascall bwd
BB23 [0041]  2       BB20,BB21             1.00   19772 [038..03B)-> BB27(1),BB26(0)         ( cond )                     i IBC keep cfb
BB26 [0044]  1       BB23                  0          0 [???..???)-> BB27(1)                 (always)                     i IBC rare internal hascall gcsafe
BB27 [0045]  2       BB23,BB26             1.00   19852 [041..044)                           (return)                     i IBC
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB28 [0009]  1     0                       0.00       0 [038..03B)-> BB32(0.00402),BB29(0.996)   ( cond )    H0 F fault {     i IBC keep flet
BB29 [0010]  1     0 BB28                  0.00       0 [03B..???)-> BB32(1),BB31(0)         ( cond )    H0               i IBC
BB31 [0029]  1     0 BB29                  0          0 [???..???)-> BB32(1)                 (always)    H0               i IBC rare internal hascall gcsafe
BB32 [0027]  3     0 BB28,BB29,BB31        0.00       0 [041..042)                           (falret)    H0   }           i IBC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------


Duplication of the conditional block BB09 (always branch from BB17) not done, because the cost of duplication (17) is greater than 6, validProfileWeights = true

I wonder if we should increase the cost limit of fgOptimizeBranch to match that of loop inversion, since they seem to be performing similar transformations, though we can consider that for a follow-up PR.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 29, 2024
@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Jul 29, 2024

It's worth noting that we only call fgOptimizeBranch from fgReorderBlocks on the old layout path, and that only runs in optOptimizeFlow, where we aren't actually reordering blocks. So if we decide to get rid of premature calls to fgReorderBlocks (like in #104213), we'll want to keep running fgOptimizeBranch -- it seems relevant for enabling fallthrough into/out of loops.

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Jul 29, 2024

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs. The largest size increases are due to more loop cloning (especially in libraries_tests), whereas the smaller sporadic size increases are from duplicating conditions -- this seems worthwhile, since we can reduce loop iteration logic to one backward conditional jump, rather than a backward unconditional jump to another conditional jump.

@amanasifkhalid amanasifkhalid added this to the 10.0.0 milestone Aug 12, 2024
@amanasifkhalid
Copy link
Member Author

@AndyAyersMS updated diffs look roughly the same. Is it alright if we start taking flowgraph changes for .NET 10?

@AndyAyersMS
Copy link
Member

@AndyAyersMS updated diffs look roughly the same. Is it alright if we start taking flowgraph changes for .NET 10?

I think we may want to wait a little while longer? We can discuss this when we meet later today.

@amanasifkhalid
Copy link
Member Author

It seems suboptimal that we run fgOptimizeBranch before layout, and the optimization is dependent on knowing if the BBJ_ALWAYS block falls into one of the BBJ_COND block's successors. It would be convenient if we could run this as an intermediate step of the new block layout, but since we intend to run the latter after lowering/LSRA, and fgOptimizeBranch operates on HIR and clones code, that doesn't seem like a possibility. Instead, I wonder if it would be worthwhile to iterate the blocklist in RPO during a late flow opt pass, and run fgOptimizeBranch for BBJ_ALWAYS candidates that aren't immediately succeeded by their BBJ_COND target, implying that cloning the condition would enable more fallthrough. The RPO traversal probably isn't resistant to changes in flow, but I don't think that's a problem for fgOptimizeBranch since we would only need to visit each candidate BBJ_ALWAYS once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants