JIT: Implement greedy RPO-based block layout #101473
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are big, though. In my previous iterations of this PR, I wanted the block order to mirror the RPO as closely as possible, and repair try regions as necessary afterwards. I got this approach working when using funclets, but Windows x86's EH model complicated this by allowing handlers in the main method body too... I tried iterating on my initial implementation to handle the non-funclet EH case, but I was approaching two separate solutions, which didn't seem ideal -- plus all of these local fixups are something we're trying to get away from with the old layout approach. The diffs don't seem any more or less extreme when restricting reordering to within EH regions, and the code is quite a bit easier to follow.
Seems likely that some significant part of the size improvement is from not moving cold blocks out of line (not splitting, per se, just separating within the method or region). I think we may want this part to be implemented before we start running in the lab. It shouldn't be too hard to implement something like this: starting at the end of each EH region, move sufficiently cold blocks to the end, keeping them in relative RPO order. I'm ok taking that as a follow-up, if you think it makes sense. Though if it is indeed that simple, it would not complicate this PR too much, and this PR is delightfully simple compared to the code it will replace. You may recall in the example code I referred you to as a model, there are some exceptions to this rule, and perhaps some thresholds to consider -- for instance, sometimes we might not want to move very small cold blocks that have hot pred and succ blocks, if we're fairly confident in the data, assuming we'd end up with denser code by keeping the branch short even if it means having a (well-predicted) taken branch on the hot path.
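A minimal sketch of the "move sufficiently cold blocks to the end, keeping them in relative RPO order" idea (the `Block` struct, `sinkColdBlocks`, and the weight threshold here are hypothetical stand-ins, not the JIT's actual types): `std::stable_partition` preserves relative order within each partition, which is exactly the property we want.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: each block has an id and a profile weight.
struct Block
{
    int    id;
    double weight;
};

// Sink sufficiently cold blocks to the end of the region, keeping them in
// their existing (RPO) relative order. std::stable_partition keeps the
// relative order of elements within both the hot and cold partitions.
void sinkColdBlocks(std::vector<Block>& region, double coldThreshold)
{
    std::stable_partition(region.begin(), region.end(),
                          [=](const Block& b) { return b.weight >= coldThreshold; });
}
```

The real pass would also need the exceptions discussed above (e.g. keeping small cold blocks with hot pred/succ in place), which this sketch omits.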
Can you explain a bit further why you want the regular successor DFS? It saddens me a bit to see it return after the work I did last year to move everything to the faithful DFS. When I did that work, I didn't find any justified uses of the old DFS. For this case, why can't we run the faithful DFS (say, with your modifications to change the visit order) and then partition the RPO into multiple ones by EH region? I don't quite understand why we need to give up on reordering of all EH regions (or regions in the main body that are dominated by an EH region).
My initial intent was to do a DFS of all successors without worrying about EH regions at all, and then evaluate how involved the needed EH fixups would be. Allowing the funclets to be broken up by the RPO complicated the post-ordering adjustments, as we'd have to take care not to make handler regions non-contiguous while fixing try regions, and vice versa. It seemed easier and not too consequential in terms of perf to just skip EH successors altogether, though the lack of funclets on Windows x86 made this problem unavoidable -- my attempts to appease win-x86 EH semantics were a distraction from the fact that I could just reorder within EH regions and not break anything up... I've pushed an implementation to try that out using a DFS of all successors.
I'll give that a try in my next push. I'll wait for the latest CI run to finish so we can see the updated diffs.
@AndyAyersMS I tried renumbering the blocks after finalizing the layout locally, and the diffs are nontrivial, though they don't lean overwhelmingly in either direction. Here they are on win-x64: Overall (-123,813 bytes)
FullOpts (-123,813 bytes)
Details: size improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
Also, the latest SPMI run shows the ratio of size improvements to regressions decreasing, though we're missing some collections.
Overall I still really like the shape of this -- a bit concerned with the complexity of maintaining the EH invariants, but we can live with it.
Left you a few comments to consider.
{
    RETURN_ON_ABORT(func(GetFalseTarget()));
}
else if (useProfile && (GetTrueEdge()->getLikelihood() < GetFalseEdge()->getLikelihood()))
Am I confused, or is this visiting the less likely successor first?
- else if (useProfile && (GetTrueEdge()->getLikelihood() < GetFalseEdge()->getLikelihood()))
+ else if (useProfile && (GetTrueEdge()->getLikelihood() > GetFalseEdge()->getLikelihood()))
It looks unintuitive, but I think we need to flip the comparison so the DFS is in the order we want. Consider the following block list, pre-ordering:
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0014] 1 1 [???..???)-> BB02(1) (always) i keep internal
BB02 [0000] 1 BB01 1 100 [000..00E)-> BB04(0),BB03(1) ( cond ) i IBC
BB03 [0010] 1 BB02 0.50 50 [00D..00E)-> BB05(1) (always) i IBC nullcheck
BB04 [0011] 1 BB02 0 0 [00D..00E)-> BB05(1) (always) i IBC rare
BB05 [0012] 2 BB03,BB04 1 [00D..017) (return) i hascall gcsafe
When processing BB02, if we visit BB03 before BB04, then we end up with a postorder that looks something like [<BB03's successors>, BB03, BB04, BB02, ...] (i.e. BB04 precedes BB03 in the reverse postorder), so after layout, we get this:
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0014] 1 1 [???..???)-> BB02(1) (always) i keep internal
BB02 [0000] 1 BB01 1 100 [000..00E)-> BB04(0),BB03(1) ( cond ) i IBC
BB04 [0011] 1 BB02 0 0 [00D..00E)-> BB05(1) (always) i IBC rare
BB03 [0010] 1 BB02 0.50 50 [00D..00E)-> BB05(1) (always) i IBC nullcheck
BB05 [0012] 2 BB03,BB04 1 [00D..017) (return) i hascall gcsafe
If we instead visit the less likely successor (BB04) first, we push the more likely successor BB03 up to BB02 in the RPO, and get this layout:
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0014] 1 1 [???..???)-> BB02(1) (always) i keep internal
BB02 [0000] 1 BB01 1 100 [000..00E)-> BB04(0),BB03(1) ( cond ) i IBC
BB03 [0010] 1 BB02 0.50 50 [00D..00E)-> BB05(1) (always) i IBC nullcheck
BB04 [0011] 1 BB02 0 0 [00D..00E)-> BB05(1) (always) i IBC rare
BB05 [0012] 2 BB03,BB04 1 [00D..017) (return) i hascall gcsafe
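The effect of the flipped comparison can be reproduced with a toy DFS/RPO over the graph above (the `dfs`/`rpoLayout` helpers and likelihood-sorted visit order below are a hypothetical sketch, not the JIT's implementation): visiting the less likely successor first makes the more likely successor land immediately after its predecessor in the reverse postorder.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Toy flowgraph: succs[b] lists (successor, edge likelihood) pairs for
// block b; block 0 is the entry. Hypothetical stand-in for the JIT's edges.
using Edge = std::pair<int, double>;

void dfs(int block, const std::vector<std::vector<Edge>>& succs,
         std::vector<bool>& visited, std::vector<int>& postOrder)
{
    visited[block] = true;
    // Visit successors in ascending likelihood, i.e. the LESS likely
    // successor first: the successor visited last gets the postorder number
    // just below ours, so it lands right after us in the reverse postorder.
    std::vector<Edge> ordered = succs[block];
    std::sort(ordered.begin(), ordered.end(),
              [](const Edge& a, const Edge& b) { return a.second < b.second; });
    for (const Edge& e : ordered)
    {
        if (!visited[e.first])
        {
            dfs(e.first, succs, visited, postOrder);
        }
    }
    postOrder.push_back(block);
}

std::vector<int> rpoLayout(const std::vector<std::vector<Edge>>& succs)
{
    std::vector<bool> visited(succs.size(), false);
    std::vector<int>  postOrder;
    dfs(0, succs, visited, postOrder);
    std::reverse(postOrder.begin(), postOrder.end());
    return postOrder;
}
```

With blocks 0..4 mirroring BB01..BB05 (block 1 branches to 3 with likelihood 0 and to 2 with likelihood 1), the RPO comes out as 0, 1, 2, 3, 4: the hot successor 2 directly follows its predecessor 1, and the rare block 3 is pushed after it.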
Yeah, I think because we're going to form an RPO then visiting the less-likely successor first is correct. If we wanted to view the depth-first spanning tree as a pseudo maximum weight tree then we'd do it the other way around.
Can you add a comment here?
Sure thing.
This logic seems very specialized to block layout and tied into how the DFS traversal works to end up with the result it wants. It makes it seem a bit odd for it to live in this very general utility function.
I'm ok with this for now, but if we end up with even more logic to handle other cases (like BBJ_SWITCH), then I'd suggest we introduce a separate version of the visitor that lives next to the block layout code. It would save a bit on throughput as well, since now everyone is paying for this useProfile check.
Agreed, I'll fix this in a follow-up PR. As for what the new abstraction should look like, would you prefer we move the useProfile check into AllSuccessorEnumerator, or even introduce a new enumerator like ProfileGuidedSuccessorEnumerator?
Perhaps introduce two instance initializer methods on AllSuccessorEnumerator and then pass some factory method to fgRunDfs? E.g. the normal use would be:
fgRunDfs([](SuccessorEnumerator* enumerator, BasicBlock* block) { enumerator->InitializeAllSuccs(block); }, ...);
and the block layout use could be
fgRunDfs([](SuccessorEnumerator* enumerator, BasicBlock* block) { enumerator->InitializeAllSuccsForBlockLayout(block); }, ...);
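The factory-callback shape suggested above can be sketched outside the JIT like this (the `SuccessorEnumerator` methods, graph representation, and `runDfs` driver here are hypothetical simplifications, not the actual JIT API): one DFS driver, with the per-block successor ordering injected by the caller.

```cpp
#include <cassert>
#include <vector>

// Hypothetical enumerator with two initializers: a plain one, and a
// layout-flavored one that changes the successor visit order (here we just
// reverse, as a stand-in for the likelihood-based ordering).
struct SuccessorEnumerator
{
    std::vector<int> succs;

    void InitializeAllSuccs(const std::vector<std::vector<int>>& graph, int block)
    {
        succs = graph[block];
    }

    void InitializeAllSuccsForBlockLayout(const std::vector<std::vector<int>>& graph, int block)
    {
        succs.assign(graph[block].rbegin(), graph[block].rend());
    }
};

template <typename TInit>
void dfsFrom(int block, const std::vector<std::vector<int>>& graph, TInit init,
             std::vector<bool>& visited, std::vector<int>& preorder)
{
    visited[block] = true;
    preorder.push_back(block);
    SuccessorEnumerator e;
    init(&e, block); // the factory callback decides the visit order
    for (int succ : e.succs)
    {
        if (!visited[succ])
        {
            dfsFrom(succ, graph, init, visited, preorder);
        }
    }
}

// One shared DFS driver; callers pass the initializer they want.
template <typename TInit>
std::vector<int> runDfs(const std::vector<std::vector<int>>& graph, TInit init)
{
    std::vector<bool> visited(graph.size(), false);
    std::vector<int>  preorder;
    dfsFrom(0, graph, init, visited, preorder);
    return preorder;
}
```

The appeal of this shape is that only the layout caller pays for the profile-aware ordering; the normal DFS path stays untouched.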
src/coreclr/jit/fgopt.cpp (Outdated)
fgDoReversePostOrderLayout();
fgMoveColdBlocks();

#ifdef DEBUG
You can just remove this whole #ifdef DEBUG bit; we will check the BB list at end of phase, which is imminent.
}
#endif // DEBUG

return true;
I assume it's unlikely that we won't move blocks here? The other variant tries to keep track, but I'm not sure it is worth the extra bookkeeping. All we are doing by sometimes returning false is speeding up the checked jit a tiny bit.
That was my reasoning for hard-coding the return value.
src/coreclr/jit/fgopt.cpp (Outdated)
}
#endif // DEBUG

// Compute DFS of main function body's blocks, using profile data to determine the order successors are visited in
This is now a DFS of everything, right? Not just the main function?
Yes, updated.
return;
}

// The RPO will break up call-finally pairs, so save them before re-ordering
Seems plausible that this map could be built up during the pass below, since the pair tail should follow the pair head in the RPO, so when we encounter the pair head neither block should have moved yet.
Thanks for pointing this out; this seems to work locally. Just to make sure I don't have to add an edge case for this, we would never expect the RPO traversal to begin with a call-finally pair, right?
This approach only hit one failure in an SPMI replay. I'm digging into this locally, but I adjusted my approach for now to (hopefully) be less expensive. Though with the current approach, we need to default-initialize entries in the regions ArrayStack for each EH region, so if we're going to loop over every EH clause anyway, then maybe this approach is cheaper in most cases anyway?
Yeah I like this new approach of just walking the EH table.
In principle the call finally pair tail is dominated by the call finally pair head, however depending on how we do the DFS we may see "infeasible cross flow" .. say we have a finally with two callfinallies, it should not be possible to call via one callfinally and return to the other's tail, but may look feasible in the flow graph.
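The "infeasible cross flow" point can be illustrated with a toy reachability check (hypothetical graph and helper, not JIT code): with one finally F called from two callfinally heads H1 and H2, the flow graph gives F edges to both pair tails, so H1 appears to reach H2's tail even though the runtime pairing makes that path impossible.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Plain BFS reachability over a successor-list flow graph.
bool reachable(const std::vector<std::vector<int>>& succs, int from, int to)
{
    std::vector<bool> seen(succs.size(), false);
    std::queue<int>   work;
    work.push(from);
    seen[from] = true;
    while (!work.empty())
    {
        int b = work.front();
        work.pop();
        if (b == to)
        {
            return true;
        }
        for (int s : succs[b])
        {
            if (!seen[s])
            {
                seen[s] = true;
                work.push(s);
            }
        }
    }
    return false;
}
```

With blocks 0 = H1, 1 = T1 (H1's tail), 2 = H2, 3 = T2, 4 = F, and edges H1 -> F, H2 -> F, F -> T1, F -> T2, the graph says H1 reaches T2, even though at runtime the finally called via H1 can only return to T1.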
}

//-----------------------------------------------------------------------------
// fgMoveColdBlocks: Move rarely-run blocks to the end of their respective regions.
Am curious why this doesn't need to update the try region ends?
Since this is finding the last block in region, and then doing insert before, it seems like it might leave one hot block at the end?
Yeah, I have the same reservations about this approach; Phoenix does something similar, though the last block in a region seems to have different semantic meaning there. I could just insert the cold blocks after the last block in the region to avoid potentially pushing a hot exit block to the end, though that might introduce back edges on the cold path -- is that ok?
Phoenix had various "end region" constructs so yes it was different.
Got it, I've adjusted my approach to insert cold blocks after the end of the region, and then move the insertion point to the end as well, but only if it is cold.
src/coreclr/jit/fgopt.cpp (Outdated)
return false;
}

// Don't move any part of a call-finally pair -- these need to stay together
Seems like we could handle these, by skipping the pair tail (like you do now) and then moving both if we want to move the pair head?
src/coreclr/jit/fgopt.cpp (Outdated)
// Instead, set the end of the region to the BBJ_CALLFINALLY block in the pair.
// Also, don't consider blocks in handler regions.
// (If all of some try region X is enclosed in an exception handler,
// lastColdTryBlocks[X] will be null. We will handle this case later.)
Maybe be a bit more specific?
- // lastColdTryBlocks[X] will be null. We will handle this case later.)
+ // lastColdTryBlocks[X] will be null. We will check for this case in tryMovingBlocks below.)
src/coreclr/jit/fgopt.cpp (Outdated)
// (for example, by pushing throw blocks unreachable via normal flow to the end of the region).
// First, determine the new EH region ends.
//
BasicBlock** const tryRegionEnds = new (this, CMK_Generic) BasicBlock* [compHndBBtabCount] {};
Consider using ArrayStack<struct> for this data (where struct contains your 3 data items). It will usually be able to avoid heap allocation.
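A simplified sketch of why that helps (this SmallStack is a hypothetical stand-in for the JIT's ArrayStack<T>, not its real implementation): the container keeps a small inline buffer and only heap-allocates once the element count outgrows it, so typical methods with few EH regions never touch the heap.

```cpp
#include <cassert>
#include <cstddef>

// Stack with a small inline buffer; spills to the heap only when the
// element count exceeds InlineCapacity. Copying is not supported.
template <typename T, size_t InlineCapacity = 8>
class SmallStack
{
    T      m_inline[InlineCapacity];
    T*     m_data     = m_inline;
    size_t m_count    = 0;
    size_t m_capacity = InlineCapacity;

public:
    ~SmallStack()
    {
        if (m_data != m_inline)
        {
            delete[] m_data;
        }
    }

    void Push(const T& value)
    {
        if (m_count == m_capacity)
        {
            // Inline buffer (or previous heap buffer) is full: grow on the heap.
            T* bigger = new T[m_capacity * 2];
            for (size_t i = 0; i < m_count; i++)
            {
                bigger[i] = m_data[i];
            }
            if (m_data != m_inline)
            {
                delete[] m_data;
            }
            m_data = bigger;
            m_capacity *= 2;
        }
        m_data[m_count++] = value;
    }

    size_t Height() const { return m_count; }
    T&     BottomRef(size_t i) { return m_data[i]; }              // index from the bottom
    T&     TopRef(size_t i) { return m_data[m_count - 1 - i]; }   // index from the top
    bool   UsesHeap() const { return m_data != m_inline; }
};
```

Grouping the three per-region items into one struct per entry also means one container instead of three parallel arrays, and the BottomRef/TopRef pair shows the two indexing directions discussed later in this review.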
@AndyAyersMS thank you for the review! Diffs are more conservative now that we're moving cold blocks, though still pretty big. Since we have TP to spare, do you think we should renumber blocks post-layout in this PR? Or maybe in a follow-up?
If we're going to do it eventually, we might as well do it now; one less churn event to sort through.
Overall this looks good to me.
It's a bit sad that we can't treat the "not in try" region symmetrically with the try regions, but I think having a bit of duplicated logic is ok and we can sort this out later if we think it matters.
I would give @jakobbotsch a chance to sign off too.
src/coreclr/jit/jitconfigvalues.h (Outdated)
@@ -754,6 +754,9 @@ RELEASE_CONFIG_INTEGER(JitEnablePhysicalPromotion, W("JitEnablePhysicalPromotion
// Enable cross-block local assertion prop
RELEASE_CONFIG_INTEGER(JitEnableCrossBlockLocalAssertionProp, W("JitEnableCrossBlockLocalAssertionProp"), 1)

// Do greedy RPO-based layout in Compiler::fgReorderBlocks.
RELEASE_CONFIG_INTEGER(JitDoReversePostOrderLayout, W("JitDoReversePostOrderLayout"), 1);
Make sure to remember to undo this before merging.
Got it. I'm gonna do one more CI run with block renumbering to compare the diffs. I'll mark as no-merge for now to remind myself.
Here are the diffs with block renumbering. Size improvements overall are slightly bigger, and TP is slightly improved, too.
src/coreclr/jit/fgopt.cpp (Outdated)
// Fix up call-finally pairs
//
for (BlockToBlockMap::Node* const iter : BlockToBlockMap::KeyValueIteration(&callFinallyPairs))
Doesn't seem like we actually use callFinallyPairs as a map, so it could presumably just be an ArrayStack of pairs.
src/coreclr/jit/fgopt.cpp (Outdated)
{
    if (block->hasTryIndex())
    {
        EHLayoutInfo& layoutInfo = regions.TopRef(block->getTryIndex());
Nit: I'd suggest indexing from the bottom instead, so that things end up stored in order instead of in reverse in this array.
LGTM. Some minor feedback that you can feel free to address as you see fit.
@jakobbotsch @AndyAyersMS Thank you both for the reviews! No diffs now that the new layout is disabled by default.
Failures are #101721.
Part of #93020.

Compiler::fgDoReversePostOrderLayout reorders blocks based on an RPO of the flowgraph's successor edges. When reordering based on the RPO, we only reorder blocks within the same EH region to avoid breaking up their contiguousness. After establishing an RPO-based layout, we do another pass to move cold blocks to the ends of their regions in fgMoveColdBlocks.

The "greedy" part of this layout isn't all that greedy just yet. For now, we use edge likelihoods to make placement decisions only for BBJ_COND blocks' successors. I plan to extend this greediness to other multi-successor block kinds (BBJ_SWITCH, etc.) in a follow-up so we can independently evaluate the value in doing so.

This new layout is disabled by default for now.