
Experiment with a slightly adjusted pipeline #52850

Open: wants to merge 12 commits into master

Conversation

@gbaraldi (Member)

and add GC final lowering verification.

@gbaraldi (Member Author)

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@gbaraldi (Member Author)

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@gbaraldi (Member Author)

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

Comment on lines +405 to +406
FPM.addPass(InstCombinePass());
FPM.addPass(AggressiveInstCombinePass());
Member

does it make sense to do 2 instcombine right next to each other?

@gbaraldi (Member Author) commented Oct 16, 2024

Member

It might be worth customizing the AggressiveInstCombinePass slightly, since the defaults include some folds that are likely not useful for us specifically (from https://llvm.org/doxygen/AggressiveInstCombine_8cpp.html):

  • foldSqrt is probably useless because we generate LLVM sqrt
  • tryToRecognizePopCount probably isn't useful since we have count_ones
  • foldMemChr: I don't think we use memchr (but not sure)

This is unlikely to matter much, but probably could save a bit of compile time here and there.
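
For reference, the back-to-back scheduling under discussion looks roughly like this with the new pass manager (a sketch, not the PR's exact pipeline code). The pass constructor does not appear to take per-fold options, so trimming folds like foldSqrt would likely mean a custom or patched pass:

    #include "llvm/IR/PassManager.h"
    #include "llvm/Transforms/InstCombine/InstCombine.h"
    #include "llvm/Transforms/AggressiveInstCombine/AggressiveInstCombine.h"

    using namespace llvm;

    // Sketch: InstCombine canonicalizes first; AggressiveInstCombine then runs its
    // extra, more expensive pattern folds (sqrt libcall folding, popcount
    // recognition, memchr folding, ...), most of which rarely fire on
    // Julia-generated IR.
    static void addInstCombinePair(FunctionPassManager &FPM) {
        FPM.addPass(InstCombinePass());
        FPM.addPass(AggressiveInstCombinePass());
    }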

@vtjnash (Member) commented Oct 16, 2024

looks like you need to fix a couple tests:

Failed Tests (2):
2024-10-16 14:39:21 EDT	  Julia :: image-codegen.jl
2024-10-16 14:39:21 EDT	  Julia :: pipeline-prints.ll

also rerunning nanosoldier, since a lot of changes have happened since:
@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@oscardssmith (Member)

Looks overall pretty good, but there are a couple of 10x regressions (they look like vectorization failures). Is there an easy way, via nanosoldier, for us to test compile time to make sure it's comparable?

@Zentrik (Member) commented Oct 17, 2024

Isn't that what the inference benchmarks are for? They look like no change to me.

@gbaraldi (Member Author)

I took a closer look at it. There are still a couple of regressions, but it seems to be a pretty clear overall win. If anyone wants to take a further look, the remaining ones are:

  1. ["union", "array", ("perf_countequals", "Int8")]
  2. ["array", "index", ("sumelt_boundscheck", "Base.ReinterpretArray{BaseBenchmarks.ArrayBenchmarks.PairVals{Int32}, 2, Int64, Matrix{Int64}, false}")] We are failing to elide a boundscheck
  3. The simd conditional loop ones (they are very noisy (per run and per machine)

The 16x regression is now gone with my latest commit

@gbaraldi (Member Author)

Do we want to run a pkgeval? I'm slightly worried about the fact that I had to modify passes.

Comment on lines +219 to +246
#ifdef JL_VERIFY_PASSES
    for (auto &BB : F) {
        for (auto &I : make_early_inc_range(BB)) {
            auto *CI = dyn_cast<CallInst>(&I);
            if (!CI)
                continue;

            Value *callee = CI->getCalledOperand();
            assert(callee);
            auto IS_INTRINSIC = [&](auto intrinsic) {
                auto intrinsic2 = getOrNull(intrinsic);
                if (intrinsic2 == callee) {
                    errs() << "Final-GC-lowering didn't eliminate all intrinsics in '" << F.getName() << "', dumping entire module!\n\n";
                    errs() << *F.getParent() << "\n";
                    abort();
                }
            };
            IS_INTRINSIC(jl_intrinsics::newGCFrame);
            IS_INTRINSIC(jl_intrinsics::pushGCFrame);
            IS_INTRINSIC(jl_intrinsics::popGCFrame);
            IS_INTRINSIC(jl_intrinsics::getGCFrameSlot);
            IS_INTRINSIC(jl_intrinsics::GCAllocBytes);
            IS_INTRINSIC(jl_intrinsics::queueGCRoot);
            IS_INTRINSIC(jl_intrinsics::safepoint);
        }
    }
#endif
    return false;
Member

With something like #56188 this may fail if we use llvm.compiler.used.

@gbaraldi (Member Author)

I'm a bit confused? Why would it fail? After this it should be an addrspacecast
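
For context, a minimal sketch (not part of the PR; the helper name is hypothetical) of why a callee wrapped in an addrspacecast constant expression is not caught by the direct pointer comparison in the verifier above, and what looking through such casts would take:

    #include "llvm/IR/Function.h"
    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // A call whose callee operand is an addrspacecast of the intrinsic compares
    // unequal to the Function itself; stripPointerCasts() looks through
    // addrspacecasts and bitcasts, so the second comparison still matches.
    static bool callsIntrinsic(const CallInst *CI, const Function *IntrinsicFn) {
        const Value *Callee = CI->getCalledOperand();
        return Callee == IntrinsicFn ||
               Callee->stripPointerCasts() == IntrinsicFn;
    }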

}
}
assert(allocas.size() > 0);
assert(std::all_of(allocas.begin(), allocas.end(), [&] (AllocaInst* SRetAlloca) {return (SRetAlloca->getArraySize() == allocas[0]->getArraySize() && SRetAlloca->getAllocatedType() == allocas[0]->getAllocatedType());}));
Member

Formatting?
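
Purely as a formatting illustration of the assertion quoted above (same logic, just wrapped; allocas is the vector from the surrounding code):

    assert(!allocas.empty());
    assert(std::all_of(allocas.begin(), allocas.end(), [&](AllocaInst *SRetAlloca) {
        // Every sret alloca must agree with the first one in array size and type.
        return SRetAlloca->getArraySize() == allocas[0]->getArraySize() &&
               SRetAlloca->getAllocatedType() == allocas[0]->getAllocatedType();
    }));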

Comment on lines +1226 to +1228
if (TrueSRet && FalseSRet) {
worklist.push_back(TrueSRet);
worklist.push_back(FalseSRet);
Member

What if TrueSRet == FalseSRet but it hasn't been eliminated yet?

@gbaraldi (Member Author)

I think the solution is to make gc_allocas a set instead of a SmallVector. That way, if we end up pushing the same thing twice, it's still fine.
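
A minimal sketch of that idea using LLVM's SmallSetVector (the function name and surrounding code are hypothetical, not the PR's): insertion is a no-op for an element that is already present, so pushing TrueSRet and FalseSRet is harmless even when they are the same alloca.

    #include "llvm/ADT/SetVector.h"
    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // Order-preserving set: duplicates are silently ignored on insert().
    static void recordSelectSRets(SmallSetVector<AllocaInst *, 8> &gc_allocas,
                                  AllocaInst *TrueSRet, AllocaInst *FalseSRet) {
        gc_allocas.insert(TrueSRet);  // insert() is a no-op if already present,
        gc_allocas.insert(FalseSRet); // so TrueSRet == FalseSRet is harmless
    }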

S.ArrayAllocas[SRet_gc] = tracked.count * cast<ConstantInt>(SRet_gc->getArraySize())->getZExtValue();

assert(gc_allocas.size() > 0);
assert(std::all_of(gc_allocas.begin(), gc_allocas.end(), [&] (AllocaInst* SRetAlloca) {return (SRetAlloca->getArraySize() == gc_allocas[0]->getArraySize() && SRetAlloca->getAllocatedType() == gc_allocas[0]->getAllocatedType());}));
Member

Formatting

if (auto change = dyn_cast<ConstantInt>(CI->getArgOperand(1)))
Depth -= change->getLimitedValue();
else if (auto Phi = dyn_cast<PHINode>(CI->getArgOperand(1))) {
//This should really do a dataflow analysis but assuming worst case means that we will always have enough space
Member

Suggested change:
-//This should really do a dataflow analysis but assuming worst case means that we will always have enough space
+// XXX: This should really do a dataflow analysis but assuming worst case means that we will always have enough space

Member

Could we have an IR test for this?

@gbaraldi (Member Author)

There is one :). I was also discussing with @topolarity and @vtjnash whether we should just do what that pass does at codegen time and remove it.

if (auto change = dyn_cast<ConstantInt>(it.first->getArgOperand(1)))
minPops = change->getLimitedValue();
else if (auto Phi = dyn_cast<PHINode>(it.first->getArgOperand(1))) {
//This should really do a dataflow analysis but assuming worst case means that we will always have enough space
Member

Suggested change:
-//This should really do a dataflow analysis but assuming worst case means that we will always have enough space
+// XXX: This should really do a dataflow analysis but assuming worst case means that we will always have enough space
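
A purely illustrative sketch (not the PR's code; the helper name is hypothetical) of the worst-case treatment both comments refer to: when the pop count comes in through a PHI, take the smallest constant incoming value and treat unknown values as zero, so the estimated remaining depth stays as high as possible and the reserved frame space is always sufficient.

    #include "llvm/IR/Instructions.h"
    #include <algorithm>
    #include <cstdint>

    using namespace llvm;

    // Conservative (worst-case) pop count for a PHI of pop counts: assume as few
    // pops as possible, which over-estimates the remaining depth and therefore
    // over-reserves space, instead of running a full dataflow analysis.
    static uint64_t worstCasePopCount(const PHINode *Phi) {
        uint64_t MinPops = UINT64_MAX;
        for (const Value *Incoming : Phi->incoming_values()) {
            if (const auto *C = dyn_cast<ConstantInt>(Incoming))
                MinPops = std::min(MinPops, C->getLimitedValue());
            else
                return 0; // non-constant incoming value: assume no pops at all
        }
        return MinPops == UINT64_MAX ? 0 : MinPops;
    }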

src/pipeline.cpp Outdated
LPM.addPass(LoopRotatePass());
LPM.addPass(LoopDeletionPass());
FPM.addPass(createFunctionToLoopPassAdaptor(
std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
Member

Suggested change:
-std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));

@vchuravy (Member)

@nanosoldier runtests(ALL, vs = ":master", configuration = (buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"],), vs_configuration = (buildflags = ["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"],))

@nanosoldier (Collaborator)

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@gbaraldi (Member Author)

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@gbaraldi (Member Author)

["union", "array", ("map", "*", "Float64", "(false, true)")] seems to be quite a large regression in my mac 2x
