Experiment with a slightly adjusted pipeline #52850
base: master
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
ad90755 to 30ed1f0
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
6a92ba1 to e4a27bd
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
FPM.addPass(InstCombinePass());
FPM.addPass(AggressiveInstCombinePass());
does it make sense to do 2 instcombine right next to each other?
It might be worth customizing the AggressiveInstCombinePass slightly, since the defaults include some options that are likely not useful for us specifically. From https://llvm.org/doxygen/AggressiveInstCombine_8cpp.html:
- foldSqrt is probably useless because we generate LLVM sqrt
- tryToRecognizePopCount probably isn't useful since we have count_ones
- foldMemChr: I don't think we use memchr (but not sure).
This is unlikely to matter much, but probably could save a bit of compile time here and there.
looks like you need to fix a couple tests:
also rerunning nanosoldier, since a lot of changes have happened since:
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
Looks pretty good overall, but there are a couple of 10x regressions (they look like vectorization failures). Is there an easy way from nanosoldier for us to test compile time to make sure it's comparable?
Isn't that what the inference benchmarks are for? Those look like no change to me.
I took a big look at it. There's still a couple regressions, but it seems to be a pretty clear overall win. If anyone wants to take a further look
The 16x regression is now gone with my latest commit
39746e3 to d6a2afa
…Also add tests and fix llvmpasses
d6a2afa to b447d87
Do we want to run a pkgeval? I'm slightly worried about the fact that I had to modify passes.
#ifdef JL_VERIFY_PASSES
    for (auto &BB : F) {
        for (auto &I : make_early_inc_range(BB)) {
            auto *CI = dyn_cast<CallInst>(&I);
            if (!CI)
                continue;

            Value *callee = CI->getCalledOperand();
            assert(callee);
            auto IS_INTRINSIC = [&](auto intrinsic) {
                auto intrinsic2 = getOrNull(intrinsic);
                if (intrinsic2 == callee) {
                    errs() << "Final-GC-lowering didn't eliminate all intrinsics'" << F.getName() << "', dumping entire module!\n\n";
                    errs() << *F.getParent() << "\n";
                    abort();
                }
            };
            IS_INTRINSIC(jl_intrinsics::newGCFrame);
            IS_INTRINSIC(jl_intrinsics::pushGCFrame);
            IS_INTRINSIC(jl_intrinsics::popGCFrame);
            IS_INTRINSIC(jl_intrinsics::getGCFrameSlot);
            IS_INTRINSIC(jl_intrinsics::GCAllocBytes);
            IS_INTRINSIC(jl_intrinsics::queueGCRoot);
            IS_INTRINSIC(jl_intrinsics::safepoint);
        }
    }
#endif
    return false;
With something like #56188 this may fail if we use llvm.compiler.used.
I'm a bit confused? Why would it fail? After this it should be an addrspacecast
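For readers who want to reason about the verification step outside of LLVM, the check quoted above can be modeled as a scan that reports the first call whose callee should already have been lowered. This is only a minimal sketch: `Call`, `Function`, and `firstLeftoverIntrinsic` are hypothetical stand-ins, and the intrinsic names are plain strings rather than the real `jl_intrinsics` descriptors.

```cpp
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for LLVM's CallInst and Function; not the real API.
struct Call {
    std::string callee;
};
struct Function {
    std::string name;
    std::vector<Call> calls;
};

// Returns the first callee that is still in `forbidden`, or nullopt when the
// function is clean. The code under review aborts and dumps the module
// instead of returning a value.
std::optional<std::string> firstLeftoverIntrinsic(
    const Function &F, const std::unordered_set<std::string> &forbidden) {
    for (const Call &C : F.calls) {
        if (forbidden.count(C.callee))
            return C.callee;
    }
    return std::nullopt;
}
```

Returning the offending name rather than aborting keeps the sketch testable; it says nothing about how `llvm.compiler.used` references would be handled, which is the open question in this thread.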
src/llvm-late-gc-lowering.cpp
Outdated
        }
    }
    assert(allocas.size() > 0);
    assert(std::all_of(allocas.begin(), allocas.end(), [&] (AllocaInst* SRetAlloca) {return (SRetAlloca->getArraySize() == allocas[0]->getArraySize() && SRetAlloca->getAllocatedType() == allocas[0]->getAllocatedType());}));
Formatting?
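One possible way to address the formatting remark is to hoist the predicate into a named lambda so the `std::all_of` call fits on one line. The sketch below is an illustration only: `FakeAlloca` and `allSameShape` are hypothetical names, with `FakeAlloca` standing in for `AllocaInst` and reduced to the two properties the assertion compares.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::AllocaInst, reduced to the two properties
// the assertion compares.
struct FakeAlloca {
    unsigned arraySize;
    int allocatedTypeId;
};

// True when every alloca has the same array size and allocated type as the
// first one. The input must be non-empty, mirroring the preceding assert.
bool allSameShape(const std::vector<FakeAlloca> &allocas) {
    assert(!allocas.empty());
    const FakeAlloca &first = allocas.front();
    auto sameShape = [&](const FakeAlloca &A) {
        return A.arraySize == first.arraySize &&
               A.allocatedTypeId == first.allocatedTypeId;
    };
    return std::all_of(allocas.begin(), allocas.end(), sameShape);
}
```

In the pass itself the same shape would read as `assert(std::all_of(allocas.begin(), allocas.end(), sameShape));` with the lambda taking `AllocaInst *`.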
if (TrueSRet && FalseSRet) {
    worklist.push_back(TrueSRet);
    worklist.push_back(FalseSRet);
What if TrueSRet == FalseSRet but it hasn't been eliminated yet?
I think the solution is to make gc_allocas a set instead of a SmallVector, so if we end up pushing the same thing twice that's still fine.
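To illustrate the idea of tolerating duplicate pushes, here is a minimal sketch of an insertion-ordered container that ignores repeated pointers. `DedupList` is a hypothetical name built on the standard library; in LLVM code the closest existing analogue is probably `llvm::SetVector` (or `SmallPtrSet` when order does not matter), and which one the PR ends up using is not stated in this thread.

```cpp
#include <unordered_set>
#include <vector>

// Collects pointers in first-seen order while ignoring repeats.
// `T` stands in for AllocaInst here.
template <typename T>
class DedupList {
public:
    // Returns true if the pointer was newly inserted, false if it was
    // already present.
    bool insert(T *p) {
        if (!seen_.insert(p).second)
            return false;
        order_.push_back(p);
        return true;
    }

    // Items in the order they were first inserted.
    const std::vector<T *> &items() const { return order_; }

private:
    std::unordered_set<T *> seen_;
    std::vector<T *> order_;
};
```

Keeping insertion order matters for code that indexes the first element, as the `gc_allocas[0]` assertion in this file does; a plain unordered set would not support that.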
src/llvm-late-gc-lowering.cpp
Outdated
S.ArrayAllocas[SRet_gc] = tracked.count * cast<ConstantInt>(SRet_gc->getArraySize())->getZExtValue();

assert(gc_allocas.size() > 0);
assert(std::all_of(gc_allocas.begin(), gc_allocas.end(), [&] (AllocaInst* SRetAlloca) {return (SRetAlloca->getArraySize() == gc_allocas[0]->getArraySize() && SRetAlloca->getAllocatedType() == gc_allocas[0]->getAllocatedType());}));
Formatting
if (auto change = dyn_cast<ConstantInt>(CI->getArgOperand(1)))
    Depth -= change->getLimitedValue();
else if (auto Phi = dyn_cast<PHINode>(CI->getArgOperand(1))) {
    //This should really do a dataflow analysis but assuming worst case means that we will always have enough space
Suggested change:
- //This should really do a dataflow analysis but assuming worst case means that we will always have enough space
+ // XXX: This should really do a dataflow analysis but assuming worst case means that we will always have enough space
Could we have an IR test for this?
There is one :). I was also talking with @topolarity and @vtjnash that we should just do what that pass does at codegen time and remove it.
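The "assume the worst case" comment in the code under discussion can be sketched as follows: when the pop count comes from a phi rather than a single constant, pick the incoming value that leaves the most handlers on the stack. This is a simplified model, not the pass's implementation. `conservativePops` is a hypothetical helper, the phi is represented as a plain list of constant incoming values, and treating the smallest pop count as the conservative choice is an assumption based on the `minPops` name in the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Conservative number of pops for an operand whose possible values are
// `incoming` (one entry for a constant, several for a phi of constants).
// Picks the smallest value, so the tracked depth is never underestimated.
// Returns nullopt when no incoming value is known.
std::optional<uint64_t> conservativePops(const std::vector<uint64_t> &incoming) {
    if (incoming.empty())
        return std::nullopt;
    return *std::min_element(incoming.begin(), incoming.end());
}
```

A real dataflow analysis would track the depth per predecessor block instead of collapsing every path to one bound, which is the trade-off the code comment acknowledges.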
src/llvm-lower-handlers.cpp
Outdated
if (auto change = dyn_cast<ConstantInt>(it.first->getArgOperand(1)))
    minPops = change->getLimitedValue();
else if (auto Phi = dyn_cast<PHINode>(it.first->getArgOperand(1))) {
    //This should really do a dataflow analysis but assuming worst case means that we will always have enough space
Suggested change:
- //This should really do a dataflow analysis but assuming worst case means that we will always have enough space
+ // XXX: This should really do a dataflow analysis but assuming worst case means that we will always have enough space
src/pipeline.cpp
Outdated
LPM.addPass(LoopRotatePass());
LPM.addPass(LoopDeletionPass());
FPM.addPass(createFunctionToLoopPassAdaptor(
    std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
Suggested change:
- std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
+ std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));
@nanosoldier
The package evaluation job you requested has completed - possible new issues were detected.
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
and add GC final lowering verification.