-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ESIMD] Infer address space of pointer that are passed through invoke_simd to ESIMD API to generate better code on BE #14528
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job optimizing this case! First pass comments below
@@ -75,6 +76,8 @@ ModulePass *llvm::createSYCLLowerInvokeSimdPass() { | |||
namespace { | |||
// TODO support lambda and functor overloads | |||
|
|||
ValueMap<const Value *, SmallDenseMap<uint32_t, uint32_t>> ArgAddrSpaceMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could alias the map value to ArgIdxToAddrSpaceMap
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored it out
if (Callee) { | ||
if (Callee->isDeclaration()) | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could be
if (Callee) { | |
if (Callee->isDeclaration()) | |
continue; | |
} | |
if (Callee && Callee->isDeclaration()) | |
continue; | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
ArgumentMap[i - 2] = AddressSpace; | ||
} | ||
} | ||
if (!ArgumentMap.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We can remove braces for one line ifs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -259,6 +262,49 @@ void markFunctionAsESIMD(Function *F) { | |||
} | |||
} | |||
|
|||
void AdjustAddressSpace(Function *F, uint32_t ArgNo, uint32_t ArgAddrSpace) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
void AdjustAddressSpace(Function *F, uint32_t ArgNo, uint32_t ArgAddrSpace) { | |
void adjustAddressSpace(Function *F, uint32_t ArgNo, uint32_t ArgAddrSpace) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -75,6 +76,8 @@ ModulePass *llvm::createSYCLLowerInvokeSimdPass() { | |||
namespace { | |||
// TODO support lambda and functor overloads | |||
|
|||
ValueMap<const Value *, SmallDenseMap<uint32_t, uint32_t>> ArgAddrSpaceMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything we could do to prevent this from being a global? Could we pass it down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored it out
for (const CallInst *CI : ISCalls) { | ||
SmallDenseMap<uint32_t, uint32_t> ArgumentMap; | ||
for (uint32_t i = 2; i < CI->arg_size(); ++i) { | ||
const Value *Arg = CI->getArgOperand(i); | ||
if (Arg->getType()->isPointerTy()) { | ||
uint32_t AddressSpace = Arg->getType()->getPointerAddressSpace(); | ||
if (AddressSpace == 4) { | ||
const AddrSpaceCastInst *ASC = dyn_cast<AddrSpaceCastInst>(Arg); | ||
if (!ASC) | ||
continue; | ||
|
||
AddressSpace = | ||
ASC->getOperand(0)->getType()->getPointerAddressSpace(); | ||
} | ||
ArgumentMap[i - 2] = AddressSpace; | ||
} | ||
} | ||
if (!ArgumentMap.empty()) { | ||
ArgAddrSpaceMap[CI->getArgOperand(1)] = ArgumentMap; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this to a separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the logic a little bit so this part is no longer exist (its been merged to the other change to eliminate use of map
@@ -436,6 +492,28 @@ PreservedAnalyses SYCLLowerInvokeSimdPass::run(Module &M, | |||
ISCalls.insert(CI); | |||
} | |||
} | |||
for (const CallInst *CI : ISCalls) { | |||
SmallDenseMap<uint32_t, uint32_t> ArgumentMap; | |||
for (uint32_t i = 2; i < CI->arg_size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind adding a comment saying arg idx 2 is the first real arg to invoke_simd
calls? Thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (Arg->getType()->isPointerTy()) { | ||
uint32_t AddressSpace = Arg->getType()->getPointerAddressSpace(); | ||
if (AddressSpace == 4) { | ||
const AddrSpaceCastInst *ASC = dyn_cast<AddrSpaceCastInst>(Arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a common pattern that we lose the true addr space information in an addrspace cast to generic into the invoke_simd
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason we always convert pointers to address space 4 as the first step, but IGC is able to infer the correct address space if there any mention to the original address space. For invoke_simd we lose it between scalar and vector BEs so that is why vector BE has no way to infer the actual address space (all the callees of invoke_simd has parameters in address space 4 and looks like vector BE doesn't see what was used to call invoke_simd) Here I try to infer address space for pointer parameters for invoke_simd and so far I saw to patterns only : address space 3 pointers are passed directly while address space 1 pointers are casted to address space 4 and then passed to invoke_simd. I am not sure if this is the only pattern but it shouldn't be a problem to handle another patter if discovered
} else { | ||
for (unsigned int i = 0; i < ArgUse->getNumOperands(); ++i) { | ||
if (ArgUse->getOperand(i) == Arg) { | ||
const Type *Ty = ArgUse->getOperand(i)->getType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we only use this variable to get the context, probably we could get rid of it and get the context from somewhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it
sycl::free(A, q); | ||
|
||
return 0; | ||
// CHECK: addrspacecast ptr addrspace(4) %A to ptr addrspace(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add an E2E test using Jason's perf test infrastructure to lock down the lower instruction count for a case that generates the runtime checks in IGC today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test although not sure of how reliable it is going to be due to large variation in instruction count between different driver versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah lets try, if it fails all the time we can remove it. thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, only minor comments, nice refactor!
@@ -436,6 +495,7 @@ PreservedAnalyses SYCLLowerInvokeSimdPass::run(Module &M, | |||
ISCalls.insert(CI); | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove whitespace
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
// REQUIRES: gpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably want to restrict to a single GPU like we did for the rest of the perf tests
// REQUIRES: gpu-intel-dg2 && level_zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
invoke_simd works everywhere. Is it related to the fact it can produce different instruction counts for different platforms ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, exactly
sycl::free(A, q); | ||
|
||
return 0; | ||
// CHECK: addrspacecast ptr addrspace(4) %A to ptr addrspace(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah lets try, if it fails all the time we can remove it. thanks
Fail not related |
@fineg74 It looks like the test is failing in poscommit, can you please take a look? https://github.com/intel/llvm/actions/runs/9978520401/job/27584493411 |
@sarnex , Do you know what is [Linux (Self build + shared libraries + no-assertions) / Build + LIT] ? |
@fineg74 We can run postcommit in precommit by making a pull request to a new branch in intel/llvm you create that includes the |
Just replied to the PR there, thanks! |
No description provided.