Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESIMD] Infer address space of pointer that are passed through invoke_simd to ESIMD API to generate better code on BE #14528

Merged
merged 13 commits into from
Jul 17, 2024

Conversation

fineg74
Copy link
Contributor

@fineg74 fineg74 commented Jul 11, 2024

No description provided.

@fineg74 fineg74 marked this pull request as ready for review July 12, 2024 21:48
@fineg74 fineg74 requested a review from a team as a code owner July 12, 2024 21:48
Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job optimizing this case! First pass comments below

@@ -75,6 +76,8 @@ ModulePass *llvm::createSYCLLowerInvokeSimdPass() {
namespace {
// TODO support lambda and functor overloads

ValueMap<const Value *, SmallDenseMap<uint32_t, uint32_t>> ArgAddrSpaceMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could alias the map value to ArgIdxToAddrSpaceMap or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored it out

Comment on lines 280 to 283
if (Callee) {
if (Callee->isDeclaration())
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be

Suggested change
if (Callee) {
if (Callee->isDeclaration())
continue;
}
if (Callee && Callee->isDeclaration())
continue;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

ArgumentMap[i - 2] = AddressSpace;
}
}
if (!ArgumentMap.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We can remove braces for one line ifs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -259,6 +262,49 @@ void markFunctionAsESIMD(Function *F) {
}
}

void AdjustAddressSpace(Function *F, uint32_t ArgNo, uint32_t ArgAddrSpace) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
void AdjustAddressSpace(Function *F, uint32_t ArgNo, uint32_t ArgAddrSpace) {
void adjustAddressSpace(Function *F, uint32_t ArgNo, uint32_t ArgAddrSpace) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -75,6 +76,8 @@ ModulePass *llvm::createSYCLLowerInvokeSimdPass() {
namespace {
// TODO support lambda and functor overloads

ValueMap<const Value *, SmallDenseMap<uint32_t, uint32_t>> ArgAddrSpaceMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything we could do to prevent this from being a global? Could we pass it down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored it out

Comment on lines 495 to 515
for (const CallInst *CI : ISCalls) {
SmallDenseMap<uint32_t, uint32_t> ArgumentMap;
for (uint32_t i = 2; i < CI->arg_size(); ++i) {
const Value *Arg = CI->getArgOperand(i);
if (Arg->getType()->isPointerTy()) {
uint32_t AddressSpace = Arg->getType()->getPointerAddressSpace();
if (AddressSpace == 4) {
const AddrSpaceCastInst *ASC = dyn_cast<AddrSpaceCastInst>(Arg);
if (!ASC)
continue;

AddressSpace =
ASC->getOperand(0)->getType()->getPointerAddressSpace();
}
ArgumentMap[i - 2] = AddressSpace;
}
}
if (!ArgumentMap.empty()) {
ArgAddrSpaceMap[CI->getArgOperand(1)] = ArgumentMap;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to a separate function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the logic a little bit so this part is no longer exist (its been merged to the other change to eliminate use of map

@@ -436,6 +492,28 @@ PreservedAnalyses SYCLLowerInvokeSimdPass::run(Module &M,
ISCalls.insert(CI);
}
}
for (const CallInst *CI : ISCalls) {
SmallDenseMap<uint32_t, uint32_t> ArgumentMap;
for (uint32_t i = 2; i < CI->arg_size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adding a comment saying arg idx 2 is the first real arg to invoke_simd calls? Thx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (Arg->getType()->isPointerTy()) {
uint32_t AddressSpace = Arg->getType()->getPointerAddressSpace();
if (AddressSpace == 4) {
const AddrSpaceCastInst *ASC = dyn_cast<AddrSpaceCastInst>(Arg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a common pattern that we lose the true addr space information in an addrspace cast to generic into the invoke_simd call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason we always convert pointers to address space 4 as the first step, but IGC is able to infer the correct address space if there any mention to the original address space. For invoke_simd we lose it between scalar and vector BEs so that is why vector BE has no way to infer the actual address space (all the callees of invoke_simd has parameters in address space 4 and looks like vector BE doesn't see what was used to call invoke_simd) Here I try to infer address space for pointer parameters for invoke_simd and so far I saw to patterns only : address space 3 pointers are passed directly while address space 1 pointers are casted to address space 4 and then passed to invoke_simd. I am not sure if this is the only pattern but it shouldn't be a problem to handle another patter if discovered

} else {
for (unsigned int i = 0; i < ArgUse->getNumOperands(); ++i) {
if (ArgUse->getOperand(i) == Arg) {
const Type *Ty = ArgUse->getOperand(i)->getType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we only use this variable to get the context, probably we could get rid of it and get the context from somewhere else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it

sycl::free(A, q);

return 0;
// CHECK: addrspacecast ptr addrspace(4) %A to ptr addrspace(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add an E2E test using Jason's perf test infrastructure to lock down the lower instruction count for a case that generates the runtime checks in IGC today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test although not sure of how reliable it is going to be due to large variation in instruction count between different driver versions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lets try, if it fails all the time we can remove it. thanks

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, only minor comments, nice refactor!

@@ -436,6 +495,7 @@ PreservedAnalyses SYCLLowerInvokeSimdPass::run(Module &M,
ISCalls.insert(CI);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove whitespace

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// REQUIRES: gpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably want to restrict to a single GPU like we did for the rest of the perf tests

// REQUIRES: gpu-intel-dg2 && level_zero

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invoke_simd works everywhere. Is it related to the fact it can produce different instruction counts for different platforms ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, exactly

sycl::free(A, q);

return 0;
// CHECK: addrspacecast ptr addrspace(4) %A to ptr addrspace(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lets try, if it fails all the time we can remove it. thanks

@sarnex
Copy link
Contributor

sarnex commented Jul 17, 2024

Fail not related

@sarnex sarnex merged commit 16e39df into intel:sycl Jul 17, 2024
13 of 14 checks passed
@fineg74 fineg74 deleted the InferAddrSpace branch July 17, 2024 17:13
@sarnex
Copy link
Contributor

sarnex commented Jul 17, 2024

@fineg74 It looks like the test is failing in poscommit, can you please take a look?

https://github.com/intel/llvm/actions/runs/9978520401/job/27584493411

@fineg74
Copy link
Contributor Author

fineg74 commented Jul 17, 2024

@fineg74 It looks like the test is failing in poscommit, can you please take a look?

https://github.com/intel/llvm/actions/runs/9978520401/job/27584493411

@sarnex , Do you know what is [Linux (Self build + shared libraries + no-assertions) / Build + LIT] ?
I am not able to reproduce it locally and it seem to pass on regular CI when I merge the latest sycl into a draft pr branch. It looks like it somehow use an old sycl-post-link or an old SYCLLowerIR library

@sarnex
Copy link
Contributor

sarnex commented Jul 17, 2024

@fineg74 It builds the compiler using the same compiler version. Maybe build the compiler and then clone the same revision again in a different dir and use the first compiler to build it?

Also there is another postcommit fail here so I am going to revert this PR for now.

sarnex added a commit that referenced this pull request Jul 17, 2024
…h invoke_simd to ESIMD API to generate better code on BE" (#14607)

Reverts #14528

Postcommit failures
@sarnex
Copy link
Contributor

sarnex commented Jul 17, 2024

@fineg74 We can run postcommit in precommit by making a pull request to a new branch in intel/llvm you create that includes the sycl-devops-pr/ string so like sycl-devops-pr/testpostcommit or something, let's try that and see if we can find the issue

@fineg74
Copy link
Contributor Author

fineg74 commented Jul 18, 2024

@sarnex the post commit PR with the fix is here: #14610 The only differences with this PR is in tests.

@sarnex
Copy link
Contributor

sarnex commented Jul 18, 2024

Just replied to the PR there, thanks!

@fineg74 fineg74 restored the InferAddrSpace branch July 18, 2024 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants