[XPU][OptEW] Define -intel-triton-optimize-elementwise-parallelism pass #2631
…pass
Define pass improving elementwise parallelism by avoiding layout conversions leading to data duplication between threads.
See pass documentation for more information.
Signed-off-by: victor-eds <victor.perez@codeplay.com>
First step for #2562.
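To make "data duplication between threads" concrete, here is a rough toy model, not the pass's implementation. It assumes a single warp, ignores sizePerThread, and treats a slice layout as one where threads that differ only along the sliced dimension hold the same elements. The helper names and the [16, 2] thread grid are hypothetical and not taken from this PR.

```python
from math import prod


def distinct_threads(threads_per_warp, sliced_dim=None):
    """Number of threads in one warp that hold distinct data (toy model).

    threads_per_warp: thread grid of the (parent) layout, e.g. [16, 2].
    sliced_dim: dimension removed by a slice layout, or None for a plain layout.
    Threads that differ only along the sliced dimension hold the same elements.
    """
    dims = [t for d, t in enumerate(threads_per_warp) if d != sliced_dim]
    return prod(dims)


def work_per_thread(num_elements, threads_per_warp, sliced_dim=None):
    """Elements each thread processes in an elementwise op (toy model)."""
    holders = distinct_threads(threads_per_warp, sliced_dim)
    return -(-num_elements // holders)  # ceiling division


# Hypothetical layouts for a 128-element tensor on a 32-thread warp.
sliced = work_per_thread(128, [16, 2], sliced_dim=1)  # slice of a 2D blocked layout
blocked = work_per_thread(128, [32])                  # plain 1D blocked layout
print(sliced, blocked)
```

Under these assumptions, the sliced layout leaves each thread with twice as many elements to process as the plain 1D layout, because pairs of threads repeat the same work. That redundant elementwise work is the kind of cost the description refers to.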
// CHECK: %[[VAL_4:.*]] = tt.splat %[[VAL_2]] : f32 -> tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #[[$ATTR_4]]}>>
%2 = tt.splat %arg2 : f32 -> tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #blocked1}>>
%3 = arith.addf %0, %1 : tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #blocked1}>>
// CHECK: %[[VAL_5:.*]] = arith.addf %[[VAL_4]], %[[VAL_3]] : tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #[[$ATTR_4]]}>>
The layout conversion that would otherwise be needed for %[[VAL_3]] is erased by the pass, as we can see here.
let dependentDialects = ["mlir::triton::TritonDialect",
                         "mlir::triton::gpu::TritonGPUDialect"];
If we find an elementwise operation from dialect X, that dialect has already been loaded, so it does not need to be listed as a dependent dialect for the pass to create such operations. We still want to include these two for future-proofing, though.
third_party/intel/include/Dialect/TritonIntelGPU/Transforms/Passes.td
@intel/triton-codeplay-reviewers @etiotto @whitneywhtsang @chengjunlu Can I get reviews here? I already have a PR hanging on this one and will add more on Monday, so I'd like to begin getting this work merged.
I may change the approach to this next week. Delay reviews for now.