[XPU][OptEW] Define `-intel-triton-optimize-elementwise-parallelism` pass #2631

victor-eds · 2024-11-05T14:09:16Z

Define a pass that improves elementwise parallelism by avoiding layout conversions that lead to data duplication between threads.

See the pass documentation for more information.
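The following simplified Python model makes the "data duplication between threads" concrete. It is not Triton's layout code. It lists which threads hold each element of a 1D tensor in a `#triton_gpu.slice<{dim = 1, parent = ...}>` layout. It assumes a 2D blocked parent with `sizePerThread = [1, 1]`, row-major lane and warp order, and 16 lanes per sub-group. `slice_owners` is a hypothetical helper used only for illustration.

```python
from collections import defaultdict


def slice_owners(length, threads_per_warp, warps_per_cta):
    """Map each element of a dim=1 slice to the (warp, lane) pairs holding it.

    Simplified model (assumption): the parent is a 2D blocked layout with
    sizePerThread = [1, 1] and row-major lane/warp order. The 1D slice keeps
    only the dim-0 coordinate, so threads that differ only in their dim-1
    coordinate end up holding the same element.
    """
    t0, t1 = threads_per_warp
    w0, w1 = warps_per_cta
    rows_covered = t0 * w0  # dim-0 extent covered by one layout repetition
    owners = defaultdict(list)
    for warp in range(w0 * w1):
        wr = warp // w1  # warp coordinate along dim 0
        for lane in range(t0 * t1):
            lr = lane // t1  # lane coordinate along dim 0
            row = wr * t0 + lr
            # The layout repeats along dim 0 until the whole slice is covered.
            for elem in range(row, length, rows_covered):
                owners[elem].append((warp, lane))
    return owners


if __name__ == "__main__":
    # 16 lanes along the sliced dimension: every element has 16 owners.
    dup = slice_owners(128, threads_per_warp=(1, 16), warps_per_cta=(4, 1))
    print(max(len(v) for v in dup.values()))  # 16
```

In this model, placing the 16 lanes along the sliced dimension means an elementwise op on the slice is computed 16 times per element. Placing them along dimension 0 instead, with `threads_per_warp=(16, 1)`, leaves exactly one owner per element. That is the kind of redundancy the pass aims to avoid.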

Signed-off-by: victor-eds <victor.perez@codeplay.com>

victor-eds · 2024-11-05T14:09:44Z

First step for #2562.

victor-eds · 2024-11-05T15:24:42Z

test/TritonIntelGPU/optimize-elementwise.mlir

+  // CHECK:           %[[VAL_4:.*]] = tt.splat %[[VAL_2]] : f32 -> tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #[[$ATTR_4]]}>>
+  %2 = tt.splat %arg2 : f32 -> tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #blocked1}>>
+  %3 = arith.addf %0, %1 : tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #blocked1}>>
+  // CHECK:           %[[VAL_5:.*]] = arith.addf %[[VAL_4]], %[[VAL_3]] : tensor<128xf32, #triton_gpu.slice<{dim = 1, parent = #[[$ATTR_4]]}>>


As the CHECK lines show, the pass erases the layout conversion that would otherwise produce %[[VAL_3]].

victor-eds · 2024-11-05T15:26:14Z

third_party/intel/include/Dialect/TritonIntelGPU/Transforms/Passes.td

+  let dependentDialects = ["mlir::triton::TritonDialect",
+                           "mlir::triton::gpu::TritonGPUDialect"];


If the pass finds an elementwise operation from dialect X, that dialect must already be loaded, so it does not need to be listed as a dependent dialect in order to create that operation. We still include these two dialects for future-proofing, though.


victor-eds · 2024-11-08T15:45:44Z

@intel/triton-codeplay-reviewers @etiotto @whitneywhtsang @chengjunlu Can I get reviews here? I already have a PR that depends on this one and will open more on Monday, so I'd like to start getting this work merged.

victor-eds · 2024-11-08T17:09:39Z

I may change the approach next week, so please hold off on reviews for now.

victor-eds added the `performance` and `codegen: attention` labels Nov 5, 2024

victor-eds requested review from whitneywhtsang, etiotto, chengjunlu and a team November 5, 2024 14:09

victor-eds self-assigned this Nov 5, 2024

Add chain test

40d4269


etiotto reviewed Nov 5, 2024


third_party/intel/include/Dialect/TritonIntelGPU/Transforms/Passes.td (outdated, resolved)

test/TritonIntelGPU/optimize-elementwise.mlir (outdated, resolved)

test/TritonIntelGPU/optimize-elementwise.mlir (outdated, resolved)

Apply review suggestions

729d0d2

victor-eds requested a review from etiotto November 6, 2024 11:45

vlad-penkin linked an issue Nov 6, 2024 that may be closed by this pull request

Implement -tritonintelgpu-optimize-elementwise-locality #2562

Closed

Merge branch 'main' into elementwise-locality

1244679

victor-eds mentioned this pull request Nov 8, 2024

[XPU][OptEW] Add support for reshape(convert_layout) operand pattern #2658

Closed

victor-eds marked this pull request as draft November 8, 2024 17:09

victor-eds marked this pull request as ready for review November 11, 2024 09:45

Merge branch 'main' into elementwise-locality

4ca0dfd

victor-eds marked this pull request as draft November 11, 2024 09:45

Change approach

7e8bd16

victor-eds marked this pull request as ready for review November 11, 2024 13:02

victor-eds mentioned this pull request Nov 11, 2024

[XPU][OptEW] Allow multiple warps in non-sliced dimension #2670

Merged

etiotto approved these changes Nov 11, 2024


chengjunlu approved these changes Nov 12, 2024


etiotto merged commit 0d9c0d3 into intel:main Nov 12, 2024
4 checks passed

victor-eds deleted the elementwise-locality branch November 12, 2024 15:32
