Add Launch Bounds #144

AD2605 · 2024-10-18T14:56:29Z

Small PR that adds launch bounds to the CUDA launch.
It also force-inlines device_kernel, so no call instructions are generated for it.

This improves performance in the benchmarks, making the SYCL path roughly equal to or better than the CUDA one.
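
For readers unfamiliar with the two mechanisms mentioned above, here is a minimal sketch of how launch bounds and forced inlining are typically attached to a kernel entry point. It is not the diff from this PR: the `SKETCH_*` macros, `ScaleOp`, `sketch_device_kernel`, and `run_sketch` are hypothetical names made up for illustration, and the bound values are arbitrary. On a plain host compiler the CUDA qualifiers expand to nothing and a loop stands in for the kernel launch, so the host driver is only compiled outside of CUDA.

```cpp
#include <cassert>
#include <vector>

// Hypothetical macros for this sketch only. Under a CUDA compiler they map to
// the real qualifiers; on a plain host compiler they expand to nothing.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#define SKETCH_GLOBAL __global__
#define SKETCH_LAUNCH_BOUNDS(max_threads, min_blocks) \
  __launch_bounds__(max_threads, min_blocks)
#else
#define SKETCH_HOST_DEVICE
#define SKETCH_GLOBAL
#define SKETCH_LAUNCH_BOUNDS(max_threads, min_blocks)
#endif

// Forced inlining: asks the compiler to inline regardless of its cost model.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE __attribute__((always_inline)) inline
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Hypothetical operator that advertises its own launch limits (values are arbitrary).
struct ScaleOp {
  static constexpr int MaxThreadsPerBlock = 128;
  static constexpr int MinBlocksPerMultiprocessor = 1;

  struct Params {
    float* data;
    int n;
    float alpha;
  };

  // Force-inlined body, so the entry point below contains no call to it.
  SKETCH_HOST_DEVICE SKETCH_FORCE_INLINE
  void operator()(Params const& p, int tid) const {
    if (tid < p.n) {
      p.data[tid] *= p.alpha;
    }
  }
};

static_assert(ScaleOp::MaxThreadsPerBlock > 0, "launch bound must be positive");

// Entry point: the launch bounds tell the compiler the largest block size it
// has to support, which it can use when allocating registers.
template <typename Operator>
SKETCH_GLOBAL
SKETCH_LAUNCH_BOUNDS(Operator::MaxThreadsPerBlock, Operator::MinBlocksPerMultiprocessor)
void sketch_device_kernel(typename Operator::Params params, int host_tid) {
#if defined(__CUDA_ARCH__)
  int tid = blockIdx.x * blockDim.x + threadIdx.x;  // real thread index on device
  (void)host_tid;
#else
  int tid = host_tid;  // host stand-in for the thread index
#endif
  Operator op;
  op(params, tid);
}

#if !defined(__CUDACC__)
// Host-only driver: loops over "threads" instead of launching a kernel.
inline float run_sketch() {
  std::vector<float> data{1.0f, 2.0f, 3.0f};
  ScaleOp::Params params{data.data(), static_cast<int>(data.size()), 2.0f};
  for (int tid = 0; tid < ScaleOp::MaxThreadsPerBlock; ++tid) {
    sketch_device_kernel<ScaleOp>(params, tid);
  }
  return data[0] + data[1] + data[2];
}
#endif
```

Whether the compiler actually honours the inline request for large kernel bodies can also depend on backend thresholds, which is what the review thread below discusses.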

include/cutlass/device_kernel.h

aacostadiaz · 2024-10-21T16:40:54Z

cmake/FindDPCPP.cmake

@@ -53,6 +53,7 @@ if(NOT "${DPCPP_SYCL_ARCH}" STREQUAL "")
  if("${DPCPP_SYCL_TARGET}" STREQUAL "nvptx64-nvidia-cuda")
    list(APPEND DPCPP_FLAGS "-Xsycl-target-backend")
    list(APPEND DPCPP_FLAGS "--cuda-gpu-arch=${DPCPP_SYCL_ARCH}")
+    list(APPEND DPCPP_FLAGS "-fgpu-inline-threshold=1000000;")


Is this still needed if we use __attribute__((always_inline)) inline?

AD2605 · Oct 22, 2024

I remember that back in portFFT we had to increase the inline threshold in combination with always_inline,
so I went ahead with this and have not tested the two changes individually.
I can try without it and will remove it if it is not required.

AD2605 · Oct 22, 2024

I ran it without the inline threshold; it is apparently required.

AD2605 added 3 commits October 18, 2024 15:55

add launch bounds to cuda launch

70d38de

fix compilation

0ba3433

extend inline threshold in the Nvidia Path

70264d4

aacostadiaz reviewed Oct 21, 2024

include/cutlass/device_kernel.h (outdated)

aacostadiaz reviewed Oct 21, 2024

AD2605 and others added 2 commits October 22, 2024 16:34

add attribute((always_inline)) to CUTLASS_GLOBAL

804061b

Merge branch 'sycl-develop' into atharva/add_launch_bounds

e1f85c4
