[rocfft][cufft] DFT update host task to use native command #578

JackAKirk · 2024-10-02T17:10:22Z

Description

Similar to #572 (see the discussion in that PR for technical details), except that this covers the FFT backends for both AMD and NVIDIA.

Updates the host task implementation to use enqueue_native_command for DFT on the CUDA and HIP backends.

tests:

test_main_dft_ct_amd.txt
test_main_dft_rt_amd.txt
test_main_dft_rt_nvidia.txt
test_main_dft_ct_nvidia.txt

author: @hjabird

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk · 2024-10-02T17:12:02Z

@oneapi-src/onemkl-dft-write could you please review this?

Thanks

Rbiessy

Thanks for the PR! Note that we have used this in two GROMACS benchmarks and saw improvements of between 2% and 7% on MI210.

JackAKirk · 2024-10-09T15:46:25Z

@anantsrivastava30 Is this OK to merge?

Thanks

lhuot · 2024-10-10T15:02:41Z

src/dft/backends/cufft/execute_helper.hpp

+#ifdef SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND
+    cgh.ext_codeplay_enqueue_native_command([=](sycl::interop_handle ih){
+#else
+    cgh.host_task([=](sycl::interop_handle ih){
+#endif
+        f(std::move(ih));


Why is it needed to duplicate this in both cuFFT and rocFFT backends and in various domains (BLAS, LAPACK, FFT)? Can't we have one wrapper used across all domains and backends?

Rbiessy · 2024-10-10

My understanding is that the domains have always been kept separate on purpose. This makes review much easier, as any change affecting common code would technically require approval from every domain owner.
I agree this could be discussed in an issue. To my knowledge there is very little code that could be common across domains, other than the types and exceptions, which are already common.
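
For reference, the quoted hunk is cut off before the closing `});`. The selection pattern it shows can be sketched outside SYCL roughly as follows. This is only an illustration: `mock_handler`, `mock_interop_handle`, `enqueue_demo` and `selected_entry_point` are hypothetical stand-ins, not oneMKL or SYCL names, and the macro and method names are copied from the hunk as quoted.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for sycl::interop_handle; not a SYCL type.
struct mock_interop_handle {};

// Hypothetical stand-in for sycl::handler that records which entry point ran.
struct mock_handler {
    std::string used;

    template <typename F>
    void host_task(F&& f) {
        used = "host_task";
        f(mock_interop_handle{});
    }

    template <typename F>
    void ext_codeplay_enqueue_native_command(F&& f) {
        used = "native_command";
        f(mock_interop_handle{});
    }
};

// Same shape as the hunk: the feature macro picks the entry point at compile
// time, and the body of the callable is shared by both branches.
template <typename HandlerT, typename FnT>
void enqueue_demo(HandlerT& cgh, FnT f) {
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND
    cgh.ext_codeplay_enqueue_native_command([=](mock_interop_handle ih) {
#else
    cgh.host_task([=](mock_interop_handle ih) {
#endif
        f(std::move(ih));
    });
}

// Runs the helper once and reports which entry point was selected.
inline std::string selected_entry_point() {
    mock_handler cgh;
    bool ran = false;
    enqueue_demo(cgh, [&ran](mock_interop_handle) { ran = true; });
    assert(ran);  // the callable runs on either path
    return cgh.used;
}
```

Built without the feature macro defined, `selected_entry_point()` takes the `host_task` fallback; defining the macro on the compiler command line switches it to the native-command branch without touching the callable.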

JackAKirk · 2024-10-10T15:32:00Z

Hi @lhuot

I don't know; I am following the existing design. It is a good question, and it would be worth opening an issue for it. However, I don't think this PR should be blocked on it: this is a critical patch for GROMACS performance, and it does not introduce the duplication, which already exists in all the other backends.

lhuot · 2024-10-11T06:52:13Z

Then, let's fix the code duplication in the DFT domain please.

JackAKirk · 2024-10-11T13:48:19Z

OK, sure. Do you want me to add a new header, e.g. "execute_helper_generic.hpp", in oneMKL/src/dft/ containing an fft_host_task implementation that is portable across the HIP and CUDA backends, and then include that header only in the rocfft and cufft backends?
I can't put such a generic header in include/oneapi/mkl/detail/, because the enqueue_native_command extension is not available in the Intel backends and pulling the header in there would break Intel builds.
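
A rough sketch of the layout proposed here, assuming a single templated helper that both backends call. Everything in it is hypothetical: the guard name, the `demo_*` namespaces and `mock_handler` are placeholders, an `int` stands in for the native stream, and only the helper name `fft_enqueue_task` is borrowed from the later commit title "switch to portable fft_enqueue_task".

```cpp
#include <cassert>

// --- what a shared header under src/dft/ might contain (guard name is made up) ---
#ifndef DEMO_DFT_EXECUTE_HELPER_GENERIC_HPP
#define DEMO_DFT_EXECUTE_HELPER_GENERIC_HPP
namespace demo_generic {
// Templated on the handler type, so the helper itself names no backend.
template <typename HandlerT, typename FnT>
void fft_enqueue_task(HandlerT& cgh, FnT f) {
    cgh.host_task([=](int native_stream) { f(native_stream); });
}
}  // namespace demo_generic
#endif  // DEMO_DFT_EXECUTE_HELPER_GENERIC_HPP

// Hypothetical handler: runs the callable immediately and counts submissions.
struct mock_handler {
    int stream_id;
    int calls;
    explicit mock_handler(int id) : stream_id(id), calls(0) {}

    template <typename F>
    void host_task(F&& f) {
        ++calls;
        f(stream_id);
    }
};

// --- a cuFFT-like backend: would include the shared header, no local copy ---
namespace demo_cufft {
inline int launch(mock_handler& cgh) {
    int seen = -1;
    demo_generic::fft_enqueue_task(cgh, [&seen](int s) { seen = s; });
    return seen;
}
}  // namespace demo_cufft

// --- a rocFFT-like backend: same helper, different backend code around it ---
namespace demo_rocfft {
inline int launch(mock_handler& cgh) {
    int seen = -1;
    demo_generic::fft_enqueue_task(cgh, [&seen](int s) { seen = s; });
    return seen;
}
}  // namespace demo_rocfft

// Small self-check: both backends go through the one shared helper.
inline void check_shared_helper() {
    mock_handler cuda(7);
    mock_handler hip(3);
    assert(demo_cufft::launch(cuda) == 7);
    assert(demo_rocfft::launch(hip) == 3);
}
```

Keeping the helper in a header that only the two backends include means any other build never sees it, which is the constraint described above.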

lhuot · 2024-10-11T14:02:38Z

Sounds like a reasonable approach to me. Thanks!

JackAKirk · 2024-10-11T15:42:20Z

I've updated this as requested. The tests still pass on both the HIP and CUDA backends, as posted in the PR summary.

lhuot

LGTM, thanks!

hjabird and others added 7 commits August 16, 2024 17:45

496b6f0 ROCm implementation
8c5479b Remove host synchronization
75dd616 Use AdaptiveCpp_enqueue_custom_operation in cuFFT
b359285 Updated API (API still having minor changes)
dcef3b7 Update to final API; Enable based on feature macro
cdb496c Remove unused macro
    Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
8515cdd Merge branch 'develop' into cufft-update-host-task

Rbiessy approved these changes Oct 3, 2024

src/dft/backends/rocfft/execute_helper.hpp (outdated, resolved)

f15983c Update src/dft/backends/rocfft/execute_helper.hpp
    remove whitespace
    Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>

lhuot reviewed Oct 10, 2024

1df1cb1 switch to portable fft_enqueue_task
    Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Rbiessy reviewed Oct 11, 2024

src/dft/execute_helper_generic.hpp (outdated, resolved)

JackAKirk added 2 commits October 11, 2024 07:59

210dade Use more sensible header macro name
    Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
d76a67b Make header macro naming consistent
    Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Rbiessy approved these changes Oct 11, 2024

lhuot approved these changes Oct 14, 2024

Rbiessy merged commit 058ee95 into oneapi-src:develop Oct 14, 2024
6 checks passed
