-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rocfft][cufft] DFT update host task to use native command #578
[rocfft][cufft] DFT update host task to use native command #578
Conversation
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@oneapi-src/onemkl-dft-write could you please review this? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Note we have used this in 2 GROMACS benchmarks to get between 2% to 7% improvements on MI210.
remove whitespace Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
@anantsrivastava30 Is this OK to merge? Thanks |
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND | ||
cgh.ext_codeplay_enqueue_native_command([=](sycl::interop_handle ih){ | ||
#else | ||
cgh.host_task([=](sycl::interop_handle ih){ | ||
#endif | ||
f(std::move(ih)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it needed to duplicate this in both cuFFT and rocFFT backends and in various domains (BLAS, LAPACK, FFT)? Can't we have one wrapper used across all domains and backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the domains have always been separated on purpose. This makes review much easier as any change affecting common code would technically require an approval from every domain owners.
I agree this could be discussed in an issue. To my knowledge there is very little code that could be common across domains, other than the types and exceptions which are already common.
Hi @lhuot I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends. |
Then, let's fix the code duplication in the DFT domain please. |
OK sure, do you want me to add a new header e.g. "execute_helper_generic.hpp" in oneMKL/src/dft/ with the fft_host_task implementation that is portable across hip and cuda backends, and then only include this header in the rocfft and cufft backends? |
Sounds like a reasonable approach to me. Thanks! |
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
I've updated this now as requested. Tests still pass as posted in PR summary on both hip and cuda backends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Description
Similar to #572 (see the discussions in that PR for technical details) except this covers fft backends for both amd and nvidia cases.
Update host task impl to use enqueue_native_command for DFT using the cuda/hip backends.
tests:
test_main_dft_ct_amd.txt
test_main_dft_rt_amd.txt
test_main_dft_rt_nvidia.txt
test_main_dft_ct_nvidia.txt
author: @hjabird