Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lapack][blas][cuda] Update host task impl to use enqueue_native_command #572

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented Sep 19, 2024

Description

Update host task impl to use enqueue_native_command for blas/lapack using the cuda backend (cublas/cusolver). I did both backends in a single PR because the cusolver backend uses the cublas backend of oneMKL.

The sycl_ext_codeplay_enqueue_native_command extension reduces latency wrt the host_task for native library submissions, and allows integration with sycl task_graph / events. See https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc
for details.

This extension has already been shown to lead to considerable performance improvements for applications that call oneMKL, such as Gromacs for the oneMKL fft backend. We expect similar improvements for the lapack and blas backends implemented here.

I had to update the lapack tests because they previously relied on the synchronous behaviour of the native calls due to the fact we had to sync the native streams, since previously with standard host_task we are not able to integrate the native event into the sycl task_graph/ sycl::event.
I did not need to update the blas tests since they already take into account asynchronous behaviour.

Checklist

All Submissions

  • Do all unit tests pass locally? Attach a log.

I've added a test for each backend for each of the possible codepaths:

  • SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND is defined: "..native_command"
  • SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND is not defined so use previous code path with standard host_task

test_main_blas_ct_host_task.txt
test_main_blas_rt_host_task.txt
test_main_lapack_rt_native_command.txt
test_main_lapack_ct_native_command.txt
test_main_lapack_ct_host_task.txt
test_main_lapack_rt_host_task.txt
test_main_blas_ct_res_native_command.txt
test_main_blas_rt_res_native_command.txt

See SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND for details.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
See SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND extension document for
details.

Generalize helpers funcs and use them for blas l1, l2, l3, batch

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
cublas_native_named_func

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
auto sc = CusolverScopedContextHandler(queue, ih);
f(sc);
sc.wait_stream(queue);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only change that also affects the code logic when "SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND" is not defined. I removed it because it is a duplication of the calls to the SYNC macro that happens within lapack function implementations.

Copy link
Contributor

@Rbiessy Rbiessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the PR

src/blas/backends/cublas/cublas_batch.cpp Show resolved Hide resolved
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk
Copy link
Contributor Author

@hdelan please review this when you are back.

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Sep 25, 2024

I have a small patch ready to update cublas backend just a little to implement missing GEMV_BATCH.
This I think puts cublas backend to a status where everything that maps directly between oneMKL and cublas APIs is supported to some degree (some types remain unimplemented, such as bfloat16/some mixed precisions already identified in the issues board etc).
Is it OK for me to add it here, to save the PR review overhead?
@Rbiessy what do you think?

@Rbiessy
Copy link
Contributor

Rbiessy commented Sep 25, 2024

I would prefer to have a separate PR to make it clearer which commit implements what.

@JackAKirk
Copy link
Contributor Author

I would prefer to have a separate PR to make it clearer which commit implements what.

Yeah OK, I'll be patient, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants