[SYCL] Enabled more data types for oneMKL's gemm_batch API #8236
oneapi-src/oneMKL#466 was merged last week. Should you wait for the next release of oneMKL? Sorry, I am not familiar with oneMKL.
@airMeng Thanks for the suggestion. At the moment there is no clear/official release process on the oneMKL Interface side. We also don't mention anything related to oneMKL Interface releases in README-sycl.md, so from a user perspective it shouldn't be confusing.
LGTM 🚢
Additional `gemm_batch` types have been enabled in oneMKL (oneapi-src/oneMKL#466), and this patch enables the corresponding APIs for the SYCL backend. This eliminates the extra steps needed when targeting non-Intel devices to cast/copy inputs and outputs to the supported types.

Enabling `gemm_batch_impl<sycl::half, sycl::half, float, float>`, for instance, removes the overhead of calling `gemm_batch_impl<sycl::half, sycl::half, sycl::half, sycl::half>` followed by a `to_fp32_sycl` to copy the `dst` back from `fp16` to `fp32`. This directly affects the KQ + KQV multi-batch path in prompt processing for quantized models.

Performance on Intel GPUs remains the same, and a slight improvement in prompt processing performance was observed on some Nvidia GPUs (0 to 3% on average).
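
To make the difference between the two paths concrete, here is a minimal host-side sketch in plain C++. It is not the SYCL backend code: `half_t` is a hypothetical stand-in for `sycl::half` that does no fp16 rounding, `gemm_batch_ref` is an illustrative reference loop rather than the oneMKL call, and `to_fp32`, `kq_old_path` and `kq_new_path` are made-up names. The template parameter order (A type, B type, C type, scalar type) is assumed from the instantiations named in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sycl::half: stores a float and does NOT round to fp16.
struct half_t {
    float v;
    half_t(float f = 0.0f) : v(f) {}
    operator float() const { return v; }
};

// Illustrative reference batched GEMM (row-major, no transposes):
//   C[i] = alpha * A[i] * B[i] + beta * C[i]   for each batch i.
// Parameter order <Ta, Tb, Tc, Ts> is an assumption mirroring the PR text.
template <typename Ta, typename Tb, typename Tc, typename Ts>
void gemm_batch_ref(std::size_t batch, std::size_t m, std::size_t n, std::size_t k,
                    Ts alpha, const std::vector<Ta>& a, const std::vector<Tb>& b,
                    Ts beta, std::vector<Tc>& c) {
    assert(a.size() == batch * m * k);
    assert(b.size() == batch * k * n);
    assert(c.size() == batch * m * n);
    for (std::size_t i = 0; i < batch; ++i) {
        for (std::size_t r = 0; r < m; ++r) {
            for (std::size_t col = 0; col < n; ++col) {
                float acc = 0.0f;  // accumulate in fp32
                for (std::size_t p = 0; p < k; ++p) {
                    acc += static_cast<float>(a[i * m * k + r * k + p]) *
                           static_cast<float>(b[i * k * n + p * n + col]);
                }
                const std::size_t idx = i * m * n + r * n + col;
                c[idx] = static_cast<Tc>(static_cast<float>(alpha) * acc +
                                         static_cast<float>(beta) *
                                             static_cast<float>(c[idx]));
            }
        }
    }
}

// Separate conversion pass, playing the role of to_fp32_sycl in the old path.
std::vector<float> to_fp32(const std::vector<half_t>& src) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = static_cast<float>(src[i]);
    return dst;
}

// Old path: fp16 output into a temporary buffer, then an extra fp16 -> fp32 copy.
std::vector<float> kq_old_path(std::size_t batch, std::size_t m, std::size_t n,
                               std::size_t k, const std::vector<half_t>& a,
                               const std::vector<half_t>& b) {
    std::vector<half_t> tmp(batch * m * n);
    gemm_batch_ref<half_t, half_t, half_t, half_t>(batch, m, n, k, half_t(1.0f), a, b,
                                                   half_t(0.0f), tmp);
    return to_fp32(tmp);
}

// New path: the GEMM writes fp32 directly, so no temporary and no conversion pass.
std::vector<float> kq_new_path(std::size_t batch, std::size_t m, std::size_t n,
                               std::size_t k, const std::vector<half_t>& a,
                               const std::vector<half_t>& b) {
    std::vector<float> dst(batch * m * n, 0.0f);
    gemm_batch_ref<half_t, half_t, float, float>(batch, m, n, k, 1.0f, a, b, 0.0f, dst);
    return dst;
}
```

Both helpers return the same values in this sketch only because `half_t` does not round. With a real `sycl::half`, the old path would also round `dst` to fp16 before widening it, so the new instantiation saves a temporary buffer and a conversion pass and keeps the fp32 result intact.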