[SYCL] Enabled more data types for oneMKL's gemm_batch API #8236

OuadiElfarouki · 2024-07-01T12:53:07Z

Additional gemm_batch types have been enabled in oneMKL (oneapi-src/oneMKL#466). This patch enables the corresponding APIs in the SYCL backend, which eliminates the extra steps otherwise needed when targeting non-Intel devices to cast/copy inputs and outputs to the supported types.

For instance, enabling gemm_batch_impl<sycl::half, sycl::half, float, float> removes the overhead of calling gemm_batch_impl<sycl::half, sycl::half, sycl::half, sycl::half> followed by a to_fp32_sycl call that copies dst back from fp16 to fp32. This directly affects the KQ + KQV multi-batch path used during prompt processing in quantized models.
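To illustrate the difference between the two paths, here is a minimal CPU-only sketch in plain C++. It is not the oneMKL or dpct API: `gemm_batch_ref`, `to_fp32_ref` and `paths_agree` are hypothetical names, the layout is assumed row-major, and since `sycl::half` is not available outside a SYCL toolchain, `std::int8_t` inputs with an `std::int32_t` intermediate stand in for the narrow types. The template parameter order (input A, input B, output, scaling type) is assumed to mirror the `gemm_batch_impl<...>` instantiations named above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference batched GEMM (row-major): C[p] = alpha * A[p] * B[p] + beta * C[p].
// Ta/Tb are the input types, Tc the output type, Ts the scaling/accumulation type.
template <typename Ta, typename Tb, typename Tc, typename Ts>
void gemm_batch_ref(std::size_t batch, std::size_t m, std::size_t n, std::size_t k,
                    Ts alpha, const std::vector<Ta>& a, const std::vector<Tb>& b,
                    Ts beta, std::vector<Tc>& c) {
    for (std::size_t p = 0; p < batch; ++p) {
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                Ts acc = Ts(0);
                for (std::size_t l = 0; l < k; ++l) {
                    acc += Ts(a[p * m * k + i * k + l]) * Ts(b[p * k * n + l * n + j]);
                }
                Tc& out = c[p * m * n + i * n + j];
                out = Tc(alpha * acc + beta * Ts(out));
            }
        }
    }
}

// Stand-in for the separate conversion pass (the role to_fp32_sycl plays above):
// a full extra read and write of dst.
template <typename T>
std::vector<float> to_fp32_ref(const std::vector<T>& src) {
    return std::vector<float>(src.begin(), src.end());
}

// Runs both paths on small integer data and checks that they agree.
bool paths_agree() {
    const std::size_t batch = 2, m = 2, n = 2, k = 3;
    std::vector<std::int8_t> a(batch * m * k), b(batch * k * n);
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<std::int8_t>(i % 5);
    for (std::size_t i = 0; i < b.size(); ++i) b[i] = static_cast<std::int8_t>(int(i % 3) - 1);

    // Two-step path: GEMM into the intermediate type, then convert dst to float.
    std::vector<std::int32_t> c_tmp(batch * m * n, 0);
    gemm_batch_ref<std::int8_t, std::int8_t, std::int32_t, std::int32_t>(
        batch, m, n, k, 1, a, b, 0, c_tmp);
    const std::vector<float> c_two_step = to_fp32_ref(c_tmp);

    // Mixed-type path: GEMM writes float output directly, no conversion pass.
    std::vector<float> c_direct(batch * m * n, 0.0f);
    gemm_batch_ref<std::int8_t, std::int8_t, float, float>(
        batch, m, n, k, 1.0f, a, b, 0.0f, c_direct);

    return c_two_step == c_direct;
}
```

Both paths produce the same values on this data; the mixed-type instantiation simply skips the intermediate buffer and the extra pass over dst, which is the overhead the description refers to.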

Performance on Intel GPUs remains the same, and a slight improvement in prompt processing performance was observed on some Nvidia GPUs (0 to 3% on average).

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

ggml/src/ggml-sycl/dpct/helper.hpp

airMeng · 2024-07-01T14:46:32Z

oneapi-src/oneMKL#466 was merged last week. Should you wait for the next release of oneMKL? Sorry, I am not familiar with oneMKL.

OuadiElfarouki · 2024-07-01T16:01:23Z

@airMeng Thanks for the suggestion. At the moment there is no clear/official release process on the oneMKL Interface side. We also don't mention anything related to oneMKL Interface releases in README-sycl.md, so from a user perspective it shouldn't be confusing at the moment.
We will adopt a different approach whenever we hear from the oneMKL side regarding their release process, so we will keep this in mind!

OuadiElfarouki · 2024-07-04T14:24:44Z

@airMeng @joeatodd Is there anything else we want to address for this?

joeatodd

LGTM 🚢

OuadiElfarouki added 2 commits July 1, 2024 12:02

Enabled more data types for oneMKL gemm_batch

5e5d898

Merge branch 'master' into mixed_types_gemm

a1189ef

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Jul 1, 2024

OuadiElfarouki requested review from AidanBeltonS, airMeng and joeatodd July 1, 2024 12:54

AidanBeltonS approved these changes Jul 1, 2024

ggml/src/ggml-sycl/dpct/helper.hpp

OuadiElfarouki mentioned this pull request Jul 1, 2024

Enabling extra gemm_batch type APIs for oneMKL Interface oneapi-src/SYCLomatic#2112

Closed

Merge branch 'master' into mixed_types_gemm

13deca1

airMeng approved these changes Jul 5, 2024

joeatodd approved these changes Jul 5, 2024

Merge branch 'master' into mixed_types_gemm

ab4b1a7

AidanBeltonS merged commit 1f3e1b6 into ggerganov:master Jul 5, 2024
50 of 53 checks passed

OuadiElfarouki mentioned this pull request Jul 18, 2024

[SYCL] fix multi-gpu issue on sycl #8554

Merged

4 tasks
