From 97dba84fa37ae3bdd291388c94f33f77127123b5 Mon Sep 17 00:00:00 2001
From: Dounia Khaldi
Date: Fri, 20 Sep 2024 01:59:32 -0500
Subject: [PATCH] [SYCL Spec][Joint Matrix] Add a new overload for
 joint_matrix_apply to be able to return result into a different matrix
 (#13153)

Currently, CUDA code that uses this pattern:

for (int i = 0; i < c_frag.num_elements; i++) {
  c_frag.x[i] = alpha * acc_frag.x[i] + beta * c_frag.x[i];
}

cannot be migrated to SYCL joint matrix. This added overload addresses
this limitation.
---
 .../sycl_ext_oneapi_matrix.asciidoc | 44 +++++++++++++++++--
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc
index a5568f3251ab2..fd3ae8527815a 100644
--- a/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc
+++ b/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc
@@ -401,9 +401,14 @@ of the link:sycl_ext_intel_matrix.asciidoc[sycl_ext_intel_matrix]
 
 Besides the `Group` and the `joint_matrix` arguments,
 `joint_matrix_apply` takes a C++ Callable object which is invoked once
-for each element of the matrix. This callable object must be invocable
-with a single parameter of type `T&`. Commonly, applications pass a
-lambda expression.
+for each element of the matrix. There are two cases: (1) one matrix is
+passed, (2) two matrices are passed.
+
+===== Unary Operation
+In this case, `joint_matrix_apply` takes one `joint_matrix`
+argument. The callable object must be invocable with a single
+parameter of type `T&`. Commonly, applications pass a lambda
+expression.
 
 ```c++
 namespace sycl::ext::oneapi::experimental::matrix {
@@ -427,6 +432,39 @@ joint_matrix_apply(sg, C, [=](T &x) {
 });
 ```
 
+===== Binary Operation
+In this case, `joint_matrix_apply` takes two `joint_matrix` arguments:
+`jm0` and `jm1` that have the same `use`, number of rows, number of
+columns, and `layout`. `jm0` and `jm1` can be read-only, write-only,
+or read and write arguments. The callable object must be invocable
+with two parameters `x` and `y` of types `T0&` and `T1&`, where `x` is
+an element from `jm0` and `y` is an element from `jm1`. Moreover, `x`
+and `y` are guaranteed to have identical coordinates in their
+respective matrices. Commonly, applications pass a lambda expression.
+
+```c++
+namespace sycl::ext::oneapi::experimental::matrix {
+
+template <typename Group, typename T0, typename T1, use Use, size_t Rows,
+          size_t Cols, layout Layout, typename F>
+void joint_matrix_apply(Group g,
+                        joint_matrix<Group, T0, Use, Rows, Cols, Layout>& jm0,
+                        joint_matrix<Group, T1, Use, Rows, Cols, Layout>& jm1,
+                        F&& func);
+
+} // namespace sycl::ext::oneapi::experimental::matrix
+```
+
+In the following example, every element `x` of the matrix `C` is
+multiplied by `alpha`, and the result is stored in the corresponding
+element `y` of the matrix `D`.
+
+```c++
+joint_matrix_apply(sg, C, D, [=](const T &x, T &y) {
+  y = x * alpha;
+});
+```
+
 ==== Prefetch
 
 ```c++