From eca61f1990fc744a76c51fa753b0a6816e6e8b87 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 7 Mar 2024 11:27:53 -0500 Subject: [PATCH] [SYCL][Doc] Clarify WI funcs in kernel compiler (#12891) Clarify the SPIR-V and OpenCL kernel compiler specifications to explain how the SYCL iteration space maps to the SPIR-V / OpenCL C languages and clarify that these kernels can use their normal language mechanisms to find the current work-item's position in the iteration space. We also disallow launching a SPIR-V or OpenCL C kernel as a simple "range" kernel. This seems consistent with our view that a "range" kernel is not just a degenerate form of an nd-range kernel. Since SPIR-V and OpenCL C kernels always have access to nd-range features, it does not make sense to launch them as range kernels. This is also consistent with our decision to limit SYCL free function kernels to "nd-range" and "single-task" forms. Some other cleanup of these specifications also: * Clarify what happens when a `local_accessor` is passed as a kernel argument to a SPIR-V or OpenCL kernel. This was causing some confusion from users. * Reformat the table in the OpenCL spec describing kernel arguments so that it has the same layout as the equivalent SPIR-V table. * Fix the name of the OpenCL header file in the example. --- ...ext_oneapi_kernel_compiler_opencl.asciidoc | 165 +++++++++++------- ..._ext_oneapi_kernel_compiler_spirv.asciidoc | 87 +++++++-- 2 files changed, 166 insertions(+), 86 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_opencl.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_opencl.asciidoc index 8a3699ded6969..9a021ed3a52f3 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_opencl.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_opencl.asciidoc @@ -21,7 +21,7 @@ == Notice [%hardbreaks] -Copyright (C) 2023-2023 Intel Corporation. All rights reserved. +Copyright (C) 2023-2024 Intel Corporation. All rights reserved. Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. @@ -54,11 +54,11 @@ This extension also depends on the following other SYCL extensions: == Status -This is an experimental extension specification, intended to provide early -access to features and gather community feedback. Interfaces defined in -this specification are implemented in DPC++, but they are not finalized -and may change incompatibly in future versions of DPC++ without prior notice. -*Shipping software products should not rely on APIs defined in +This is an experimental extension specification, intended to provide early +access to features and gather community feedback. Interfaces defined in +this specification are implemented in {dpcpp}, but they are not finalized +and may change incompatibly in future versions of {dpcpp} without prior notice. +*Shipping software products should not rely on APIs defined in this specification.* @@ -101,7 +101,8 @@ This extension adds the `opencl` enumerator to the `source_language` enumeration, which indicates that a kernel bundle defines kernels in the OpenCL C language. -``` +[source,c++] +---- namespace sycl::ext::oneapi::experimental { enum class source_language : /*unspecified*/ { @@ -110,7 +111,7 @@ enum class source_language : /*unspecified*/ { }; } // namespace sycl::ext::oneapi::experimental -``` +---- === Source code is text format @@ -278,60 +279,106 @@ functions identify a kernel using the function name, exactly as it appears in the OpenCL C source code. For example, if the kernel is defined this way in OpenCL C: -``` +[source,c++] +---- __kernel void foo(__global int *in, __global int *out) {/*...*/} -``` +---- Then the application's host code can query for the kernel like so: -``` +[source,c++] +---- sycl::kernel_bundle kb = /*...*/; sycl::kernel k = kb.ext_oneapi_get_kernel("foo"); -``` +---- === Kernel argument restrictions -When a kernel is defined in OpenCL C and invoked from SYCL via a `kernel` -object, the arguments to the kernel are restricted to certain types. -In general, the host application passes an argument value via -`handler::set_arg` using one type and the kernel receives the argument value -as a corresponding OpenCL C type. -The following table lists the set of valid types for these kernel arguments: - +The following table defines the set of OpenCL C kernel argument types that are +supported by this extension and explains how to pass each type of argument from +SYCL. [%header,cols="1,1"] |=== -|Type in SYCL host code -|Type in OpenCL C kernel +|OpenCL C type +|Corresponding SYCL type -|One of the OpenCL scalar types (e.g. `cl_int`, `cl_float`, etc.) -|The corresponding OpenCL C type (e.g. `int`, `float`, etc.) +|One of the OpenCL C scalar types (e.g. `int`, `float`, etc.) +|A {cpp} type that is device copyable, which has the same width and data + representation. -|A USM pointer. -|A `+__global+` pointer of the corresponding type. +[_Note:_ Applications typically use the corresponding OpenCL type (e.g. +`cl_int`, `cl_float`, etc.) +_{endnote}_] -|A class (or struct) that is device copyable in SYCL whose elements are - composed of OpenCL scalar types or USM pointers. -|A class (or struct) passed by value whose elements have the corresponding - OpenCL C types. +|A `+__global+` pointer. +|Either a {cpp} pointer (typically a pointer to USM memory) or an `accessor` + whose target is `target::device`. -|An `accessor` with `target::device` whose `DataT` is an OpenCL scalar type, - a USM pointer, or a device copyable class (or struct) whose elements are - composed of these types. -|A `+__global+` pointer to the first element of the accessor's buffer. - The pointer has the corresponding OpenCL C type. +|A `+__local+` pointer. +|A `local_accessor`. -[_Note:_ The accessor's size is not passed as a kernel argument, so the host -code must pass a separate argument with the size if this is desired. +[_Note:_ The `local_accessor` merely conveys the size of the local memory, such +that the kernel argument points to a local memory buffer of _N_ bytes, where +_N_ is the value returned by `local_accessor::byte_size`. +If the application wants to pass other information from the `local_accessor` to +the kernel (such as the value _N_), it must pass this as separate kernel +arguments. _{endnote}_] -|A `local_accessor` whose `DataT` is an OpenCL scalar type, a USM pointer, or a - device copyable class (or struct) whose elements are composed of these types. -|A `+__local+` pointer to the first element of the accessor's local memory. - The pointer has the corresponding OpenCL C type. +|A class (or struct) passed by value. +|A {cpp} struct or class that is device copyable, which has the same size and + data representation as the OpenCL C struct. + +[_Note:_ The SYCL argument must not contain any `accessor` or `local_accessor` +members because these types are not device copyable. +If the OpenCL C structure contains a pointer member, the corresponding SYCL +structure member is typically a USM pointer. +_{endnote}_] |=== +When data allocated on the host is accessed by the kernel via a pointer, the +application must ensure that the data has the same size and representation on +the host and inside the OpenCL C kernel. +Applications can use the OpenCL types (e.g. `cl_int`) for this purpose. + +=== Iteration space and work-item functions + +A `kernel` object created from OpenCL C source code must be launched either as +a single-task kernel or as an nd-range kernel. +Attempting to launch such a kernel with a simple range iteration space results +in undefined behavior. + +If the kernel is launched as a single-task kernel, it is executed with a +1-dimensional nd-range, with one work-group of one work-item. +Because it is launched as an nd-range kernel, the kernel can use features that +are normally prohibited in single-task kernels. +For example, the `local_accessor` type is allowed as a kernel argument, and the +kernel can use OpenCL C work-group collective functions and sub-group +functions. +Of course, these features have limited use because the kernel is launched with +just a single work-item. + +If the kernel is launched as an nd-range kernel, the number of work-group +dimensions is the same as the number of dimensions in the `nd_range`. +The global size, local size, and the number of work-groups is determined in the +usual way from the `nd_range`. +If the OpenCL C kernel is decorated with the `reqd_work_group_size` attribute, +the local size in the `nd_range` must match this value. + +The kernel may call the functions defined in section 6.15.1 "Work-Item +Functions" of the OpenCL C specification, with the following clarification. +Some of these functions take a `dimindx` parameter that selects a dimension +index. +This index has the opposite sense from SYCL, as described in section C.7.7 +"OpenCL kernel conventions and SYCL" of the core SYCL specification. +To illustrate, consider a call to `get_global_size` from a kernel that is +invoked with a 3-dimensional `nd_range`. +Calling `get_global_size(0)` retrieves the global size from dimension 2 of the +`nd_range`, and calling `get_global_size(2)` retrieves the global size from +dimension 0 of the `nd_range`. + == Examples @@ -340,9 +387,10 @@ _{endnote}_] The following example shows a simple SYCL program that defines an OpenCL C kernel as a string and then compiles and launches it. -``` +[source,c++] +---- #include -#include +#include namespace syclex = sycl::ext::oneapi::experimental; int main() { @@ -372,6 +420,7 @@ int main() { sycl::kernel k = kb_exe.ext_oneapi_get_kernel("my_kernel"); constexpr int N = 4; + constexpr int WGSIZE = 1; cl_int input[N] = {0, 1, 2, 3}; cl_int output[N] = {}; @@ -385,11 +434,12 @@ int main() { // Each argument to the kernel is a SYCL accessor. cgh.set_args(in, out); - // Invoke the kernel over a range. - cgh.parallel_for(sycl::range{N}, k); + // Invoke the kernel over an nd-range. + sycl::nd_range ndr{{N}, {WGSIZE}}; + cgh.parallel_for(ndr, k); }); } -``` +---- === Querying supported features and extensions @@ -397,7 +447,8 @@ This example demonstrates how to query the version of OpenCL C that is supported, how to query the supported features, and how to query the supported extensions. -``` +[source,c++] +---- #include #include namespace syclex = sycl::ext::oneapi::experimental; @@ -426,24 +477,4 @@ int main() { std::cout << "Device supports online compilation with the OpenCL full profile\n"; } -``` - - -== Issues - -* Do we need to document some restrictions on the OpenCL C - https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#work-item-functions[ - work-item functions] that the kernel can call, which depends on how the - kernel was launched? - For example, can a kernel launched with the simple `range` form of - `parallel_for` call `get_local_size`? - In OpenCL, there is only one way to launch kernels - (`clEnqueueNDRangeKernel`), so it is always legal to call any of the - work-item functions. - If an OpenCL kernel is launched with a NULL `local_work_size` (which is - roughly equivalent to SYCL's `range` form of `parallel_for`), the - `get_local_size` function returns the local work-group size that is chosen by - the implementation. - Level Zero, similarly, has only one way to launch kernels. - Therefore, maybe it is OK to let kernels in this extension call any of the - work-item functions, regardless of how they are launched? +---- diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc index c39f1fd6ae37a..f539ad7dfea24 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc @@ -10,6 +10,7 @@ :encoding: utf-8 :lang: en :dpcpp: pass:[DPC++] +:cpp: pass:[C++] :endnote: —{nbsp}end{nbsp}note // Set the default source code type in this document to C++, @@ -21,7 +22,7 @@ == Notice [%hardbreaks] -Copyright (C) 2023-2023 Intel Corporation. All rights reserved. +Copyright (C) 2023-2024 Intel Corporation. All rights reserved. Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. @@ -52,11 +53,11 @@ This extension also depends on the following other SYCL extensions: == Status -This is an experimental extension specification, intended to provide early -access to features and gather community feedback. Interfaces defined in -this specification are implemented in DPC++, but they are not finalized -and may change incompatibly in future versions of DPC++ without prior notice. -*Shipping software products should not rely on APIs defined in +This is an experimental extension specification, intended to provide early +access to features and gather community feedback. Interfaces defined in +this specification are implemented in {dpcpp}, but they are not finalized +and may change incompatibly in future versions of {dpcpp} without prior notice. +*Shipping software products should not rely on APIs defined in this specification.* @@ -103,7 +104,8 @@ This extension adds the `spirv` enumerator to the `source_language` enumeration, which indicates that a kernel bundle defines kernels as a SPIR-V binary module. -``` +[source,c++] +---- namespace sycl::ext::oneapi::experimental { enum class source_language : /*unspecified*/ { @@ -112,7 +114,7 @@ enum class source_language : /*unspecified*/ { }; } // namespace sycl::ext::oneapi::experimental -``` +---- === Source code is binary format @@ -155,16 +157,18 @@ identify a kernel using the name, exactly as it appears in the *OpEntryPoint* instruction. For example, if the kernel is defined this way in SPIR-V: -``` +[source] +---- OpEntryPoint Kernel %20 "foo" -``` +---- Then the application's host code can query for the kernel like so: -``` +[source,c++] +---- sycl::kernel_bundle kb = /*...*/; sycl::kernel k = kb.ext_oneapi_get_kernel("foo"); -``` +---- === Passing kernel arguments @@ -182,7 +186,7 @@ backend's SPIR-V client API specification. |Corresponding SYCL type |*OpTypeInt* -|A C++ type that is device copyable, which has the same width and data +|A {cpp} type that is device copyable, which has the same width and data representation. [_Note:_ Applications typically use a fixed-width integer type where the width @@ -190,7 +194,7 @@ matches the width of the *OpTypeInt*. _{endnote}_] |*OpTypeFloat* -|A C++ type that is device copyable, which has the same width and data +|A {cpp} type that is device copyable, which has the same width and data representation. [_Note:_ Applications typically use `float` when the *OpTypeFloat* has a width @@ -205,9 +209,17 @@ _{endnote}_] |*OpTypePointer* with _Storage Class_ *Workgroup* |A `local_accessor`. +[_Note:_ The `local_accessor` merely conveys the size of the local memory, such +that the *OpTypePointer* kernel argument points to a local memory buffer of _N_ +bytes, where _N_ is the value returned by `local_accessor::byte_size`. +If the application wants to pass other information from the `local_accessor` to +the kernel (such as the value _N_), it must pass this as separate kernel +arguments. +_{endnote}_] + |*OpTypePointer* with _Storage Class_ *Function* and _Type_ *OpTypeStruct* (i.e. the pointed-at type is *OpTypeStruct*). -|A C++ struct or class that is device copyable, which has the same size and +|A {cpp} struct or class that is device copyable, which has the same size and data representation as the *OpTypeStruct*. [_Note:_ The SYCL argument is a structure even though the SPIR-V argument type @@ -230,13 +242,48 @@ because this helps ensure that the size of the integers on the host matches the size in the kernel. _{endnote}_] +=== Iteration space and built-in variables + +A `kernel` object created from a SPIR-V module must be launched either as a +single-task kernel or as an nd-range kernel. +Attempting to launch such a kernel with a simple range iteration space results +in undefined behavior. + +If the kernel is launched as a single-task kernel, it is executed with a +1-dimensional nd-range, with one work-group of one work-item. +Because it is launched as an nd-range kernel, the kernel can use features that +are normally prohibited in single-task kernels. +For example, the `local_accessor` type is allowed as a kernel argument, and the +kernel can use SPIR-V group and subgroup instructions. +Of course, these features have limited use because the kernel is launched with +just a single work-item. + +If the kernel is launched as an nd-range kernel, the number of work-group +dimensions is the same as the number of dimensions in the `nd_range`. +The global size, local size, and the number of work-groups is determined in the +usual way from the `nd_range`. +If the SPIR-V kernel defines a fixed local size (e.g. via the *LocalSize* +execution mode), the local size in the `nd_range` must match this value. + +The kernel may use SPIR-V built-in variables as they are defined in the SPIR-V +client API specification, with the following clarification. +Some of the built-in variables are defined as a 3-component vector (e.g. +*GlobalSize*). +The order of these components is inverted from their corresponding SYCL +objects. +To illustrate, consider a 3-dimensional kernel invocation. +Vector component 0 of the *GlobalSize* variable corresponds to dimension 2 of +the `nd_range`, and vector component 2 of the *GlobalSize* variable corresponds +to dimension 0 of the `nd_range`. + == Example The following example shows a simple SYCL program that loads a SPIR-V module from a file and then launches a kernel from that module. -``` +[source,c++] +---- #include #include #include @@ -271,6 +318,7 @@ int main() { sycl::kernel k = kb_exe.ext_oneapi_get_kernel("my_kernel"); constexpr int N = 4; + constexpr int WGSIZE = 1; int32_t input[N] = {0, 1, 2, 3}; int32_t output[N] = {}; @@ -284,8 +332,9 @@ int main() { // Set the values for the kernel arguments. cgh.set_args(in, out); - // Invoke the kernel over a range. - cgh.parallel_for(sycl::range{N}, k); + // Invoke the kernel over an nd-range. + sycl::nd_range ndr{{N}, {WGSIZE}}; + cgh.parallel_for(ndr, k); }); } -``` +----