Skip to content

Commit

Permalink
[SYCL][Doc] Clarify WI funcs in kernel compiler (#12891)
Browse files Browse the repository at this point in the history
Clarify the SPIR-V and OpenCL kernel compiler specifications to explain
how the SYCL iteration space maps to the SPIR-V / OpenCL C languages
and clarify that these kernels can use their normal language mechanisms
to find the current work-item's position in the iteration space.

We also disallow launching a SPIR-V or OpenCL C kernel as a simple
"range" kernel.  This seems consistent with our view that a "range"
kernel is not just a degenerate form of an nd-range kernel.  Since
SPIR-V and OpenCL C kernels always have access to nd-range features, it
does not make sense to launch them as range kernels.  This is also
consistent with our decision to limit SYCL free function kernels to
"nd-range" and "single-task" forms.

Some other cleanup of these specifications also:

* Clarify what happens when a `local_accessor` is passed as a kernel
  argument to a SPIR-V or OpenCL kernel.  This was causing some
  confusion from users.

* Reformat the table in the OpenCL spec describing kernel arguments
  so that it has the same layout as the equivalent SPIR-V table.

* Fix the name of the OpenCL header file in the example.
  • Loading branch information
gmlueck committed Mar 7, 2024
1 parent 96d744f commit eca61f1
Show file tree
Hide file tree
Showing 2 changed files with 166 additions and 86 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
== Notice

[%hardbreaks]
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
Copyright (C) 2023-2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc.
Expand Down Expand Up @@ -54,11 +54,11 @@ This extension also depends on the following other SYCL extensions:

== Status

This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in
this specification are implemented in DPC++, but they are not finalized
and may change incompatibly in future versions of DPC++ without prior notice.
*Shipping software products should not rely on APIs defined in
This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in
this specification are implemented in {dpcpp}, but they are not finalized
and may change incompatibly in future versions of {dpcpp} without prior notice.
*Shipping software products should not rely on APIs defined in
this specification.*


Expand Down Expand Up @@ -101,7 +101,8 @@ This extension adds the `opencl` enumerator to the `source_language`
enumeration, which indicates that a kernel bundle defines kernels in the
OpenCL C language.

```
[source,c++]
----
namespace sycl::ext::oneapi::experimental {
enum class source_language : /*unspecified*/ {
Expand All @@ -110,7 +111,7 @@ enum class source_language : /*unspecified*/ {
};
} // namespace sycl::ext::oneapi::experimental
```
----

=== Source code is text format

Expand Down Expand Up @@ -278,60 +279,106 @@ functions identify a kernel using the function name, exactly as it appears in
the OpenCL C source code.
For example, if the kernel is defined this way in OpenCL C:

```
[source,c++]
----
__kernel
void foo(__global int *in, __global int *out) {/*...*/}
```
----

Then the application's host code can query for the kernel like so:

```
[source,c++]
----
sycl::kernel_bundle<sycl::bundle_state::executable> kb = /*...*/;
sycl::kernel k = kb.ext_oneapi_get_kernel("foo");
```
----

=== Kernel argument restrictions

When a kernel is defined in OpenCL C and invoked from SYCL via a `kernel`
object, the arguments to the kernel are restricted to certain types.
In general, the host application passes an argument value via
`handler::set_arg` using one type and the kernel receives the argument value
as a corresponding OpenCL C type.
The following table lists the set of valid types for these kernel arguments:

The following table defines the set of OpenCL C kernel argument types that are
supported by this extension and explains how to pass each type of argument from
SYCL.

[%header,cols="1,1"]
|===
|Type in SYCL host code
|Type in OpenCL C kernel
|OpenCL C type
|Corresponding SYCL type

|One of the OpenCL scalar types (e.g. `cl_int`, `cl_float`, etc.)
|The corresponding OpenCL C type (e.g. `int`, `float`, etc.)
|One of the OpenCL C scalar types (e.g. `int`, `float`, etc.)
|A {cpp} type that is device copyable, which has the same width and data
representation.

|A USM pointer.
|A `+__global+` pointer of the corresponding type.
[_Note:_ Applications typically use the corresponding OpenCL type (e.g.
`cl_int`, `cl_float`, etc.)
_{endnote}_]

|A class (or struct) that is device copyable in SYCL whose elements are
composed of OpenCL scalar types or USM pointers.
|A class (or struct) passed by value whose elements have the corresponding
OpenCL C types.
|A `+__global+` pointer.
|Either a {cpp} pointer (typically a pointer to USM memory) or an `accessor`
whose target is `target::device`.

|An `accessor` with `target::device` whose `DataT` is an OpenCL scalar type,
a USM pointer, or a device copyable class (or struct) whose elements are
composed of these types.
|A `+__global+` pointer to the first element of the accessor's buffer.
The pointer has the corresponding OpenCL C type.
|A `+__local+` pointer.
|A `local_accessor`.

[_Note:_ The accessor's size is not passed as a kernel argument, so the host
code must pass a separate argument with the size if this is desired.
[_Note:_ The `local_accessor` merely conveys the size of the local memory, such
that the kernel argument points to a local memory buffer of _N_ bytes, where
_N_ is the value returned by `local_accessor::byte_size`.
If the application wants to pass other information from the `local_accessor` to
the kernel (such as the value _N_), it must pass this as separate kernel
arguments.
_{endnote}_]

|A `local_accessor` whose `DataT` is an OpenCL scalar type, a USM pointer, or a
device copyable class (or struct) whose elements are composed of these types.
|A `+__local+` pointer to the first element of the accessor's local memory.
The pointer has the corresponding OpenCL C type.
|A class (or struct) passed by value.
|A {cpp} struct or class that is device copyable, which has the same size and
data representation as the OpenCL C struct.

[_Note:_ The SYCL argument must not contain any `accessor` or `local_accessor`
members because these types are not device copyable.
If the OpenCL C structure contains a pointer member, the corresponding SYCL
structure member is typically a USM pointer.
_{endnote}_]
|===

When data allocated on the host is accessed by the kernel via a pointer, the
application must ensure that the data has the same size and representation on
the host and inside the OpenCL C kernel.
Applications can use the OpenCL types (e.g. `cl_int`) for this purpose.

=== Iteration space and work-item functions

A `kernel` object created from OpenCL C source code must be launched either as
a single-task kernel or as an nd-range kernel.
Attempting to launch such a kernel with a simple range iteration space results
in undefined behavior.

If the kernel is launched as a single-task kernel, it is executed with a
1-dimensional nd-range, with one work-group of one work-item.
Because it is launched as an nd-range kernel, the kernel can use features that
are normally prohibited in single-task kernels.
For example, the `local_accessor` type is allowed as a kernel argument, and the
kernel can use OpenCL C work-group collective functions and sub-group
functions.
Of course, these features have limited use because the kernel is launched with
just a single work-item.

If the kernel is launched as an nd-range kernel, the number of work-group
dimensions is the same as the number of dimensions in the `nd_range`.
The global size, local size, and the number of work-groups is determined in the
usual way from the `nd_range`.
If the OpenCL C kernel is decorated with the `reqd_work_group_size` attribute,
the local size in the `nd_range` must match this value.

The kernel may call the functions defined in section 6.15.1 "Work-Item
Functions" of the OpenCL C specification, with the following clarification.
Some of these functions take a `dimindx` parameter that selects a dimension
index.
This index has the opposite sense from SYCL, as described in section C.7.7
"OpenCL kernel conventions and SYCL" of the core SYCL specification.
To illustrate, consider a call to `get_global_size` from a kernel that is
invoked with a 3-dimensional `nd_range`.
Calling `get_global_size(0)` retrieves the global size from dimension 2 of the
`nd_range`, and calling `get_global_size(2)` retrieves the global size from
dimension 0 of the `nd_range`.


== Examples

Expand All @@ -340,9 +387,10 @@ _{endnote}_]
The following example shows a simple SYCL program that defines an OpenCL C
kernel as a string and then compiles and launches it.

```
[source,c++]
----
#include <sycl/sycl.hpp>
#include <OpenCL/opencl.h>
#include <CL/opencl.h>
namespace syclex = sycl::ext::oneapi::experimental;
int main() {
Expand Down Expand Up @@ -372,6 +420,7 @@ int main() {
sycl::kernel k = kb_exe.ext_oneapi_get_kernel("my_kernel");
constexpr int N = 4;
constexpr int WGSIZE = 1;
cl_int input[N] = {0, 1, 2, 3};
cl_int output[N] = {};
Expand All @@ -385,19 +434,21 @@ int main() {
// Each argument to the kernel is a SYCL accessor.
cgh.set_args(in, out);
// Invoke the kernel over a range.
cgh.parallel_for(sycl::range{N}, k);
// Invoke the kernel over an nd-range.
sycl::nd_range ndr{{N}, {WGSIZE}};
cgh.parallel_for(ndr, k);
});
}
```
----

=== Querying supported features and extensions

This example demonstrates how to query the version of OpenCL C that is
supported, how to query the supported features, and how to query the
supported extensions.

```
[source,c++]
----
#include <iostream>
#include <sycl/sycl.hpp>
namespace syclex = sycl::ext::oneapi::experimental;
Expand Down Expand Up @@ -426,24 +477,4 @@ int main() {
std::cout << "Device supports online compilation with the OpenCL full profile\n";
}
```


== Issues

* Do we need to document some restrictions on the OpenCL C
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#work-item-functions[
work-item functions] that the kernel can call, which depends on how the
kernel was launched?
For example, can a kernel launched with the simple `range` form of
`parallel_for` call `get_local_size`?
In OpenCL, there is only one way to launch kernels
(`clEnqueueNDRangeKernel`), so it is always legal to call any of the
work-item functions.
If an OpenCL kernel is launched with a NULL `local_work_size` (which is
roughly equivalent to SYCL's `range` form of `parallel_for`), the
`get_local_size` function returns the local work-group size that is chosen by
the implementation.
Level Zero, similarly, has only one way to launch kernels.
Therefore, maybe it is OK to let kernels in this extension call any of the
work-item functions, regardless of how they are launched?
----
Loading

0 comments on commit eca61f1

Please sign in to comment.