Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Clarify WI funcs in kernel compiler #12891

Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
== Notice

[%hardbreaks]
Copyright (C) 2023-2023 Intel Corporation. All rights reserved.
Copyright (C) 2023-2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc.
Expand Down Expand Up @@ -54,11 +54,11 @@ This extension also depends on the following other SYCL extensions:

== Status

This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in
this specification are implemented in DPC++, but they are not finalized
and may change incompatibly in future versions of DPC++ without prior notice.
*Shipping software products should not rely on APIs defined in
This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in
this specification are implemented in {dpcpp}, but they are not finalized
and may change incompatibly in future versions of {dpcpp} without prior notice.
*Shipping software products should not rely on APIs defined in
this specification.*


Expand Down Expand Up @@ -101,7 +101,8 @@ This extension adds the `opencl` enumerator to the `source_language`
enumeration, which indicates that a kernel bundle defines kernels in the
OpenCL C language.

```
[source,c++]
----
namespace sycl::ext::oneapi::experimental {

enum class source_language : /*unspecified*/ {
Expand All @@ -110,7 +111,7 @@ enum class source_language : /*unspecified*/ {
};

} // namespace sycl::ext::oneapi::experimental
```
----

=== Source code is text format

Expand Down Expand Up @@ -278,60 +279,106 @@ functions identify a kernel using the function name, exactly as it appears in
the OpenCL C source code.
For example, if the kernel is defined this way in OpenCL C:

```
[source,c++]
----
__kernel
void foo(__global int *in, __global int *out) {/*...*/}
```
----

Then the application's host code can query for the kernel like so:

```
[source,c++]
----
sycl::kernel_bundle<sycl::bundle_state::executable> kb = /*...*/;
sycl::kernel k = kb.ext_oneapi_get_kernel("foo");
```
----

=== Kernel argument restrictions

When a kernel is defined in OpenCL C and invoked from SYCL via a `kernel`
object, the arguments to the kernel are restricted to certain types.
In general, the host application passes an argument value via
`handler::set_arg` using one type and the kernel receives the argument value
as a corresponding OpenCL C type.
The following table lists the set of valid types for these kernel arguments:

The following table defines the set of OpenCL C kernel argument types that are
supported by this extension and explains how to pass each type of argument from
SYCL.

[%header,cols="1,1"]
|===
|Type in SYCL host code
|Type in OpenCL C kernel
|OpenCL C type
|Corresponding SYCL type

|One of the OpenCL scalar types (e.g. `cl_int`, `cl_float`, etc.)
|The corresponding OpenCL C type (e.g. `int`, `float`, etc.)
|One of the OpenCL C scalar types (e.g. `int`, `float`, etc.)
|A {cpp} type that is device copyable, which has the same width and data
representation.

|A USM pointer.
|A `+__global+` pointer of the corresponding type.
[_Note:_ Applications typically use the corresponding OpenCL type (e.g.
`cl_int`, `cl_float`, etc.)
_{endnote}_]

|A class (or struct) that is device copyable in SYCL whose elements are
composed of OpenCL scalar types or USM pointers.
|A class (or struct) passed by value whose elements have the corresponding
OpenCL C types.
|A `+__global+` pointer.
|Either a {cpp} pointer (typically a pointer to USM memory) or an `accessor`
whose target is `target::device`.

|An `accessor` with `target::device` whose `DataT` is an OpenCL scalar type,
a USM pointer, or a device copyable class (or struct) whose elements are
composed of these types.
|A `+__global+` pointer to the first element of the accessor's buffer.
The pointer has the corresponding OpenCL C type.
|A `+__local+` pointer.
|A `local_accessor`.

[_Note:_ The accessor's size is not passed as a kernel argument, so the host
code must pass a separate argument with the size if this is desired.
[_Note:_ The `local_accessor` merely conveys the size of the local memory, such
that the kernel argument points to a local memory buffer of _N_ bytes, where
_N_ is the value returned by `local_accessor::byte_size`.
If the application wants to pass other information from the `local_accessor` to
the kernel (such as the value _N_), it must pass this as separate kernel
arguments.
_{endnote}_]

|A `local_accessor` whose `DataT` is an OpenCL scalar type, a USM pointer, or a
device copyable class (or struct) whose elements are composed of these types.
|A `+__local+` pointer to the first element of the accessor's local memory.
The pointer has the corresponding OpenCL C type.
|A class (or struct) passed by value.
|A {cpp} struct or class that is device copyable, which has the same size and
data representation as the OpenCL C struct.

[_Note:_ The SYCL argument must not contain any `accessor` or `local_accessor`
members because these types are not device copyable.
If the OpenCL C structure contains a pointer member, the corresponding SYCL
structure member is typically a USM pointer.
_{endnote}_]
|===

When data allocated on the host is accessed by the kernel via a pointer, the
application must ensure that the data has the same size and representation on
the host and inside the OpenCL C kernel.
Applications can use the OpenCL types (e.g. `cl_int`) for this purpose.

=== Iteration space and work-item functions

A `kernel` object created from OpenCL C source code must be launched either as
a single-task kernel or as an nd-range kernel.
Attempting to launch such a kernel with a simple range iteration space results
in undefined behavior.

If the kernel is launched as a single-task kernel, it is executed with a
1-dimensional nd-range, with one work-group of one work-item.
Because it is launched as an nd-range kernel, the kernel can use features that
are normally prohibited in single-task kernels.
For example, the `local_accessor` type is allowed as a kernel argument, and the
kernel can use OpenCL C work-group collective functions and sub-group
functions.
Of course, these features have limited use because the kernel is launched with
just a single work-item.

If the kernel is launched as an nd-range kernel, the number of work-group
dimensions is the same as the number of dimensions in the `nd_range`.
The global size, local size, and the number of work-groups is determined in the
usual way from the `nd_range`.
If the OpenCL C kernel is decorated with the `reqd_work_group_size` attribute,
the local size in the `nd_range` must match this value.

The kernel may call the functions defined in section 6.15.1 "Work-Item
Functions" of the OpenCL C specification, with the following clarification.
Some of these functions take a `dimindx` parameter that selects a dimension
index.
This index has the opposite sense from SYCL, as described in section C.7.7
"OpenCL kernel conventions and SYCL" of the core SYCL specification.
To illustrate, consider a call to `get_global_size` from a kernel that is
invoked with a 3-dimensional `nd_range`.
Calling `get_global_size(0)` retrieves the global size from dimension 2 of the
`nd_range`, and calling `get_global_size(2)` retrieves the global size from
dimension 0 of the `nd_range`.


== Examples

Expand All @@ -340,9 +387,10 @@ _{endnote}_]
The following example shows a simple SYCL program that defines an OpenCL C
kernel as a string and then compiles and launches it.

```
[source,c++]
----
#include <sycl/sycl.hpp>
#include <OpenCL/opencl.h>
#include <CL/opencl.h>
namespace syclex = sycl::ext::oneapi::experimental;

int main() {
Expand Down Expand Up @@ -372,6 +420,7 @@ int main() {
sycl::kernel k = kb_exe.ext_oneapi_get_kernel("my_kernel");

constexpr int N = 4;
constexpr int WGSIZE = 1;
cl_int input[N] = {0, 1, 2, 3};
cl_int output[N] = {};

Expand All @@ -385,19 +434,21 @@ int main() {
// Each argument to the kernel is a SYCL accessor.
cgh.set_args(in, out);

// Invoke the kernel over a range.
cgh.parallel_for(sycl::range{N}, k);
// Invoke the kernel over an nd-range.
sycl::nd_range ndr{{N}, {WGSIZE}};
cgh.parallel_for(ndr, k);
});
}
```
----

=== Querying supported features and extensions

This example demonstrates how to query the version of OpenCL C that is
supported, how to query the supported features, and how to query the
supported extensions.

```
[source,c++]
----
#include <iostream>
#include <sycl/sycl.hpp>
namespace syclex = sycl::ext::oneapi::experimental;
Expand Down Expand Up @@ -426,24 +477,4 @@ int main() {
std::cout << "Device supports online compilation with the OpenCL full profile\n";

}
```


== Issues

* Do we need to document some restrictions on the OpenCL C
https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#work-item-functions[
work-item functions] that the kernel can call, which depends on how the
kernel was launched?
For example, can a kernel launched with the simple `range` form of
`parallel_for` call `get_local_size`?
In OpenCL, there is only one way to launch kernels
(`clEnqueueNDRangeKernel`), so it is always legal to call any of the
work-item functions.
If an OpenCL kernel is launched with a NULL `local_work_size` (which is
roughly equivalent to SYCL's `range` form of `parallel_for`), the
`get_local_size` function returns the local work-group size that is chosen by
the implementation.
Level Zero, similarly, has only one way to launch kernels.
Therefore, maybe it is OK to let kernels in this extension call any of the
work-item functions, regardless of how they are launched?
----
Loading