diff --git a/extensions/cl_ext_buffer_device_address.asciidoc b/extensions/cl_ext_buffer_device_address.asciidoc new file mode 100644 index 00000000..fdf4eb44 --- /dev/null +++ b/extensions/cl_ext_buffer_device_address.asciidoc @@ -0,0 +1,320 @@ += cl_ext_buffer_device_address + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C, +// for syntax highlighting purposes. +:language: c + +// This is what is needed for C++, since docbook uses c++ +// and everything else uses cpp. This doesn't work when +// source blocks are in table cells, though, so don't use +// C++ unless it is required. +//:language: {basebackend@docbook:c++:cpp} + +== Name Strings + +`cl_ext_buffer_device_address` + +== Contact + +Pekka Jääskeläinen, Intel (pekka 'dot' jaaskelainen 'at' intel 'dot' com) + +== Contributors + +// spell-checker: disable +Pekka Jääskeläinen, Intel + +Karol Herbst, Red Hat + +Henry Linjamäki, Intel + +// spell-checker: enable + +== Notice + +Copyright (c) 2024 Intel Corporation. All rights reserved. + +== Status + +Draft. + +== Version + +Built On: {docdate} + +Revision: 0.3.0 + +== Dependencies + +This extension is written against the OpenCL Specification version 3.0.16. + +This extension requires OpenCL 1.0 or later. + +== Overview + +The basic cl_mem buffer API doesn't enable access to the underlying raw +pointers in the device memory, preventing its use in host side +data structures that need pointer references to objects. + +Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature +that enables raw device side pointers in the OpenCL standard. Its coarse-grain +variant is relatively simple to implement on various platforms in terms of +coherency requirements, but it requires mapping the buffer's address range +to the host virtual address space although it might not be needed by the +application. This is not an issue in systems which can provide virtual memory +across the platform, but might provide implementation challenges in cases +where the device presents a global memory with a disjoint address space +(that can also be a physical memory address space) or, for example, when +a barebone embedded system lacks virtual memory support altogether. + +Various higher-level APIs present a memory allocation routine which can +allocate device-only memory and provide raw pointers to it without guarentees +of system-wide uniqueness: For example, minimal implementations of OpenMP's +omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared +address space between the host and the device. This extension is meant to +provide a minimal set of features to implement such APIs using the cl_mem +buffers without requiring a shared virtual address space between the host and +the device. + +=== New API Function + +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] + +=== New API Enums + +Enums for enabling device pointer properties when creating a buffer +{clCreateBuffer}, see <>: + +[source] +---- +#define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31) +#define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30) +---- + +Enums for querying the device pointer from the cl_mem <>: + +[source] +---- +#define CL_MEM_DEVICE_PTR_EXT 0xff01 +---- + +Enums for setting information of indirect device pointer accesses to kernels <>. This is for OpenCL 2.0 and above. When implementing the +extension on an older OpenCL version, indirect device pointer access is not supported. + +[source] +---- +#define CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT 0x11B8 +---- + +== New API Types + +Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTR_EXT`. + +[source] +---- +typedef cl_ulong cl_mem_device_address_EXT; +---- + +Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTRS_EXT`. + +[source] +---- +typedef struct _cl_mem_device_address_pair_EXT +{ + cl_device_id device; + cl_mem_device_address_EXT address; +} cl_mem_device_address_pair_EXT; +---- + +== Modifications to the OpenCL API Specification + +=== Section 5.2.1 - Creating Buffer Objects: + +Add new allocation flags <>: + +[[list-of-supported-memory-flag-values-adds]] +.List of supported memory flags by {clCreateBuffer} +[width="100%",cols="<50%,<50%",options="header"] +|==== +| Memory Flags | Description +| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[] + | This flag specifies that the buffer must have a single fixed address + for its lifetime and the address should be unique at least across the devices + of the context, but not necessarily within the host (virtual) memory. + + The flag might imply that the buffer will be "pinned" permanently to + a device's memory, but might not be necessarily so, as long as the address + range of the buffer remains constant. + + The address is guaranteed to remain the same until the buffer is freed, and + the address can be queried via {clGetMemObjectInfo}. + + The device-specific buffer content updates are still performed by + implicit or explicit buffer migrations performed by the runtime or the + client code. If all of the devices in the context do not support + this type of allocations, an error (CL_INVALID_VALUE) is returned. + + The device addresses of sub-buffers derived from CL_MEM_DEVICE_SHARED_ADDRESS_EXT + allocated buffers can be computed by adding the sub-buffer origin to the + start address. + +| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[] + | This flag specifies that the buffer must have a single fixed address + for its lifetime. Each device in the context can have their own (fixed) + device-side address and a copy of the created buffer which are synchronized + implicitly by the runtime. The main difference to a default cl_mem allocation + in that case is that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT + and the per-device address is guaranteed to be the same for the entire lifetime + of the cl_mem. + + The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT + allocated buffers can be computed by adding the sub-buffer origin to the + device-specific start address. + +|==== + +// refError + +=== Section 5.5.6 - Memory Object Queries + +Add a new information type <>: + +[width="100%",cols="<33%,<17%,<50%",options="header"] +|==== +| Memory Object Info | Return type | Description +| {CL_MEM_DEVICE_PTR_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE} + | Returns the device address for a buffer allocated with + CL_MEM_DEVICE_SHARED_ADDRESS_EXT. If the buffer was not created with the flag + or there are multiple devices in the context and the buffer address is + not the same for all of them, it returns CL_INVALID_MEM_OBJECT. + +| {CL_MEM_DEVICE_PTRS_EXT_anchor} +include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[] + | {cl_mem_device_address_pair_EXT_TYPE} + | Returns the device-address pairs for all devices in the context. + The per-device addresses might differ when the buffer was allocated + with the CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT enabled. +|==== + + +=== Section 5.9.2 - Setting Kernel Arguments + +Add a new kernel argument setter for device pointers <>: + +To set a device pointer as the argument value for a specific argument of a +kernel, call the function + +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] +include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[] + + * _kernel_ is a valid kernel object. + * _arg_index_ is the argument index. + Arguments to the kernel are referred by indices that go from 0 for the + leftmost argument to _n_ - 1, where _n_ is the total number of arguments + declared by a kernel. + * _arg_value_ is the device pointer that should be used as the argument value for + argument specified by _arg_index_. + The device pointer specified is the value used by all API calls that enqueue + _kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument + value is changed by a call to {clSetKernelArgDevicePointer} for _kernel_. + The device pointer can only be used for arguments that are declared to be a + pointer to `global` memory allocated with clCreateBuffer() with the + CL_MEM_DEVICE_SHARED_ADDRESS_EXT flag. The pointer value specified as the argument value + can be the pointer to the beginning of the buffer or be a pointer offset into + the buffer region. The device pointer value must be naturally aligned according to + the argument's type. + +// refError + +{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the function was executed +successfully. Otherwise, it returns one of the following errors: + + * {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object. + * {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support + the device pointer. + * {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index. + * {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value. + * {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required + by the OpenCL implementation on the device. + * {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources + required by the OpenCL implementation on the host. + +Add a new flag to clSetKernelExecInfo for setting indirect device pointer access info <>: + +[width="100%",cols="<33%,<17%,<50%",options="header"] +|==== +| Kernel Exec Info | Type | Description +| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor} + +include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE} + | Device pointers must reference locations contained entirely within + buffers that are passed to kernel as arguments, or that are passed + through the execution information. + + Non-argument device pointers accessed by the kernel must be specified + by passing pointers to those buffers via {clSetKernelExecInfo}. +|==== + +// refError + +Change the descriptions for when returning CL_INVALID_OPERATION from {clSetKernelExecInfo} +as follows: + + * {CL_INVALID_OPERATION} if passing {CL_KERNEL_EXEC_INFO_SVM_PTRS} or + {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} with _param_value_ set to CL_TRUE + and no device in the context associated with _kernel_ support SVM. + * {CL_INVALID_OPERATION} if passing {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} and no + device in the context associated with _kernel_ support device pointers. + +== Interactions with Other Extensions + +This extension is targeted to complement the OpenCL SVM extension and/or the +Intel Unified Shared Memory extension by providing an additional lower-end +step in the spectrum of type of pointers/buffers OpenCL can allocate. The +extension can be seen as a simplification of the USM Device allocation type +which drops the need to map the device buffer's address range to the same +position in the host memory or to implement platform-wide VM. + +== Issues + +None. + +== Version History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|==== +| *Version* | *Date* | *Author* | *Changes* +| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst | + Made the allocation flags independent from each other and + renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and + CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the + same address across all devices in the context, whereas the latter + allows per-device addresses. +| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst | + Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device + cases "all", not "any", covering a case where not all devices + can ensure the same address across the context. In that case + CL_INVALID_VALUE can be returned. Defined sub-buffer address + computation to be 'base_addr + origin'. Added error conditions + for clSetKernelExecInfo when the device doesn't support + device pointers. +| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback. + This version describes the first API version that was prototyped + in PoCL and RustiCL using temporary placeholder flag/enum values. + The PoCL implementation and initial discussion on the extension + can be found https://github.com/pocl/pocl/pull/1441[in this PR]. +|==== diff --git a/extensions/extensions.txt b/extensions/extensions.txt index d28468b9..52b05450 100644 --- a/extensions/extensions.txt +++ b/extensions/extensions.txt @@ -41,6 +41,8 @@ include::cl_ext_image_from_buffer.asciidoc[] include::cl_ext_image_raw10_raw12.asciidoc[] <<< include::cl_ext_image_requirements_info.asciidoc[] +<<< +include::cl_ext_buffer_device_address.asciidoc[] // Vendor Extensions :leveloffset: 0 diff --git a/xml/cl.xml b/xml/cl.xml index e71aebc9..73e4d80d 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -255,6 +255,7 @@ server's OpenCL/api-docs repository. typedef cl_bitfield cl_platform_command_buffer_capabilities_khr; typedef cl_bitfield cl_mutable_dispatch_asserts_khr typedef cl_bitfield cl_device_kernel_clock_capabilities_khr; + typedef cl_ulong cl_mem_device_address_EXT; Structure types @@ -304,6 +305,10 @@ server's OpenCL/api-docs repository. size_t origin size_t size + + cl_device_id device + cl_mem_device_address_EXT address + cl_version version char name[CL_NAME_VERSION_MAX_NAME_SIZE] @@ -911,7 +916,9 @@ server's OpenCL/api-docs repository. - + + + @@ -1630,7 +1637,9 @@ server's OpenCL/api-docs repository. - + + + @@ -1723,7 +1732,8 @@ server's OpenCL/api-docs repository. - + + @@ -3725,6 +3735,12 @@ server's OpenCL/api-docs repository. cl_uint arg_index const void* arg_value + + cl_int clSetKernelArgDevicePointerEXT + cl_kernel kernel + cl_uint arg_index + const void* arg_value + cl_int clSetKernelExecInfo cl_kernel kernel @@ -7186,6 +7202,26 @@ server's OpenCL/api-docs repository. + + + + + + + + + + + + + + + + + + + +