From 6c3323d7b57f82d216bb6f6a682138641a7b16d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pekka=20J=C3=A4=C3=A4skel=C3=A4inen?= Date: Tue, 7 May 2024 19:05:29 +0300 Subject: [PATCH 1/4] cl_ext_buffer_device_address The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. This API adds a minimal increment on top of cl_mem that provides such capabilities. The version 0.1.0 is implemented in PoCL and rusticl for prototyping, but everything's still up for discussion. chipStar is the first client that uses the API. --- .../cl_ext_buffer_device_address.asciidoc | 289 ++++++++++++++++++ extensions/extensions.txt | 2 + xml/cl.xml | 36 ++- 3 files changed, 323 insertions(+), 4 deletions(-) create mode 100644 extensions/cl_ext_buffer_device_address.asciidoc diff --git a/extensions/cl_ext_buffer_device_address.asciidoc b/extensions/cl_ext_buffer_device_address.asciidoc new file mode 100644 index 00000000..72fe5529 --- /dev/null +++ b/extensions/cl_ext_buffer_device_address.asciidoc @@ -0,0 +1,289 @@ += cl_ext_buffer_device_address + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C, +// for syntax highlighting purposes. +:language: c + +// This is what is needed for C++, since docbook uses c++ +// and everything else uses cpp. This doesn't work when +// source blocks are in table cells, though, so don't use +// C++ unless it is required. +//:language: {basebackend@docbook:c++:cpp} + +== Name Strings + +`cl_ext_buffer_device_address` + +== Contact + +Pekka Jääskeläinen, Intel (pekka 'dot' jaaskelainen 'at' intel 'dot' com) + +== Contributors + +// spell-checker: disable +Pekka Jääskeläinen, Intel + +Karol Herbst, Red Hat + +Henry Linjamäki, Intel + +// spell-checker: enable + +== Notice + +Copyright (c) 2024 Intel Corporation. All rights reserved. + +== Status + +Draft. + +== Version + +Built On: {docdate} + +Revision: 0.1.0 + +== Dependencies + +This extension is written against the OpenCL Specification version 3.0.16. + +This extension requires OpenCL 1.0 or later. + +== Overview + +The basic cl_mem buffer API doesn't enable access to the underlying raw +pointers in the device memory, preventing its use in host side +data structures that need pointer references to objects. +This API adds a minimal increment on top of cl_mem that provides such +capabilities. + +Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature +that enables raw device side pointers in the OpenCL standard. Its coarse-grain +variant is relatively simple to implement on various platforms in terms of +coherency requirements, but it requires mapping the buffer's address range +to the host virtual address space although it might not be needed by the +application. This is not an issue in systems which can provide virtual memory +across the platform, but might provide implementation challenges in cases +where the device presents a global memory with its disjoint address space +(that can also be a physical memory address space) or, for example, when +a barebone embedded system lacks virtual memory support altogether. + +Various higher-level APIs present a memory allocation routine which can +allocate device-only memory and provide raw pointers to it without guarentees +of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and +CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared +address space between the host and the device. This extension is meant to +provide a minimal set of features to implement such APIs without requiring +a shared virtual address space between the host and the device. + +=== New API Function + +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] + +=== New API Enums + +Enums for enabling device pointer properties when creating a buffer +{clCreateBuffer}, see <>: + +[source] +---- +#define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31) +#define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30) +---- + +Enums for querying the device pointer from the cl_mem <>: + +[source] +---- +#define CL_MEM_DEVICE_PTR_EXT 0xff01 +---- + +Enums for setting information of indirect device pointer accesses to kernels <>. This is for OpenCL 2.0 and above. When implementing the +extension on an older OpenCL version, indirect device pointer access is not supported. + +[source] +---- +#define CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT 0x11B8 +---- + +== New API Types + +Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTR_EXT`. + +[source] +---- +typedef cl_ulong cl_mem_device_address_EXT; +---- + +Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTRS_EXT`. + +[source] +---- +typedef struct _cl_mem_device_address_pair_EXT +{ + cl_device_id device; + cl_mem_device_address_EXT address; +} cl_mem_device_address_pair_EXT; +---- + +== Modifications to the OpenCL API Specification + +=== Section 5.2.1 - Creating Buffer Objects: + +Add new allocation flags <>: + +[[list-of-supported-memory-flag-values-adds]] +.List of supported memory flags by {clCreateBuffer} +[width="100%",cols="<50%,<50%",options="header"] +|==== +| Memory Flags | Description +| {CL_MEM_DEVICE_ADDRESS_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_ADDRESS_EXT.asciidoc[] + | This flag specifies that the buffer must have a single fixed address + for its lifetime and the address should be unique at least across the devices + of the context, but not necessarily withing the host (virtual) memory. + + The flag might imply that the buffer will be "pinned" permanently to + a device's memory, but might not be necessarily so, as long as the address + range of the buffer remains constant. + + The address is guaranteed to remain the same until the buffer is freed, and + the address can be queried via {clGetMemObjectInfo}. + + The device-specific buffer content updates are still performed by + implicit or explicit buffer migrations performed by the runtime or the + client code. If any of the devices in the context does not support + this type of allocations, an error (CL_INVALID_VALUE) is returned. +| {CL_MEM_DEVICE_PRIVATE_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_EXT.asciidoc[] + | If this flag is combined with CL_MEM_DEVICE_ADDRESS_EXT, each device in + the context can have their own (fixed) device-side address and copy of + the created buffer which are synchronized implicitly by the runtime. + The main difference to a default cl_mem allocation in that case is then + that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT and the + per-device address is guaranteed to be the same for the entire lifetime + of the cl_mem. +|==== + +// refError + +=== Section 5.5.6 - Memory Object Queries + +Add a new information type <>: + +[width="100%",cols="<33%,<17%,<50%",options="header"] +|==== +| Memory Object Info | Return type | Description +| {CL_MEM_DEVICE_PTR_EXT_anchor} + +include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE} + | Returns the device address for a buffer allocated with + CL_MEM_DEVICE_ADDRESS_EXT. If the buffer was not created with the flag + or there are multiple devices in the context and the buffer address is + not the same for all of them, it returns CL_INVALID_MEM_OBJECT. + +| {CL_MEM_DEVICE_PTRS_EXT_anchor} +include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[] + | {cl_mem_device_address_pair_EXT_TYPE} + | Returns the device-address pairs for all devices in the context. + The per-device addresses might differ when the buffer was allocated + with the CL_MEM_DEVICE_PRIVATE_EXT enabled. +|==== + + +=== Section 5.9.2 - Setting Kernel Arguments + +Add a new kernel argument setter for device pointers <>: + +To set a device pointer as the argument value for a specific argument of a +kernel, call the function + +include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] +include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[] + + * _kernel_ is a valid kernel object. + * _arg_index_ is the argument index. + Arguments to the kernel are referred by indices that go from 0 for the + leftmost argument to _n_ - 1, where _n_ is the total number of arguments + declared by a kernel. + * _arg_value_ is the device pointer that should be used as the argument value for + argument specified by _arg_index_. + The device pointer specified is the value used by all API calls that enqueue + _kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument + value is changed by a call to {clSetKernelArgSVMPointer} for _kernel_. + The device pointer can only be used for arguments that are declared to be a + pointer to `global` memory allocated with clCreateBuffer() with the + CL_MEM_DEVICE_ADDRESS_EXT flag. The pointer value specified as the argument value + can be the pointer to the beginning of the buffer or be a pointer offset into + the buffer region. The device pointer value must be naturally aligned according to + the argument's type. + +// refError + +{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the function was executed +successfully. Otherwise, it returns one of the following errors: + + * {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object. + * {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support + the device pointer. + * {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index. + * {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value. + * {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required + by the OpenCL implementation on the device. + * {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources + required by the OpenCL implementation on the host. + +Add a new flag to clSetKernelExecInfo for setting indirect device pointer access info <>: + +[width="100%",cols="<33%,<17%,<50%",options="header"] +|==== +| Kernel Exec Info | Type | Description +| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor} + +include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[] + | {cl_mem_device_address_EXT_TYPE} + | Device pointers must reference locations contained entirely within + buffers that are passed to kernel as arguments, or that are passed + through the execution information. + + Non-argument device pointers accessed by the kernel must be specified + by passing pointers to those buffers via {clSetKernelExecInfo}. +|==== + +// refError + + +== Interactions with Other Extensions + +This extension is targeted to complement the OpenCL SVM extension and/or the +Intel Unified Shared Memory extension by providing an additional lower-end +step in the spectrum of type of pointers/buffers OpenCL can allocate. The +extension can be seen as a simplification of the USM Device allocation type +which drops the need to map the device buffer's address range to the same +position in the host memory or to implement platform-wide VM. + +== Issues + +None. + +== Version History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|==== +| *Version* | *Date* | *Author* | *Changes* +| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback. + This version describes the first API version that was prototyped + in PoCL and RustiCL using temporary placeholder flag/enum values. + The PoCL implementation and initial discussion on the extension + can be found https://github.com/pocl/pocl/pull/1441[in this PR]. +|==== diff --git a/extensions/extensions.txt b/extensions/extensions.txt index d28468b9..52b05450 100644 --- a/extensions/extensions.txt +++ b/extensions/extensions.txt @@ -41,6 +41,8 @@ include::cl_ext_image_from_buffer.asciidoc[] include::cl_ext_image_raw10_raw12.asciidoc[] <<< include::cl_ext_image_requirements_info.asciidoc[] +<<< +include::cl_ext_buffer_device_address.asciidoc[] // Vendor Extensions :leveloffset: 0 diff --git a/xml/cl.xml b/xml/cl.xml index e71aebc9..500dd803 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -255,6 +255,8 @@ server's OpenCL/api-docs repository. typedef cl_bitfield cl_platform_command_buffer_capabilities_khr; typedef cl_bitfield cl_mutable_dispatch_asserts_khr typedef cl_bitfield cl_device_kernel_clock_capabilities_khr; + typedef cl_ulong cl_mem_device_address_EXT; + typedef struct _cl_mem_device_address_pair_EXT* cl_mem_device_address_pair_EXT; Structure types @@ -304,6 +306,10 @@ server's OpenCL/api-docs repository. size_t origin size_t size + + cl_device_id device + cl_mem_device_address_EXT address + cl_version version char name[CL_NAME_VERSION_MAX_NAME_SIZE] @@ -911,7 +917,9 @@ server's OpenCL/api-docs repository. - + + + @@ -1630,7 +1638,9 @@ server's OpenCL/api-docs repository. - + + + @@ -1723,7 +1733,8 @@ server's OpenCL/api-docs repository. - + + @@ -3725,6 +3736,12 @@ server's OpenCL/api-docs repository. cl_uint arg_index const void* arg_value + + cl_int clSetKernelArgDevicePointerEXT + cl_kernel kernel + cl_uint arg_index + const void* arg_value + cl_int clSetKernelExecInfo cl_kernel kernel @@ -7186,7 +7203,18 @@ server's OpenCL/api-docs repository. - + + + + + + + + + + + + From 6ae7260ecfef81c46fb3d1feda8693ec23e052c0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pekka=20J=C3=A4=C3=A4skel=C3=A4inen?= Date: Mon, 26 Aug 2024 13:55:42 +0300 Subject: [PATCH 2/4] Apply suggestions from code review Co-authored-by: Sun Serega --- xml/cl.xml | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/xml/cl.xml b/xml/cl.xml index 500dd803..8b6cd161 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -256,7 +256,6 @@ server's OpenCL/api-docs repository. typedef cl_bitfield cl_mutable_dispatch_asserts_khr typedef cl_bitfield cl_device_kernel_clock_capabilities_khr; typedef cl_ulong cl_mem_device_address_EXT; - typedef struct _cl_mem_device_address_pair_EXT* cl_mem_device_address_pair_EXT; Structure types @@ -7203,14 +7202,22 @@ server's OpenCL/api-docs repository. - + + + + + + + + + From 6ec48c4400ed69e6789f9834596c3e6fd19d99d3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Pekka=20J=C3=A4=C3=A4skel=C3=A4inen?= Date: Mon, 9 Sep 2024 16:54:49 +0300 Subject: [PATCH 3/4] cl_ext_buffer_device_address updates Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device cases "all", not "any", covering a case where not all devices can ensure the same address across the context. In that case CL_INVALID_VALUE can be returned. Defined sub-buffer address computation to be 'base_addr + origin'. Added error conditions for clSetKernelExecInfo when the device doesn't support device pointers. --- .../cl_ext_buffer_device_address.asciidoc | 38 ++++++++++++++++--- 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/extensions/cl_ext_buffer_device_address.asciidoc b/extensions/cl_ext_buffer_device_address.asciidoc index 72fe5529..dabf5d1c 100644 --- a/extensions/cl_ext_buffer_device_address.asciidoc +++ b/extensions/cl_ext_buffer_device_address.asciidoc @@ -46,7 +46,7 @@ Draft. == Version Built On: {docdate} + -Revision: 0.1.0 +Revision: 0.2.0 == Dependencies @@ -147,7 +147,7 @@ Add new allocation flags < Date: Tue, 24 Sep 2024 15:20:43 +0300 Subject: [PATCH 4/4] BDA: Made the allocation flags independent from each other ...and renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the same address across all devices in the context, whereas the latter allows per-device addresses. --- .../cl_ext_buffer_device_address.asciidoc | 55 ++++++++++--------- xml/cl.xml | 4 +- 2 files changed, 32 insertions(+), 27 deletions(-) diff --git a/extensions/cl_ext_buffer_device_address.asciidoc b/extensions/cl_ext_buffer_device_address.asciidoc index dabf5d1c..fdf4eb44 100644 --- a/extensions/cl_ext_buffer_device_address.asciidoc +++ b/extensions/cl_ext_buffer_device_address.asciidoc @@ -46,7 +46,7 @@ Draft. == Version Built On: {docdate} + -Revision: 0.2.0 +Revision: 0.3.0 == Dependencies @@ -59,8 +59,6 @@ This extension requires OpenCL 1.0 or later. The basic cl_mem buffer API doesn't enable access to the underlying raw pointers in the device memory, preventing its use in host side data structures that need pointer references to objects. -This API adds a minimal increment on top of cl_mem that provides such -capabilities. Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature that enables raw device side pointers in the OpenCL standard. Its coarse-grain @@ -69,17 +67,18 @@ coherency requirements, but it requires mapping the buffer's address range to the host virtual address space although it might not be needed by the application. This is not an issue in systems which can provide virtual memory across the platform, but might provide implementation challenges in cases -where the device presents a global memory with its disjoint address space +where the device presents a global memory with a disjoint address space (that can also be a physical memory address space) or, for example, when a barebone embedded system lacks virtual memory support altogether. Various higher-level APIs present a memory allocation routine which can allocate device-only memory and provide raw pointers to it without guarentees -of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and -CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared +of system-wide uniqueness: For example, minimal implementations of OpenMP's +omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared address space between the host and the device. This extension is meant to -provide a minimal set of features to implement such APIs without requiring -a shared virtual address space between the host and the device. +provide a minimal set of features to implement such APIs using the cl_mem +buffers without requiring a shared virtual address space between the host and +the device. === New API Function @@ -92,8 +91,8 @@ Enums for enabling device pointer properties when creating a buffer [source] ---- -#define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31) -#define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30) +#define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31) +#define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30) ---- Enums for querying the device pointer from the cl_mem <>: @@ -142,9 +141,9 @@ Add new allocation flags < - - + +