-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cl_ext_buffer_device_address #1159
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,320 @@ | ||
= cl_ext_buffer_device_address | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
|
||
:blank: pass:[ +] | ||
|
||
// Set the default source code type in this document to C, | ||
// for syntax highlighting purposes. | ||
:language: c | ||
|
||
// This is what is needed for C++, since docbook uses c++ | ||
// and everything else uses cpp. This doesn't work when | ||
// source blocks are in table cells, though, so don't use | ||
// C++ unless it is required. | ||
//:language: {basebackend@docbook:c++:cpp} | ||
|
||
== Name Strings | ||
|
||
`cl_ext_buffer_device_address` | ||
|
||
== Contact | ||
|
||
Pekka Jääskeläinen, Intel (pekka 'dot' jaaskelainen 'at' intel 'dot' com) | ||
|
||
== Contributors | ||
|
||
// spell-checker: disable | ||
Pekka Jääskeläinen, Intel + | ||
Karol Herbst, Red Hat + | ||
Henry Linjamäki, Intel + | ||
// spell-checker: enable | ||
|
||
== Notice | ||
|
||
Copyright (c) 2024 Intel Corporation. All rights reserved. | ||
|
||
== Status | ||
|
||
Draft. | ||
|
||
== Version | ||
|
||
Built On: {docdate} + | ||
Revision: 0.3.0 | ||
|
||
== Dependencies | ||
|
||
This extension is written against the OpenCL Specification version 3.0.16. | ||
|
||
This extension requires OpenCL 1.0 or later. | ||
|
||
== Overview | ||
|
||
The basic cl_mem buffer API doesn't enable access to the underlying raw | ||
pointers in the device memory, preventing its use in host side | ||
data structures that need pointer references to objects. | ||
|
||
Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature | ||
that enables raw device side pointers in the OpenCL standard. Its coarse-grain | ||
variant is relatively simple to implement on various platforms in terms of | ||
coherency requirements, but it requires mapping the buffer's address range | ||
to the host virtual address space although it might not be needed by the | ||
application. This is not an issue in systems which can provide virtual memory | ||
across the platform, but might provide implementation challenges in cases | ||
where the device presents a global memory with a disjoint address space | ||
(that can also be a physical memory address space) or, for example, when | ||
a barebone embedded system lacks virtual memory support altogether. | ||
|
||
Various higher-level APIs present a memory allocation routine which can | ||
allocate device-only memory and provide raw pointers to it without guarentees | ||
of system-wide uniqueness: For example, minimal implementations of OpenMP's | ||
omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared | ||
address space between the host and the device. This extension is meant to | ||
provide a minimal set of features to implement such APIs using the cl_mem | ||
buffers without requiring a shared virtual address space between the host and | ||
the device. | ||
|
||
=== New API Function | ||
|
||
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] | ||
|
||
=== New API Enums | ||
|
||
Enums for enabling device pointer properties when creating a buffer | ||
{clCreateBuffer}, see <<clCreateBuffer, the list of supported memory flag values table>>: | ||
|
||
[source] | ||
---- | ||
#define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31) | ||
#define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30) | ||
---- | ||
|
||
Enums for querying the device pointer from the cl_mem <<clGetMemObjectInfo, the list of supported param_names table>>: | ||
|
||
[source] | ||
---- | ||
#define CL_MEM_DEVICE_PTR_EXT 0xff01 | ||
---- | ||
|
||
Enums for setting information of indirect device pointer accesses to kernels <<clSetKernelExecInfo, the list of supported param_names table>>. This is for OpenCL 2.0 and above. When implementing the | ||
extension on an older OpenCL version, indirect device pointer access is not supported. | ||
|
||
[source] | ||
---- | ||
#define CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT 0x11B8 | ||
---- | ||
|
||
== New API Types | ||
|
||
Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTR_EXT`. | ||
|
||
[source] | ||
---- | ||
typedef cl_ulong cl_mem_device_address_EXT; | ||
---- | ||
|
||
Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTRS_EXT`. | ||
|
||
[source] | ||
---- | ||
typedef struct _cl_mem_device_address_pair_EXT | ||
{ | ||
cl_device_id device; | ||
cl_mem_device_address_EXT address; | ||
} cl_mem_device_address_pair_EXT; | ||
---- | ||
|
||
== Modifications to the OpenCL API Specification | ||
|
||
=== Section 5.2.1 - Creating Buffer Objects: | ||
|
||
Add new allocation flags <<clCreateBuffer, List of supported memory flag values table>>: | ||
|
||
[[list-of-supported-memory-flag-values-adds]] | ||
.List of supported memory flags by {clCreateBuffer} | ||
[width="100%",cols="<50%,<50%",options="header"] | ||
|==== | ||
| Memory Flags | Description | ||
| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor} | ||
|
||
include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[] | ||
| This flag specifies that the buffer must have a single fixed address | ||
for its lifetime and the address should be unique at least across the devices | ||
of the context, but not necessarily within the host (virtual) memory. | ||
|
||
The flag might imply that the buffer will be "pinned" permanently to | ||
a device's memory, but might not be necessarily so, as long as the address | ||
range of the buffer remains constant. | ||
|
||
The address is guaranteed to remain the same until the buffer is freed, and | ||
the address can be queried via {clGetMemObjectInfo}. | ||
|
||
The device-specific buffer content updates are still performed by | ||
implicit or explicit buffer migrations performed by the runtime or the | ||
client code. If all of the devices in the context do not support | ||
this type of allocations, an error (CL_INVALID_VALUE) is returned. | ||
|
||
The device addresses of sub-buffers derived from CL_MEM_DEVICE_SHARED_ADDRESS_EXT | ||
allocated buffers can be computed by adding the sub-buffer origin to the | ||
start address. | ||
|
||
| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor} | ||
|
||
include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[] | ||
| This flag specifies that the buffer must have a single fixed address | ||
for its lifetime. Each device in the context can have their own (fixed) | ||
device-side address and a copy of the created buffer which are synchronized | ||
implicitly by the runtime. The main difference to a default cl_mem allocation | ||
in that case is that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT | ||
and the per-device address is guaranteed to be the same for the entire lifetime | ||
of the cl_mem. | ||
|
||
The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT | ||
allocated buffers can be computed by adding the sub-buffer origin to the | ||
device-specific start address. | ||
|
||
|==== | ||
|
||
// refError | ||
|
||
=== Section 5.5.6 - Memory Object Queries | ||
|
||
Add a new information type <<clGetMemObjectInfo, List of supported param_names table>>: | ||
|
||
[width="100%",cols="<33%,<17%,<50%",options="header"] | ||
|==== | ||
| Memory Object Info | Return type | Description | ||
| {CL_MEM_DEVICE_PTR_EXT_anchor} | ||
|
||
include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[] | ||
| {cl_mem_device_address_EXT_TYPE} | ||
| Returns the device address for a buffer allocated with | ||
CL_MEM_DEVICE_SHARED_ADDRESS_EXT. If the buffer was not created with the flag | ||
or there are multiple devices in the context and the buffer address is | ||
not the same for all of them, it returns CL_INVALID_MEM_OBJECT. | ||
|
||
| {CL_MEM_DEVICE_PTRS_EXT_anchor} | ||
include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[] | ||
| {cl_mem_device_address_pair_EXT_TYPE} | ||
| Returns the device-address pairs for all devices in the context. | ||
The per-device addresses might differ when the buffer was allocated | ||
with the CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT enabled. | ||
|==== | ||
|
||
|
||
=== Section 5.9.2 - Setting Kernel Arguments | ||
|
||
Add a new kernel argument setter for device pointers <<setting-kernel-arguments, Section 5.9.2>>: | ||
|
||
To set a device pointer as the argument value for a specific argument of a | ||
kernel, call the function | ||
|
||
include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[] | ||
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[] | ||
|
||
* _kernel_ is a valid kernel object. | ||
* _arg_index_ is the argument index. | ||
Arguments to the kernel are referred by indices that go from 0 for the | ||
leftmost argument to _n_ - 1, where _n_ is the total number of arguments | ||
declared by a kernel. | ||
* _arg_value_ is the device pointer that should be used as the argument value for | ||
argument specified by _arg_index_. | ||
The device pointer specified is the value used by all API calls that enqueue | ||
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument | ||
value is changed by a call to {clSetKernelArgDevicePointer} for _kernel_. | ||
The device pointer can only be used for arguments that are declared to be a | ||
pointer to `global` memory allocated with clCreateBuffer() with the | ||
CL_MEM_DEVICE_SHARED_ADDRESS_EXT flag. The pointer value specified as the argument value | ||
can be the pointer to the beginning of the buffer or be a pointer offset into | ||
the buffer region. The device pointer value must be naturally aligned according to | ||
the argument's type. | ||
|
||
// refError | ||
|
||
{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the function was executed | ||
successfully. Otherwise, it returns one of the following errors: | ||
|
||
* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object. | ||
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support | ||
the device pointer. | ||
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index. | ||
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value. | ||
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required | ||
by the OpenCL implementation on the device. | ||
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources | ||
required by the OpenCL implementation on the host. | ||
|
||
Add a new flag to clSetKernelExecInfo for setting indirect device pointer access info <<clSetKernelExecInfo, List of supported param_name stable>>: | ||
|
||
[width="100%",cols="<33%,<17%,<50%",options="header"] | ||
|==== | ||
| Kernel Exec Info | Type | Description | ||
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor} | ||
|
||
include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[] | ||
| {cl_mem_device_address_EXT_TYPE} | ||
| Device pointers must reference locations contained entirely within | ||
buffers that are passed to kernel as arguments, or that are passed | ||
through the execution information. | ||
|
||
Non-argument device pointers accessed by the kernel must be specified | ||
by passing pointers to those buffers via {clSetKernelExecInfo}. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we could discuss whether we want to require applications to always pass in the full list of buffers, because that would allow drivers to not having to keep track of But one can also argue that it should be convenient for users to use this extension, so they don't have to keep track themselves, but that makes the hot path (launching many kernels) more expensive as at launch time the latest you need to map back from pointer to buffer. But I think it's also fine to keep it like this, because it's closer to how it's done for SVM. |
||
|==== | ||
|
||
// refError | ||
|
||
Change the descriptions for when returning CL_INVALID_OPERATION from {clSetKernelExecInfo} | ||
as follows: | ||
|
||
* {CL_INVALID_OPERATION} if passing {CL_KERNEL_EXEC_INFO_SVM_PTRS} or | ||
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} with _param_value_ set to CL_TRUE | ||
and no device in the context associated with _kernel_ support SVM. | ||
* {CL_INVALID_OPERATION} if passing {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} and no | ||
device in the context associated with _kernel_ support device pointers. | ||
|
||
== Interactions with Other Extensions | ||
|
||
This extension is targeted to complement the OpenCL SVM extension and/or the | ||
Intel Unified Shared Memory extension by providing an additional lower-end | ||
step in the spectrum of type of pointers/buffers OpenCL can allocate. The | ||
extension can be seen as a simplification of the USM Device allocation type | ||
which drops the need to map the device buffer's address range to the same | ||
position in the host memory or to implement platform-wide VM. | ||
|
||
== Issues | ||
|
||
None. | ||
|
||
== Version History | ||
|
||
[cols="5,15,15,70"] | ||
[grid="rows"] | ||
[options="header"] | ||
|==== | ||
| *Version* | *Date* | *Author* | *Changes* | ||
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst | | ||
Made the allocation flags independent from each other and | ||
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and | ||
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the | ||
same address across all devices in the context, whereas the latter | ||
allows per-device addresses. | ||
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst | | ||
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device | ||
cases "all", not "any", covering a case where not all devices | ||
can ensure the same address across the context. In that case | ||
CL_INVALID_VALUE can be returned. Defined sub-buffer address | ||
computation to be 'base_addr + origin'. Added error conditions | ||
for clSetKernelExecInfo when the device doesn't support | ||
device pointers. | ||
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback. | ||
This version describes the first API version that was prototyped | ||
in PoCL and RustiCL using temporary placeholder flag/enum values. | ||
The PoCL implementation and initial discussion on the extension | ||
can be found https://github.com/pocl/pocl/pull/1441[in this PR]. | ||
|==== |
This comment was marked as resolved.
Sorry, something went wrong.