Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cl_ext_buffer_device_address #1159

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
320 changes: 320 additions & 0 deletions extensions/cl_ext_buffer_device_address.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,320 @@
= cl_ext_buffer_device_address

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C,
// for syntax highlighting purposes.
:language: c

// This is what is needed for C++, since docbook uses c++
// and everything else uses cpp. This doesn't work when
// source blocks are in table cells, though, so don't use
// C++ unless it is required.
//:language: {basebackend@docbook:c++:cpp}

== Name Strings

`cl_ext_buffer_device_address`

== Contact

Pekka Jääskeläinen, Intel (pekka 'dot' jaaskelainen 'at' intel 'dot' com)

== Contributors

// spell-checker: disable
Pekka Jääskeläinen, Intel +
Karol Herbst, Red Hat +
Henry Linjamäki, Intel +
// spell-checker: enable

== Notice

Copyright (c) 2024 Intel Corporation. All rights reserved.

== Status

Draft.

== Version

Built On: {docdate} +
Revision: 0.3.0

== Dependencies

This extension is written against the OpenCL Specification version 3.0.16.

This extension requires OpenCL 1.0 or later.

== Overview

The basic cl_mem buffer API doesn't enable access to the underlying raw
pointers in the device memory, preventing its use in host side
data structures that need pointer references to objects.

Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
that enables raw device side pointers in the OpenCL standard. Its coarse-grain
variant is relatively simple to implement on various platforms in terms of
coherency requirements, but it requires mapping the buffer's address range
to the host virtual address space although it might not be needed by the
application. This is not an issue in systems which can provide virtual memory
across the platform, but might provide implementation challenges in cases
where the device presents a global memory with a disjoint address space
(that can also be a physical memory address space) or, for example, when
a barebone embedded system lacks virtual memory support altogether.

Various higher-level APIs present a memory allocation routine which can
allocate device-only memory and provide raw pointers to it without guarentees
of system-wide uniqueness: For example, minimal implementations of OpenMP's
omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
address space between the host and the device. This extension is meant to
provide a minimal set of features to implement such APIs using the cl_mem
buffers without requiring a shared virtual address space between the host and
the device.

=== New API Function

include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]

=== New API Enums

Enums for enabling device pointer properties when creating a buffer
{clCreateBuffer}, see <<clCreateBuffer, the list of supported memory flag values table>>:

[source]
----
#define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31)
#define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30)
----

Enums for querying the device pointer from the cl_mem <<clGetMemObjectInfo, the list of supported param_names table>>:

[source]
----
#define CL_MEM_DEVICE_PTR_EXT 0xff01
----

Enums for setting information of indirect device pointer accesses to kernels <<clSetKernelExecInfo, the list of supported param_names table>>. This is for OpenCL 2.0 and above. When implementing the
extension on an older OpenCL version, indirect device pointer access is not supported.

[source]
----
#define CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT 0x11B8
----

== New API Types

Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTR_EXT`.

[source]
----
typedef cl_ulong cl_mem_device_address_EXT;
----

Returned as the query result value *clGetMemObjectInfo* with `CL_DEVICE_PTRS_EXT`.

[source]
----
typedef struct _cl_mem_device_address_pair_EXT
{
cl_device_id device;
cl_mem_device_address_EXT address;
} cl_mem_device_address_pair_EXT;
----

== Modifications to the OpenCL API Specification

=== Section 5.2.1 - Creating Buffer Objects:

Add new allocation flags <<clCreateBuffer, List of supported memory flag values table>>:

[[list-of-supported-memory-flag-values-adds]]
.List of supported memory flags by {clCreateBuffer}
[width="100%",cols="<50%,<50%",options="header"]
|====
| Memory Flags | Description
| {CL_MEM_DEVICE_SHARED_ADDRESS_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_SHARED_ADDRESS_EXT.asciidoc[]
| This flag specifies that the buffer must have a single fixed address
for its lifetime and the address should be unique at least across the devices
of the context, but not necessarily within the host (virtual) memory.

The flag might imply that the buffer will be "pinned" permanently to
a device's memory, but might not be necessarily so, as long as the address
range of the buffer remains constant.

The address is guaranteed to remain the same until the buffer is freed, and
the address can be queried via {clGetMemObjectInfo}.

The device-specific buffer content updates are still performed by
implicit or explicit buffer migrations performed by the runtime or the
client code. If all of the devices in the context do not support
this type of allocations, an error (CL_INVALID_VALUE) is returned.

The device addresses of sub-buffers derived from CL_MEM_DEVICE_SHARED_ADDRESS_EXT
allocated buffers can be computed by adding the sub-buffer origin to the
start address.

| {CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT.asciidoc[]
| This flag specifies that the buffer must have a single fixed address
for its lifetime. Each device in the context can have their own (fixed)
device-side address and a copy of the created buffer which are synchronized
implicitly by the runtime. The main difference to a default cl_mem allocation
in that case is that the addresses are queriable with CL_MEM_DEVICE_PTRS_EXT
and the per-device address is guaranteed to be the same for the entire lifetime
of the cl_mem.

The device addresses of sub-buffers derived from CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT
allocated buffers can be computed by adding the sub-buffer origin to the
device-specific start address.

|====

// refError

=== Section 5.5.6 - Memory Object Queries

Add a new information type <<clGetMemObjectInfo, List of supported param_names table>>:

[width="100%",cols="<33%,<17%,<50%",options="header"]
|====
| Memory Object Info | Return type | Description
| {CL_MEM_DEVICE_PTR_EXT_anchor}

include::{generated}/api/version-notes/CL_MEM_DEVICE_PTR_EXT.asciidoc[]
| {cl_mem_device_address_EXT_TYPE}
| Returns the device address for a buffer allocated with
CL_MEM_DEVICE_SHARED_ADDRESS_EXT. If the buffer was not created with the flag
or there are multiple devices in the context and the buffer address is
not the same for all of them, it returns CL_INVALID_MEM_OBJECT.

| {CL_MEM_DEVICE_PTRS_EXT_anchor}
include::{generated}/api/version-notes/CL_MEM_DEVICE_PTRS_EXT.asciidoc[]
| {cl_mem_device_address_pair_EXT_TYPE}
| Returns the device-address pairs for all devices in the context.
The per-device addresses might differ when the buffer was allocated
with the CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT enabled.
|====


=== Section 5.9.2 - Setting Kernel Arguments

Add a new kernel argument setter for device pointers <<setting-kernel-arguments, Section 5.9.2>>:

To set a device pointer as the argument value for a specific argument of a
kernel, call the function

include::{generated}/api/protos/clSetKernelArgDevicePointerEXT.txt[]
include::{generated}/api/version-notes/clSetKernelArgDevicePointerEXT.asciidoc[]

* _kernel_ is a valid kernel object.
* _arg_index_ is the argument index.
Arguments to the kernel are referred by indices that go from 0 for the
leftmost argument to _n_ - 1, where _n_ is the total number of arguments
declared by a kernel.
* _arg_value_ is the device pointer that should be used as the argument value for
argument specified by _arg_index_.
The device pointer specified is the value used by all API calls that enqueue
_kernel_ ({clEnqueueNDRangeKernel} and {clEnqueueTask}) until the argument
value is changed by a call to {clSetKernelArgDevicePointer} for _kernel_.
The device pointer can only be used for arguments that are declared to be a
pointer to `global` memory allocated with clCreateBuffer() with the
CL_MEM_DEVICE_SHARED_ADDRESS_EXT flag. The pointer value specified as the argument value
can be the pointer to the beginning of the buffer or be a pointer offset into
the buffer region. The device pointer value must be naturally aligned according to
the argument's type.

// refError

{clSetKernelArgDevicePointerEXT} returns {CL_SUCCESS} if the function was executed
successfully. Otherwise, it returns one of the following errors:

* {CL_INVALID_KERNEL} if _kernel_ is not a valid kernel object.
* {CL_INVALID_OPERATION} if no devices in the context associated with _kernel_ support
the device pointer.
* {CL_INVALID_ARG_INDEX} if _arg_index_ is not a valid argument index.
* {CL_INVALID_ARG_VALUE} if _arg_value_ specified is not a valid value.
* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required
by the OpenCL implementation on the device.
* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources
required by the OpenCL implementation on the host.

Add a new flag to clSetKernelExecInfo for setting indirect device pointer access info <<clSetKernelExecInfo, List of supported param_name stable>>:

This comment was marked as resolved.


[width="100%",cols="<33%,<17%,<50%",options="header"]
|====
| Kernel Exec Info | Type | Description
| {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT_anchor}

include::{generated}/api/version-notes/CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT.asciidoc[]
| {cl_mem_device_address_EXT_TYPE}
| Device pointers must reference locations contained entirely within
buffers that are passed to kernel as arguments, or that are passed
through the execution information.

Non-argument device pointers accessed by the kernel must be specified
by passing pointers to those buffers via {clSetKernelExecInfo}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could discuss whether we want to require applications to always pass in the full list of buffers, because that would allow drivers to not having to keep track of bda allocations. But that also depends on if "this memory allocation is used in a dispatch" being relevant for the most/some OpenCL implementations in the first place.

But one can also argue that it should be convenient for users to use this extension, so they don't have to keep track themselves, but that makes the hot path (launching many kernels) more expensive as at launch time the latest you need to map back from pointer to buffer.

But I think it's also fine to keep it like this, because it's closer to how it's done for SVM.

|====

// refError

Change the descriptions for when returning CL_INVALID_OPERATION from {clSetKernelExecInfo}
as follows:

* {CL_INVALID_OPERATION} if passing {CL_KERNEL_EXEC_INFO_SVM_PTRS} or
{CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM} with _param_value_ set to CL_TRUE
and no device in the context associated with _kernel_ support SVM.
* {CL_INVALID_OPERATION} if passing {CL_KERNEL_EXEC_INFO_DEVICE_PTRS_EXT} and no
device in the context associated with _kernel_ support device pointers.

== Interactions with Other Extensions

This extension is targeted to complement the OpenCL SVM extension and/or the
Intel Unified Shared Memory extension by providing an additional lower-end
step in the spectrum of type of pointers/buffers OpenCL can allocate. The
extension can be seen as a simplification of the USM Device allocation type
which drops the need to map the device buffer's address range to the same
position in the host memory or to implement platform-wide VM.

== Issues

None.

== Version History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|====
| *Version* | *Date* | *Author* | *Changes*
| 0.3.0 | 2024-09-24 | Pekka Jääskeläinen, Karol Herbst |
Made the allocation flags independent from each other and
renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
same address across all devices in the context, whereas the latter
allows per-device addresses.
| 0.2.0 | 2024-09-09 | Pekka Jääskeläinen, Karol Herbst |
Changed the CL_MEM_DEVICE_ADDRESS_EXT wording for multi-device
cases "all", not "any", covering a case where not all devices
can ensure the same address across the context. In that case
CL_INVALID_VALUE can be returned. Defined sub-buffer address
computation to be 'base_addr + origin'. Added error conditions
for clSetKernelExecInfo when the device doesn't support
device pointers.
| 0.1.0 | 2024-05-07 | Pekka Jääskeläinen | First draft text for feedback.
This version describes the first API version that was prototyped
in PoCL and RustiCL using temporary placeholder flag/enum values.
The PoCL implementation and initial discussion on the extension
can be found https://github.com/pocl/pocl/pull/1441[in this PR].
|====
2 changes: 2 additions & 0 deletions extensions/extensions.txt
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,8 @@ include::cl_ext_image_from_buffer.asciidoc[]
include::cl_ext_image_raw10_raw12.asciidoc[]
<<<
include::cl_ext_image_requirements_info.asciidoc[]
<<<
include::cl_ext_buffer_device_address.asciidoc[]

// Vendor Extensions
:leveloffset: 0
Expand Down
Loading
Loading