From b8df46b85500bcd3bc2453a1093cc03db25ca0d8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pekka=20J=C3=A4=C3=A4skel=C3=A4inen?=
Date: Tue, 24 Sep 2024 15:20:43 +0300
Subject: [PATCH] BDA: Made the allocation flags independent from each other

...and renamed them to CL_MEM_DEVICE_SHARED_ADDRESS_EXT and
CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT. The first one guarantees the
same address across all devices in the context, whereas the latter
allows per-device addresses.
---
 .../cl_ext_buffer_device_address.asciidoc | 55 ++++++++++---------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/extensions/cl_ext_buffer_device_address.asciidoc b/extensions/cl_ext_buffer_device_address.asciidoc
index dabf5d1c..fdf4eb44 100644
--- a/extensions/cl_ext_buffer_device_address.asciidoc
+++ b/extensions/cl_ext_buffer_device_address.asciidoc
@@ -46,7 +46,7 @@ Draft.
 == Version
 
 Built On: {docdate} +
-Revision: 0.2.0
+Revision: 0.3.0
 
 == Dependencies
 
@@ -59,8 +59,6 @@ This extension requires OpenCL 1.0 or later.
 The basic cl_mem buffer API doesn't enable access to the underlying raw
 pointers in the device memory, preventing its use in host side
 data structures that need pointer references to objects.
-This API adds a minimal increment on top of cl_mem that provides such
-capabilities.
 
 Shared Virtual Memory (SVM) introduced in OpenCL 2.0 is the first feature
 that enables raw device side pointers in the OpenCL standard. Its coarse-grain
@@ -69,17 +67,18 @@ coherency requirements, but it requires mapping the buffer's address range
 to the host virtual address space although it might not be needed by the
 application. This is not an issue in systems which can provide virtual memory
 across the platform, but might provide implementation challenges in cases
-where the device presents a global memory with its disjoint address space
+where the device presents a global memory with a disjoint address space
 (that can also be a physical memory address space) or, for example, when a
 barebone embedded system lacks virtual memory support altogether.
 
 Various higher-level APIs present a memory allocation routine which can
 allocate device-only memory and provide raw pointers to it without guarentees
-of system-wide uniqueness: Minimal implementations of OpenMP's omp_target_alloc() and
-CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
+of system-wide uniqueness: For example, minimal implementations of OpenMP's
+omp_target_alloc() and CUDA/HIP's cudaMalloc()/hipMalloc() do not require a shared
 address space between the host and the device. This extension is meant to
-provide a minimal set of features to implement such APIs without requiring
-a shared virtual address space between the host and the device.
+provide a minimal set of features to implement such APIs using the cl_mem
+buffers without requiring a shared virtual address space between the host and
+the device.
 
 === New API Function
 
@@ -92,8 +91,8 @@ Enums for enabling device pointer properties when creating a buffer
 
 [source]
 ----
-#define CL_MEM_DEVICE_ADDRESS_EXT (1ul << 31)
-#define CL_MEM_DEVICE_PRIVATE_EXT (1ul << 30)
+#define CL_MEM_DEVICE_SHARED_ADDRESS_EXT (1ul << 31)
+#define CL_MEM_DEVICE_PRIVATE_ADDRESS_EXT (1ul << 30)
 ----
 
 Enums for querying the device pointer from the cl_mem <>:
@@ -142,9 +141,9 @@ Add new allocation flags <