diff --git a/extensions/cl_intel_unified_shared_memory.asciidoc b/extensions/cl_intel_unified_shared_memory.asciidoc index 934dbafa..ab6a6fb9 100644 --- a/extensions/cl_intel_unified_shared_memory.asciidoc +++ b/extensions/cl_intel_unified_shared_memory.asciidoc @@ -52,7 +52,7 @@ Shipping == Version Built On: {docdate} + -Revision: 1.0.0 +Revision: 1.1.0 == Dependencies @@ -743,9 +743,19 @@ Arguments to the kernel are referred to by indices that go from 0 for the leftmo _arg_value_ is the pointer value that should be used as the argument specified by _arg_index_. The pointer value will be used as the argument by all API calls that enqueue a kernel until the argument value is set to a different pointer value by a subsequent call. -A pointer into Unified Shared Memory allocation may only be set as an argument value for an argument declared to be a pointer to `global` or `constant` memory. +A pointer may only be set as an argument value for an argument declared to be a pointer to `global` or `constant` memory. + +[[valid-usm-pointer-argument-definition]] +The definition of a valid pointer value was changed in extension version 1.1.0: + +* For extension versions prior to version 1.1.0: For devices supporting shared system allocations, any pointer value is valid. Otherwise, the pointer value must be `NULL` or must point into a Unified Shared Memory allocation returned by *clHostMemAllocINTEL*, *clDeviceMemAllocINTEL*, or *clSharedMemAllocINTEL*. +* For extension versions 1.1.0 and newer: +For all devices, any pointer value is valid and may be set as an argument to a kernel. + +In this definition, a valid pointer value means that the function will not return an error. +It still may not be valid to dereference the pointer inside of a kernel if the memory that the pointer points to is not accessible on the device. *clSetKernelArgMemPointerINTEL* returns `CL_SUCCESS` if the function is executed successfully. 
Otherwise, it will return one of the following errors: @@ -795,6 +805,8 @@ The following errors may be returned by *clSetKernelExecInfo* for these new _par * `CL_INVALID_OPERATION` if _param_name_ is `CL_KERNEL_EXEC_INFO_INDIRECT_DEVICE_ACCESS_INTEL` and no devices in the context associated with _kernel_ support device Unified Shared Memory allocations. * `CL_INVALID_OPERATION` if _param_name_ is `CL_KERNEL_EXEC_INFO_INDIRECT_SHARED_ACCESS_INTEL` and no devices in the context associated with _kernel_ support shared Unified Shared Memory allocations. +The <<valid-usm-pointer-argument-definition,definition of a valid pointer value>> specified using `CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL` was changed in extension version 1.1.0. + ==== Filling and Copying Unified Shared Memory The function @@ -1243,21 +1255,27 @@ Note that some flags will not be valid, such as `CL_MEM_USE_HOST_PTR`. . Should it be an error to set an unknown pointer as a kernel argument using *clSetKernelArgMemPointerINTEL* if no devices support shared system allocations? + -- -*UNRESOLVED*: -Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced. +`RESOLVED`: +The behavior of *clSetKernelArgMemPointerINTEL* was changed in version 1.1.0 of this extension. + +Prior to version 1.1.0, it was considered an error to set an arbitrary pointer value as an argument to a kernel if no devices support system USM. +This was helpful to identify possible programming errors, however it did not match the behavior of passing a pointer to a function on the host, where it is only a programming error if an invalid pointer is dereferenced. +To provide a similar programming experience, the error condition was relaxed in version 1.1.0, and any arbitrary pointer value may be passed to a kernel.
-If we relax the error condition for *clSetKernelArgMemPointerINTEL* then we could also consider relaxing the error condition for *clSetKernelExecInfo*(`CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL`) similarly. +The behavior was also changed for *clSetKernelExecInfo*(`CL_KERNEL_EXEC_INFO_USM_PTRS_INTEL`), similarly. -Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer]. +If desired, additional checks to identify possible programming errors may still be provided via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer]. -- -. Should we support a "rect" memcpy similar to *clEnqueueCopyBufferRect*? +. Should we support a 2D "rect" memcpy similar to *clEnqueueCopyBufferRect*? + -- *UNRESOLVED*: This would be a fairly straightforward addition if it is useful. -Note that there is no similar SVM "rect" memcpy. +Note that there is no similar 2D "rect" memcpy for SVM. + +We could also support a 2D "rect" fill or memset, though there are no similar functions for `cl_mem` buffers or SVM. -- . Should there be an upper limit on the size of an allocation using *clHostMemAllocINTEL*? @@ -1278,6 +1296,17 @@ For some devices, this query will return the same value as `CL_DEVICE_MAX_MEM_AL * Do nothing and keep the existing error behavior. -- +. Can a device USM allocation for a parent device be accessed by its sub-devices? +Can a single device shared USM allocation associated with a parent device be accessed by its sub-devices? 
++ +-- +*UNRESOLVED*: +Since a sub-device is a partition of a parent device a USM allocation against a parent device should be accessible by its sub-devices. +We could document this expectation explicitly in this extension if it is not already covered by the main OpenCL specification. + +Note that a USM allocation against a sub-device need not be accessible by its parent device or by other sibling sub-devices, though some implementations may support this, just like some implementations optionally support access to USM allocations from other devices. +-- + == Revision History [cols="5,15,15,70"] @@ -1285,28 +1314,10 @@ For some devices, this query will return the same value as `CL_DEVICE_MAX_MEM_AL [options="header"] |======================================== |Rev|Date|Author|Changes -|A|2019-01-18|Ben Ashbaugh|*Initial revision* -|B|2019-03-25|Ben Ashbaugh|Minor name changes. -|C|2019-06-18|Ben Ashbaugh|Moved flags argument into properties. -|D|2019-07-19|Ben Ashbaugh|Editorial fixes. -|E|2019-07-22|Ben Ashbaugh|Allocation properties should be const. -|F|2019-07-26|Ben Ashbaugh|Removed DEFAULT mem alloc flag. -|G|2019-08-23|Ben Ashbaugh|Added mem alloc query for associated device. -|H|2019-10-11|Ben Ashbaugh|Added initial list and description of error codes. -|I|2019-11-14|Ben Ashbaugh|Switched from a memset to a memfill API. -|J|2019-11-18|Ben Ashbaugh|Updated a few more error conditions. -|K|2019-12-18|Krzysztof Gibala|Updated write combine description. -|L|2020-01-15|Ben Ashbaugh|Added invalid arg case to setkernelarg API. -|M|2020-01-17|Ben Ashbaugh|Minor name changes, removed const from memfree API. -|N|2020-01-22|Ben Ashbaugh|Updated write combine description. -|O|2020-01-23|Ben Ashbaugh|Added aliases for USM migration flags. -|P|2020-02-28|Ben Ashbaugh|Added blocking memfree API. -|Q|2020-03-12|Ben Ashbaugh|Name tweak for blocking memfree API, added comparison to SVM, allow zero memory advice. -|R|2020-08-21|Ben Ashbaugh|Fixed enum name typo in table. 
-|S|2020-08-26|Maciej Dziuban|Added initial placement flags for shared allocations. |1.0.0|2021-11-07|Ben Ashbaugh|Added version and other minor updates prior to posting on the OpenCL registry. |1.0.0|2022-11-08|Ben Ashbaugh|Added new issues regarding error behavior for clSetKernelArgMemPointerINTEL and rect copies. |1.0.1|2023-08-28|Ben Ashbaugh|Documented error conditions for clSetKernelExecInfo. +|1.1.0|2024-07-30|Ben Ashbaugh|Modified error behavior for clSetKernelArgMemPointerINTEL and clSetKernelExecInfo. |======================================== //************************************************************************