Allow changes to work dim in cl_khr_command_buffer_mutable_dispatch #1057

EwanC · 2024-02-09T13:24:38Z

Feature request to allow modifications to the work_dim struct member of cl_mutable_dispatch_config_khr such that the work dimension can change from that originally used in the kernel. This is forbidden in the current specification with error wording in clUpdateMutableCommandsKHR:

CL_INVALID_OPERATION if the work_dim is different from the work_dim set on command recording.

This is motivated by the fact that the equivalent CUDA-Graph, HIP-Graph, Level Zero APIs for updating kernel commands enable modifying the work dimension. Therefore in a SYCL API which lets the user change the dimensions in a kernel, this is a gap compared to other backends.

The text was updated successfully, but these errors were encountered:

bashbaug · 2024-02-12T16:34:03Z

What would the behavior be for an existing global work size, local work size, or global work offset if the user modifies the work_dim? For example, if a user modifies the work_dim would they also need to modify the global work size, and potentially the local work size and/or the global work offset?

FWIW, I may just be missing it, but it looks to me like the other APIs that allow updating kernel comments unconditionally use a three-dimensional dispatch. Are we sure they allow changing the dispatch dimensionality?

EwanC · 2024-02-14T08:27:45Z

FWIW, I may just be missing it, but it looks to me like the other APIs that allow updating kernel comments unconditionally use a three-dimensional dispatch. Are we sure they allow changing the dispatch dimensionality?

I've only tested with CUDA so far out of all those APIs, but going from a 3D dimensional dispatch where each dimension is has >1 work-items, and then updating to a dispatch with <X,Y,1> & <X,1,1> work-items does work. I've not tested the other way going from <X,1,1> original kernel to <X,Y,Z>, i.e. 1D to 3D update but I'll double check that.

What would the behavior be for an existing global work size, local work size, or global work offset if the user modifies the work_dim? For example, if a user modifies the work_dim would they also need to modify the global work size, and potentially the local work size and/or the global work offset?

I guess I'm primarily looking at this from a SYCL backend point of view, and in that SYCL update API I imagine the user would pass a new sycl::nd_range N dimensional index space. As part of that object we'd have the global work-size, local work-size, and global offset. So if we restricted changing the work dimension to only be possible when setting those configurations I think that would be a very reasonable constraint.

bashbaug · 2024-02-14T22:22:17Z

I've only tested with CUDA so far out of all those APIs, but going from a 3D dimensional dispatch where each dimension is has >1 work-items, and then updating to a dispatch with <X,Y,1> & <X,1,1> work-items does work. I've not tested the other way going from <X,1,1> original kernel to <X,Y,Z>, i.e. 1D to 3D update but I'll double check that.

This feels a little philosophical, but debatably going from a dispatch where each dimension <X,Y,Z> has more than one work-item to an <X,Y,1> or <X,1,1> dispatch is still a 3D dispatch, it's just that one of the dispatch dimensions happens to be one. You would really need to go to a <X,Y> or an <X> dispatch to go to a 2D or 1D dispatch.

From an OpenCL kernel perspective I don't think you would be able to observe the difference except for the get_work_dim() built-in function or the SPIR-V WorkDim built-in, but SYCL doesn't seem to have a similar built-in (or I'm missing it). There is a dimensionality encoded into the item / nd_item / etc. classes, but this seems like something different.

Since a SYCL kernel is written to accept an item / nd_item / etc. with a specific dimensionality, and the dimensionality of the range passed to parallel_for must match, does it really make sense to allow for modifying the dispatch dimensionality with the SYCL update API?

EwanC · 2024-02-15T15:03:02Z

This feels a little philosophical, but debatably going from a dispatch where each dimension <X,Y,Z> has more than one work-item to an <X,Y,1> or <X,1,1> dispatch is still a 3D dispatch, it's just that one of the dispatch dimensions happens to be one. You would really need to go to a <X,Y> or an dispatch to go to a 2D or 1D dispatch.

Yeah, a user could currently create a 3-dimensional kernel command and do the same technique of setting the unwanted dimensions to 1, and then update them later. Therefore an OpenCL backend to SYCL could also use the same trick.

From an OpenCL kernel perspective I don't think you would be able to observe the difference except for the get_work_dim() built-in function or the SPIR-V WorkDim built-in, but SYCL doesn't seem to have a similar built-in (or I'm missing it). There is a dimensionality encoded into the item / nd_item / etc. classes, but this seems like something different.

Since a SYCL kernel is written to accept an item / nd_item / etc. with a specific dimensionality, and the dimensionality of the range passed to parallel_for must match, does it really make sense to allow for modifying the dispatch dimensionality with the SYCL update API?

This is the SYCL kernel I was using to test this https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/device_code/indexers_usm.cpp and indeed it doesn't use a get_work_dim() , it just checks the global/local ids. I'll try to get information about the SYCL workloads to verify if updating dimension is really a required feature or not.

EwanC · 2024-02-22T09:40:08Z

I'm going to close this issue because I think a user could currently create a 3D kernel then update the unused dimensions to 1 if they need this functionality, but will reopen the issue if a concrete use case does come up that this technique doesn't fully satisfy.

See KhronosGroup/OpenCL-Docs#1057 for discussions as to why we shouldn't enable changing to number of dimensions in an update.

Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724

Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. 3. Forbid changing the local work size from user to impl defined and vice-versa. See discussion in [L0 implementation PR](oneapi-src#1353 (comment)). This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724

EwanC added the cl_khr_command_buffer Relating to the command-buffer family of extension label Feb 9, 2024

EwanC mentioned this issue Feb 15, 2024

[SYCL][Graph] Specify API for explicit update using indices intel/llvm#12486

Merged

EwanC closed this as completed Feb 22, 2024

EwanC added a commit to Bensuo/unified-runtime that referenced this issue Feb 22, 2024

Error if number of dimensions changes during update

ede570c

See KhronosGroup/OpenCL-Docs#1057 for discussions as to why we shouldn't enable changing to number of dimensions in an update.

EwanC added a commit to Bensuo/unified-runtime that referenced this issue Feb 22, 2024

Error if number of dimensions changes during update

3ceb6d9

See KhronosGroup/OpenCL-Docs#1057 for discussions as to why we shouldn't enable changing to number of dimensions in an update.

EwanC mentioned this issue Feb 27, 2024

[EXP][OpenCL] Update kernel commands in a command-buffer oneapi-src/unified-runtime#1358

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow changes to work dim in cl_khr_command_buffer_mutable_dispatch #1057

Allow changes to work dim in cl_khr_command_buffer_mutable_dispatch #1057

EwanC commented Feb 9, 2024

bashbaug commented Feb 12, 2024

EwanC commented Feb 14, 2024 •

edited

Loading

bashbaug commented Feb 14, 2024 •

edited

Loading

EwanC commented Feb 15, 2024

EwanC commented Feb 22, 2024

Allow changes to work dim in cl_khr_command_buffer_mutable_dispatch #1057

Allow changes to work dim in cl_khr_command_buffer_mutable_dispatch #1057

Comments

EwanC commented Feb 9, 2024

bashbaug commented Feb 12, 2024

EwanC commented Feb 14, 2024 • edited Loading

bashbaug commented Feb 14, 2024 • edited Loading

EwanC commented Feb 15, 2024

EwanC commented Feb 22, 2024

EwanC commented Feb 14, 2024 •

edited

Loading

bashbaug commented Feb 14, 2024 •

edited

Loading