Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow changes to work dim in cl_khr_command_buffer_mutable_dispatch #1057

Closed
EwanC opened this issue Feb 9, 2024 · 5 comments
Closed

Allow changes to work dim in cl_khr_command_buffer_mutable_dispatch #1057

EwanC opened this issue Feb 9, 2024 · 5 comments
Labels
cl_khr_command_buffer Relating to the command-buffer family of extension

Comments

@EwanC
Copy link
Contributor

EwanC commented Feb 9, 2024

Feature request to allow modifications to the work_dim struct member of cl_mutable_dispatch_config_khr such that the work dimension can change from that originally used in the kernel. This is forbidden in the current specification with error wording in clUpdateMutableCommandsKHR:

CL_INVALID_OPERATION if the work_dim is different from the work_dim set on command recording.

This is motivated by the fact that the equivalent CUDA-Graph, HIP-Graph, Level Zero APIs for updating kernel commands enable modifying the work dimension. Therefore in a SYCL API which lets the user change the dimensions in a kernel, this is a gap compared to other backends.

@EwanC EwanC added the cl_khr_command_buffer Relating to the command-buffer family of extension label Feb 9, 2024
@bashbaug
Copy link
Contributor

What would the behavior be for an existing global work size, local work size, or global work offset if the user modifies the work_dim? For example, if a user modifies the work_dim would they also need to modify the global work size, and potentially the local work size and/or the global work offset?

FWIW, I may just be missing it, but it looks to me like the other APIs that allow updating kernel comments unconditionally use a three-dimensional dispatch. Are we sure they allow changing the dispatch dimensionality?

@EwanC
Copy link
Contributor Author

EwanC commented Feb 14, 2024

FWIW, I may just be missing it, but it looks to me like the other APIs that allow updating kernel comments unconditionally use a three-dimensional dispatch. Are we sure they allow changing the dispatch dimensionality?

I've only tested with CUDA so far out of all those APIs, but going from a 3D dimensional dispatch where each dimension is has >1 work-items, and then updating to a dispatch with <X,Y,1> & <X,1,1> work-items does work. I've not tested the other way going from <X,1,1> original kernel to <X,Y,Z>, i.e. 1D to 3D update but I'll double check that.

What would the behavior be for an existing global work size, local work size, or global work offset if the user modifies the work_dim? For example, if a user modifies the work_dim would they also need to modify the global work size, and potentially the local work size and/or the global work offset?

I guess I'm primarily looking at this from a SYCL backend point of view, and in that SYCL update API I imagine the user would pass a new sycl::nd_range N dimensional index space. As part of that object we'd have the global work-size, local work-size, and global offset. So if we restricted changing the work dimension to only be possible when setting those configurations I think that would be a very reasonable constraint.

@bashbaug
Copy link
Contributor

bashbaug commented Feb 14, 2024

I've only tested with CUDA so far out of all those APIs, but going from a 3D dimensional dispatch where each dimension is has >1 work-items, and then updating to a dispatch with <X,Y,1> & <X,1,1> work-items does work. I've not tested the other way going from <X,1,1> original kernel to <X,Y,Z>, i.e. 1D to 3D update but I'll double check that.

This feels a little philosophical, but debatably going from a dispatch where each dimension <X,Y,Z> has more than one work-item to an <X,Y,1> or <X,1,1> dispatch is still a 3D dispatch, it's just that one of the dispatch dimensions happens to be one. You would really need to go to a <X,Y> or an <X> dispatch to go to a 2D or 1D dispatch.

From an OpenCL kernel perspective I don't think you would be able to observe the difference except for the get_work_dim() built-in function or the SPIR-V WorkDim built-in, but SYCL doesn't seem to have a similar built-in (or I'm missing it). There is a dimensionality encoded into the item / nd_item / etc. classes, but this seems like something different.

Since a SYCL kernel is written to accept an item / nd_item / etc. with a specific dimensionality, and the dimensionality of the range passed to parallel_for must match, does it really make sense to allow for modifying the dispatch dimensionality with the SYCL update API?

@EwanC
Copy link
Contributor Author

EwanC commented Feb 15, 2024

This feels a little philosophical, but debatably going from a dispatch where each dimension <X,Y,Z> has more than one work-item to an <X,Y,1> or <X,1,1> dispatch is still a 3D dispatch, it's just that one of the dispatch dimensions happens to be one. You would really need to go to a <X,Y> or an dispatch to go to a 2D or 1D dispatch.

Yeah, a user could currently create a 3-dimensional kernel command and do the same technique of setting the unwanted dimensions to 1, and then update them later. Therefore an OpenCL backend to SYCL could also use the same trick.

From an OpenCL kernel perspective I don't think you would be able to observe the difference except for the get_work_dim() built-in function or the SPIR-V WorkDim built-in, but SYCL doesn't seem to have a similar built-in (or I'm missing it). There is a dimensionality encoded into the item / nd_item / etc. classes, but this seems like something different.

Since a SYCL kernel is written to accept an item / nd_item / etc. with a specific dimensionality, and the dimensionality of the range passed to parallel_for must match, does it really make sense to allow for modifying the dispatch dimensionality with the SYCL update API?

This is the SYCL kernel I was using to test this https://github.com/oneapi-src/unified-runtime/blob/main/test/conformance/device_code/indexers_usm.cpp and indeed it doesn't use a get_work_dim() , it just checks the global/local ids. I'll try to get information about the SYCL workloads to verify if updating dimension is really a required feature or not.

@EwanC
Copy link
Contributor Author

EwanC commented Feb 22, 2024

I'm going to close this issue because I think a user could currently create a 3D kernel then update the unused dimensions to 1 if they need this functionality, but will reopen the issue if a concrete use case does come up that this technique doesn't fully satisfy.

@EwanC EwanC closed this as completed Feb 22, 2024
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Feb 22, 2024
See KhronosGroup/OpenCL-Docs#1057
for discussions as to why we shouldn't enable changing to
number of dimensions in an update.
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Feb 22, 2024
See KhronosGroup/OpenCL-Docs#1057
for discussions as to why we shouldn't enable changing to
number of dimensions in an update.
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Feb 27, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Mar 14, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Mar 25, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Apr 4, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Apr 5, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this issue Apr 22, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cl_khr_command_buffer Relating to the command-buffer family of extension
Projects
None yet
Development

No branches or pull requests

2 participants