Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EXP][Command-Buffer] Add kernel command update #1089

Merged
merged 1 commit into from
Feb 13, 2024

Conversation

EwanC
Copy link
Contributor

@EwanC EwanC commented Nov 16, 2023

This change introduces a new API that allows the kernel commands of a command-buffer to be updated
with a new configuration. For example, modified arguments or ND-Range.

API

The new API is defined in the following files and then source generated using scripts, so reviewers should look at:

  • scripts/core/EXP-COMMAND-BUFFER.rst
  • scripts/core/exp-command-buffer.yml

See cl_khr_command_buffer_mutable_dispatch as prior art. The differences between the proposed API and the above are:

  • Only the append kernel entry-point returns a command handle. I imagine this will be changed in future to enable other commands to do update.
  • USM, buffer, and scalar arguments can be updated, there is not equivalent update struct for urKernelSetArgLocal or urKernelSetArgSampler
  • There is no granularity of optional support for update, an implementer must either implement all the ways to update a kernel configuration, or none of them.
  • Command-handles are reference counted in UR, and extend the lifetime of the parent command-buffer.

Adapters

The CUDA adapter is the only adapter that currently implements this new feature, other adapters don't report support. This is because CUDA is already an adapter supported by UR command-buffers, and the CUDA API for updating nodes already exists as a non-optional feature.

Reviewers should review the changes in source/adapters/cuda/ to evaluate this,

CTS

CTS tests are written to verify implementation, as there is not yet a DPC++ feature with testing to stress the code path (see reble/llvm#340 for how that feature could look).

A new test directory has been created to test the command-buffer experimental feature, test/conformance/exp_command_buffer, which contains tests to stress using the feature defined by this extension so that it has code coverage. Reviewers should look at the new tests added here, and new device kernels in test/conformance/device_code to evaluate these changes.

scripts/parse_specs.py Outdated Show resolved Hide resolved
@EwanC EwanC force-pushed the ewan/update branch 4 times, most recently from 99c329b to 602e72b Compare December 4, 2023 10:17
@fabiomestre fabiomestre changed the base branch from adapters to main December 5, 2023 16:48
@fabiomestre
Copy link
Contributor

I have updated the target branch of this PR from the adapters branch to the main branch.
Development in UR is moving back to main. The adapters branch will soon be deleted.

@codecov-commenter
Copy link

codecov-commenter commented Dec 19, 2023

Codecov Report

Attention: 1372 lines in your changes are missing coverage. Please review.

Comparison is base (5e548c5) 15.33% compared to head (fbdaf72) 14.82%.

Files Patch % Lines
include/ur_print.hpp 0.00% 349 Missing ⚠️
...e/exp_command_buffer/buffer_fill_kernel_update.cpp 0.00% 140 Missing ⚠️
...ance/exp_command_buffer/usm_fill_kernel_update.cpp 0.00% 140 Missing ⚠️
.../conformance/exp_command_buffer/ndrange_update.cpp 0.00% 101 Missing ⚠️
test/conformance/exp_command_buffer/fixtures.h 0.00% 69 Missing ⚠️
.../exp_command_buffer/buffer_saxpy_kernel_update.cpp 0.00% 68 Missing ⚠️
source/loader/ur_ldrddi.cpp 7.04% 66 Missing ⚠️
...nce/exp_command_buffer/usm_saxpy_kernel_update.cpp 0.00% 64 Missing ⚠️
source/loader/layers/validation/ur_valddi.cpp 13.69% 63 Missing ⚠️
.../conformance/exp_command_buffer/invalid_update.cpp 0.00% 53 Missing ⚠️
... and 8 more

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1089      +/-   ##
==========================================
- Coverage   15.33%   14.82%   -0.52%     
==========================================
  Files         241      250       +9     
  Lines       34821    36220    +1399     
  Branches     3989     4094     +105     
==========================================
+ Hits         5340     5369      +29     
- Misses      29430    30800    +1370     
  Partials       51       51              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@EwanC EwanC force-pushed the ewan/update branch 5 times, most recently from 14a3131 to fc55f7a Compare December 20, 2023 15:42
scripts/core/exp-command-buffer.yml Outdated Show resolved Hide resolved
scripts/core/exp-command-buffer.yml Outdated Show resolved Hide resolved
@EwanC EwanC force-pushed the ewan/update branch 5 times, most recently from 3f5d484 to 0c71fc5 Compare December 21, 2023 15:01
@EwanC EwanC force-pushed the ewan/update branch 4 times, most recently from 2f02ade to 9137082 Compare January 4, 2024 12:49
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 13, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 13, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 13, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).
This change introduces a new API that allows the kernel commands of a command-buffer to be updated
with a new configuration. For example, modified arguments or ND-Range.

The new API is defined in the following files and then source generated using scripts, so reviewers should look at:
* `scripts/core/EXP-COMMAND-BUFFER.rst`
* `scripts/core/exp-command-buffer.yml`

See [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) as prior art. The differences between the proposed API and the above are:

* Only the append kernel entry-point returns a command handle. I imagine this will be changed in future to enable other commands to do update.
* USM,  buffer, and scalar arguments can be updated, there is not equivalent update struct for `urKernelSetArgLocal` or `urKernelSetArgSampler`
* There is no granularity of optional support for update, an implementer must either implement all the ways to update a kernel configuration, or none of them.
* Command-handles are reference counted in UR, and extend the lifetime of the parent command-buffer.

The CUDA adapter is the only adapter that currently implements this new feature, other adapters don't report support. This is because CUDA is already an adapter supported by UR command-buffers, and the CUDA API for updating nodes already exists as a non-optional feature.

Reviewers should review the changes in `source/adapters/cuda/` to evaluate this,

CTS tests are written to verify implementation, as there is not yet a DPC++ feature with testing to stress the code path (see reble/llvm#340 for how that feature could look).

A new test directory has been created to test the command-buffer experimental feature, `test/conformance/exp_command_buffer`, which contains tests to stress using the feature defined by this extension so that it has code coverage. Reviewers should look at the new tests added here, and new device kernels in `test/conformance/device_code` to evaluate these changes.
@kbenzie kbenzie merged commit 8ff738f into oneapi-src:main Feb 13, 2024
52 checks passed
EwanC added a commit to reble/llvm that referenced this pull request Feb 13, 2024
The new
entry-points added are currently unused by SYCL-Graph, but
will be build upon later.

See oneapi-src/unified-runtime#1089
dm-vodopyanov pushed a commit to intel/llvm that referenced this pull request Feb 13, 2024
The new entry-points added by
oneapi-src/unified-runtime#1089 are currently
unused by SYCL-Graph, but will be build upon later in
#12486
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 14, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 14, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 14, 2024
Address issues in the CUDA adapter code added by oneapi-src#1089
flagged by Coverity.

* Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct
* copying instead of moving a shared pointer to a node when
  constructing a command-handle.
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 15, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 15, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 19, 2024
Address issues in the CUDA adapter code added by oneapi-src#1089
flagged by Coverity.

* Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct
* copying instead of moving a shared pointer to a node when
  constructing a command-handle.
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 22, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

This depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 27, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Feb 28, 2024
Address issues in the CUDA adapter code added by oneapi-src#1089
flagged by Coverity.

* Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct
* copying instead of moving a shared pointer to a node when
  constructing a command-handle.
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Mar 4, 2024
Address issues in the CUDA adapter code added by oneapi-src#1089
flagged by Coverity.

* Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct
* copying instead of moving a shared pointer to a node when
  constructing a command-handle.
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Mar 14, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Mar 25, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Apr 4, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Apr 5, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Apr 22, 2024
Implement the API for updating the kernel commands in a command-buffer
defined by oneapi-src#1089 for
the OpenCL adapter.

However, the following changes to the UR kernel update API have been
made based on implementation experience:

1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057
2. Remove struct fields to update exec info, after [DPC++ implementation
   prototype](intel/llvm#12840) shows this isn't
   needed.
3. Forbid changing the local work size from user to impl defined and
   vice-versa. See discussion in [L0 implementation
PR](oneapi-src#1353 (comment)).

This adapter implementation depends on support for the
[cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch)
extension.

Tested on Intel GPU/CPUs OpenCL implementations with the
[command-buffer emulation
layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).

```bash
$ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics"
```

DPC++ PR intel/llvm#12724
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to merge Added to PR's which are ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.