-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EXP][Command-Buffer] Add kernel command update #1089
Conversation
99c329b
to
602e72b
Compare
I have updated the target branch of this PR from the |
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1089 +/- ##
==========================================
- Coverage 15.33% 14.82% -0.52%
==========================================
Files 241 250 +9
Lines 34821 36220 +1399
Branches 3989 4094 +105
==========================================
+ Hits 5340 5369 +29
- Misses 29430 30800 +1370
Partials 51 51 ☔ View full report in Codecov by Sentry. |
14a3131
to
fc55f7a
Compare
3f5d484
to
0c71fc5
Compare
2f02ade
to
9137082
Compare
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension.
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu).
This change introduces a new API that allows the kernel commands of a command-buffer to be updated with a new configuration. For example, modified arguments or ND-Range. The new API is defined in the following files and then source generated using scripts, so reviewers should look at: * `scripts/core/EXP-COMMAND-BUFFER.rst` * `scripts/core/exp-command-buffer.yml` See [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) as prior art. The differences between the proposed API and the above are: * Only the append kernel entry-point returns a command handle. I imagine this will be changed in future to enable other commands to do update. * USM, buffer, and scalar arguments can be updated, there is not equivalent update struct for `urKernelSetArgLocal` or `urKernelSetArgSampler` * There is no granularity of optional support for update, an implementer must either implement all the ways to update a kernel configuration, or none of them. * Command-handles are reference counted in UR, and extend the lifetime of the parent command-buffer. The CUDA adapter is the only adapter that currently implements this new feature, other adapters don't report support. This is because CUDA is already an adapter supported by UR command-buffers, and the CUDA API for updating nodes already exists as a non-optional feature. Reviewers should review the changes in `source/adapters/cuda/` to evaluate this, CTS tests are written to verify implementation, as there is not yet a DPC++ feature with testing to stress the code path (see reble/llvm#340 for how that feature could look). A new test directory has been created to test the command-buffer experimental feature, `test/conformance/exp_command_buffer`, which contains tests to stress using the feature defined by this extension so that it has code coverage. Reviewers should look at the new tests added here, and new device kernels in `test/conformance/device_code` to evaluate these changes.
The new entry-points added are currently unused by SYCL-Graph, but will be build upon later. See oneapi-src/unified-runtime#1089
The new entry-points added by oneapi-src/unified-runtime#1089 are currently unused by SYCL-Graph, but will be build upon later in #12486
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ```
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ```
Address issues in the CUDA adapter code added by oneapi-src#1089 flagged by Coverity. * Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct * copying instead of moving a shared pointer to a node when constructing a command-handle.
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ```
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ```
Address issues in the CUDA adapter code added by oneapi-src#1089 flagged by Coverity. * Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct * copying instead of moving a shared pointer to a node when constructing a command-handle.
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. This depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ```
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724
Address issues in the CUDA adapter code added by oneapi-src#1089 flagged by Coverity. * Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct * copying instead of moving a shared pointer to a node when constructing a command-handle.
Address issues in the CUDA adapter code added by oneapi-src#1089 flagged by Coverity. * Uninitalized struct member of `CUDA_KERNEL_NODE_PARAMS` struct * copying instead of moving a shared pointer to a node when constructing a command-handle.
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. 3. Forbid changing the local work size from user to impl defined and vice-versa. See discussion in [L0 implementation PR](oneapi-src#1353 (comment)). This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. 3. Forbid changing the local work size from user to impl defined and vice-versa. See discussion in [L0 implementation PR](oneapi-src#1353 (comment)). This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. 3. Forbid changing the local work size from user to impl defined and vice-versa. See discussion in [L0 implementation PR](oneapi-src#1353 (comment)). This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. 3. Forbid changing the local work size from user to impl defined and vice-versa. See discussion in [L0 implementation PR](oneapi-src#1353 (comment)). This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724
Implement the API for updating the kernel commands in a command-buffer defined by oneapi-src#1089 for the OpenCL adapter. However, the following changes to the UR kernel update API have been made based on implementation experience: 1. Forbid updating the work-dim of the kernel, see KhronosGroup/OpenCL-Docs#1057 2. Remove struct fields to update exec info, after [DPC++ implementation prototype](intel/llvm#12840) shows this isn't needed. 3. Forbid changing the local work size from user to impl defined and vice-versa. See discussion in [L0 implementation PR](oneapi-src#1353 (comment)). This adapter implementation depends on support for the [cl_khr_command_buffer_mutable_dispatch](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer_mutable_dispatch) extension. Tested on Intel GPU/CPUs OpenCL implementations with the [command-buffer emulation layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/main/layers/10_cmdbufemu). ```bash $ OPENCL_LAYERS=<path/to/SimpleOpenCLSamples/build/layers/10_cmdbufemu/libCmdBufEmu.so> ./bin/test-exp_command_buffer --platform="Intel(R) OpenCL Graphics" ``` DPC++ PR intel/llvm#12724
This change introduces a new API that allows the kernel commands of a command-buffer to be updated
with a new configuration. For example, modified arguments or ND-Range.
API
The new API is defined in the following files and then source generated using scripts, so reviewers should look at:
scripts/core/EXP-COMMAND-BUFFER.rst
scripts/core/exp-command-buffer.yml
See cl_khr_command_buffer_mutable_dispatch as prior art. The differences between the proposed API and the above are:
urKernelSetArgLocal
orurKernelSetArgSampler
Adapters
The CUDA adapter is the only adapter that currently implements this new feature, other adapters don't report support. This is because CUDA is already an adapter supported by UR command-buffers, and the CUDA API for updating nodes already exists as a non-optional feature.
Reviewers should review the changes in
source/adapters/cuda/
to evaluate this,CTS
CTS tests are written to verify implementation, as there is not yet a DPC++ feature with testing to stress the code path (see reble/llvm#340 for how that feature could look).
A new test directory has been created to test the command-buffer experimental feature,
test/conformance/exp_command_buffer
, which contains tests to stress using the feature defined by this extension so that it has code coverage. Reviewers should look at the new tests added here, and new device kernels intest/conformance/device_code
to evaluate these changes.