Skip to content

Commit

Permalink
[SYCL][OpenCL] Enable graph extension on OpenCL backend (intel#11718)
Browse files Browse the repository at this point in the history
intel-llvm CI run for adding Command Buffers to the OpenCL Adapter in
Unified Runtime - oneapi-src/unified-runtime#966

Also completes follow-on work identified in intel#11599 to add an OpenCL
section to the SYCL-Graphs docs and update the e2e Graph tests. Updating
the tests has since been completed in a separate PR -
intel#11877

Depends on intel#11820 merging first.

---------

Co-authored-by: Pablo Reble <pablo@reble.org>
Co-authored-by: Ewan Crawford <ewan@codeplay.com>
Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
  • Loading branch information
4 people authored and callumfare committed Dec 18, 2023
1 parent 96ff073 commit 795871c
Show file tree
Hide file tree
Showing 26 changed files with 243 additions and 36 deletions.
105 changes: 103 additions & 2 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,8 +149,8 @@ yet been implemented.
Implementation of UR command-buffers
for each of the supported SYCL 2020 backends.

Currently Level Zero and CUDA backends are implemented.
More sub-sections will be added here as other backends are supported.
Backends which are implemented currently are: [Level Zero](#level-zero),
[CUDA](#cuda), and partial support for [OpenCL](#opencl).

### Level Zero

Expand Down Expand Up @@ -246,3 +246,104 @@ the executable CUDA Graph that represent this series of operations.
An executable CUDA Graph, which contains all commands and synchronization
information, is saved in the UR command-buffer to allow for efficient
graph resubmission.

### OpenCL

SYCL-Graph is only enabled for an OpenCL backend when the
[cl_khr_command_buffer](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer)
extension is available, however this information isn't available until runtime
due to OpenCL implementations being loaded through an ICD.

The `ur_exp_command_buffer` string is conditionally returned from the OpenCL
command-buffer UR backend at runtime based on `cl_khr_command_buffer` support
to indicate that the graph extension should be enabled. This is information
is propagated to the SYCL user via the
`device.get_info<info::device::graph_support>()` query for graph extension
support.

#### Limitations

Due to the API mapping gaps documented in the following section, OpenCL as a
SYCL backend cannot fully support the graph API. Instead, there are
limitations in the types of nodes which a user can add to a graph, using
an unsupported node type will cause a sycl exception to be thrown in graph
finalization with error code `sycl::errc::feature_not_supported` and a message
mentioning the unsupported command. For example,

```
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): USM copy command not supported by graph backend
```

The types of commands which are unsupported, and lead to this exception are:
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
This corresponds to a memory buffer read command.
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
This corresponds to a memory buffer write command.
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
`dest` are USM pointers. This corresponds to a USM copy command.

Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
is supported, as a memory buffer copy command exists in the OpenCL extension.

#### UR API Mapping

There are some gaps in both the OpenCL and UR specifications for Command
Buffers shown in the list below. There are implementations in the UR OpenCL
adapter where there is matching support for each function in the list.

| UR | OpenCL | Supported |
| --- | --- | --- |
| urCommandBufferCreateExp | clCreateCommandBufferKHR | Yes |
| urCommandBufferRetainExp | clRetainCommandBufferKHR | Yes |
| urCommandBufferReleaseExp | clReleaseCommandBufferKHR | Yes |
| urCommandBufferFinalizeExp | clFinalizeCommandBufferKHR | Yes |
| urCommandBufferAppendKernelLaunchExp | clCommandNDRangeKernelKHR | Yes |
| urCommandBufferAppendUSMMemcpyExp | | No |
| urCommandBufferAppendUSMFillExp | | No |
| urCommandBufferAppendMembufferCopyExp | clCommandCopyBufferKHR | Yes |
| urCommandBufferAppendMemBufferWriteExp | | No |
| urCommandBufferAppendMemBufferReadExp | | No |
| urCommandBufferAppendMembufferCopyRectExp | clCommandCopyBufferRectKHR | Yes |
| urCommandBufferAppendMemBufferWriteRectExp | | No |
| urCommandBufferAppendMemBufferReadRectExp | | No |
| urCommandBufferAppendMemBufferFillExp | clCommandFillBufferKHR | Yes |
| urCommandBufferEnqueueExp | clEnqueueCommandBufferKHR | Yes |
| | clCommandBarrierWithWaitListKHR | No |
| | clCommandCopyImageKHR | No |
| | clCommandCopyImageToBufferKHR | No |
| | clCommandFillImageKHR | No |
| | clGetCommandBufferInfoKHR | No |
| | clCommandSVMMemcpyKHR | No |
| | clCommandSVMMemFillKHR | No |

We are looking to address these gaps in the future so that SYCL-Graph can be
fully supported on a `cl_khr_command_buffer` backend.

#### UR Command-Buffer Implementation

Many of the OpenCL functions take a `cl_command_queue` parameter which is not
present in most of the UR functions. Instead, when a new command buffer is
created in `urCommandBufferCreateExp` we also create and maintain a new
internal `ur_queue_handle_t` with a reference stored inside of the
`ur_exp_command_buffer_handle_t_` struct. The internal queue is retained and
released whenever the owning command buffer is retained or released.

With command buffers being an OpenCL extension, each function is accessed by
loading a function pointer to its implementation. These are defined in a common
header file in the UR OpenCL adapter. The symbols for the functions are however
defined in [OpenCL-Headers](https://github.com/KhronosGroup/OpenCL-Headers/blob/main/CL/cl_ext.h)
but it is not known at this time what version of the headers will be used in
the UR GitHub CI configuration, so loading the function pointers will be used
until this can be verified. A future piece of work would be replacing the
custom defined symbols with the ones from OpenCL-Headers.

#### Available OpenCL Command-Buffer Implementations

Publicly available implementations of `cl_khr_command_buffer` that can be used
to enable the graph extension in OpenCL:

- [OneAPI Construction Kit](https://github.com/codeplaysoftware/oneapi-construction-kit) (must enable `OCL_EXTENSION_cl_khr_command_buffer` when building)
- [PoCL](http://portablecl.org/)
- [Command-Buffer Emulation Layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/efeae73139ddf064fafce565cc39640af10d900f/layers/10_cmdbufemu)

2 changes: 1 addition & 1 deletion sycl/doc/design/images/SYCL-Graph-Architecture.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
12 changes: 6 additions & 6 deletions sycl/plugins/unified_runtime/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -57,13 +57,13 @@ if(SYCL_PI_UR_USE_FETCH_CONTENT)
include(FetchContent)

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
# commit 31b654f981f6098936e7f04c65803395a2ea343a
# Merge: 71957e84 3da21336
# commit 109ed46ee55f41317d35b2a9a20fa7a2029e9e64
# Merge: 31b654f9 23005313
# Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
# Date: Wed Nov 22 11:27:33 2023 +0000
# Merge pull request #1053 from jandres742/url0leakkey
# [UR][L0] Add UR_L0_LEAKS_DEBUG key
set(UNIFIED_RUNTIME_TAG 31b654f981f6098936e7f04c65803395a2ea343a)
# Date: Wed Nov 22 16:04:52 2023 +0000
# Merge pull request #966 from martygrant/martin/openclCommandBuffers
# [OpenCL] Add Command Buffer extension to OpenCL adapter.
set(UNIFIED_RUNTIME_TAG 109ed46ee55f41317d35b2a9a20fa7a2029e9e64)

if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")
Expand Down
6 changes: 5 additions & 1 deletion sycl/plugins/unified_runtime/pi2ur.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ static pi_result ur2piResult(ur_result_t urResult) {
return PI_ERROR_LINK_PROGRAM_FAILURE;
case UR_RESULT_ERROR_UNSUPPORTED_VERSION:
case UR_RESULT_ERROR_UNSUPPORTED_FEATURE:
return PI_ERROR_INVALID_OPERATION;
case UR_RESULT_ERROR_INVALID_ARGUMENT:
case UR_RESULT_ERROR_INVALID_NULL_HANDLE:
case UR_RESULT_ERROR_HANDLE_OBJECT_IN_USE:
Expand All @@ -127,7 +128,6 @@ static pi_result ur2piResult(ur_result_t urResult) {
return PI_ERROR_INVALID_WORK_DIMENSION;
case UR_RESULT_ERROR_INVALID_GLOBAL_WIDTH_DIMENSION:
return PI_ERROR_INVALID_VALUE;

case UR_RESULT_ERROR_PROGRAM_UNLINKED:
return PI_ERROR_INVALID_PROGRAM_EXECUTABLE;
case UR_RESULT_ERROR_OVERLAPPING_REGIONS:
Expand All @@ -140,6 +140,10 @@ static pi_result ur2piResult(ur_result_t urResult) {
return PI_ERROR_OUT_OF_RESOURCES;
case UR_RESULT_ERROR_ADAPTER_SPECIFIC:
return PI_ERROR_PLUGIN_SPECIFIC_ERROR;
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_EXP:
return PI_ERROR_INVALID_COMMAND_BUFFER_KHR;
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_SYNC_POINT_WAIT_LIST_EXP:
return PI_ERROR_INVALID_SYNC_POINT_WAIT_LIST_KHR;
case UR_RESULT_ERROR_UNKNOWN:
default:
return PI_ERROR_UNKNOWN;
Expand Down
89 changes: 68 additions & 21 deletions sycl/source/detail/memory_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1518,10 +1518,20 @@ void MemoryManager::ext_oneapi_copyD2H_cmd_buffer(
}

if (1 == DimDst && 1 == DimSrc) {
Plugin->call<PiApiKind::piextCommandBufferMemBufferRead>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
SrcXOffBytes, SrcAccessRangeWidthBytes, DstMem + DstXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferRead>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
SrcXOffBytes, SrcAccessRangeWidthBytes, DstMem + DstXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);

if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Device-to-host buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
} else {
size_t BufferRowPitch = (1 == DimSrc) ? 0 : SrcSzWidthBytes;
size_t BufferSlicePitch =
Expand All @@ -1538,11 +1548,20 @@ void MemoryManager::ext_oneapi_copyD2H_cmd_buffer(
SrcAccessRange[SrcPos.YTerm],
SrcAccessRange[SrcPos.ZTerm]};

Plugin->call<PiApiKind::piextCommandBufferMemBufferReadRect>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, DstMem, Deps.size(),
Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferReadRect>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, DstMem, Deps.size(),
Deps.data(), OutSyncPoint);
if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Device-to-host buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
}
}

Expand Down Expand Up @@ -1576,10 +1595,20 @@ void MemoryManager::ext_oneapi_copyH2D_cmd_buffer(
}

if (1 == DimDst && 1 == DimSrc) {
Plugin->call<PiApiKind::piextCommandBufferMemBufferWrite>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
DstXOffBytes, DstAccessRangeWidthBytes, SrcMem + SrcXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferWrite>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
DstXOffBytes, DstAccessRangeWidthBytes, SrcMem + SrcXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);

if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Host-to-device buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
} else {
size_t BufferRowPitch = (1 == DimDst) ? 0 : DstSzWidthBytes;
size_t BufferSlicePitch =
Expand All @@ -1596,11 +1625,21 @@ void MemoryManager::ext_oneapi_copyH2D_cmd_buffer(
DstAccessRange[DstPos.YTerm],
DstAccessRange[DstPos.ZTerm]};

Plugin->call<PiApiKind::piextCommandBufferMemBufferWriteRect>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, SrcMem, Deps.size(),
Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferWriteRect>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, SrcMem, Deps.size(),
Deps.data(), OutSyncPoint);

if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Host-to-device buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
}
}

Expand All @@ -1614,9 +1653,17 @@ void MemoryManager::ext_oneapi_copy_usm_cmd_buffer(
PI_ERROR_INVALID_VALUE);

const PluginPtr &Plugin = Context->getPlugin();
Plugin->call<PiApiKind::piextCommandBufferMemcpyUSM>(
CommandBuffer, DstMem, SrcMem, Len, Deps.size(), Deps.data(),
OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemcpyUSM>(
CommandBuffer, DstMem, SrcMem, Len, Deps.size(), Deps.data(),
OutSyncPoint);
if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"USM copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
}

void MemoryManager::copy_image_bindless(
Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_host2target.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_host2target_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_target2host.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Device to host copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_target2host_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Device to host copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Device to host copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
8 changes: 8 additions & 0 deletions sycl/test-e2e/Graph/Explicit/cycle_error.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,14 @@ void CreateGraphWithCyclesTest(bool DisableCycleChecks) {
}

int main() {
{
queue Queue;

if (!are_graphs_supported(Queue)) {
return 0;
}
}

// Test with cycle checks
CreateGraphWithCyclesTest(false);
// Test without cycle checks
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,4 @@

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/executable_graph_update_ordering"
#include "../Inputs/executable_graph_update_ordering.cpp"
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/usm_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@
// RUN: %if ext_oneapi_level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %}
//
// CHECK-NOT: LEAK
//
// USM copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
1 change: 0 additions & 1 deletion sycl/test-e2e/Graph/Inputs/buffer_ordering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@
#include "../graph_common.hpp"

int main() {

queue Queue{{sycl::ext::intel::property::queue::no_immediate_command_list{}}};

const size_t N = 10;
Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/RecordReplay/buffer_copy_host2target.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_RECORD_REPLAY

Expand Down
Loading

0 comments on commit 795871c

Please sign in to comment.