Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][OpenCL] Enable graph extension on OpenCL backend #11718

Merged
merged 6 commits into from
Nov 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 103 additions & 2 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,8 +149,8 @@ yet been implemented.
Implementation of UR command-buffers
for each of the supported SYCL 2020 backends.

Currently Level Zero and CUDA backends are implemented.
More sub-sections will be added here as other backends are supported.
Backends which are implemented currently are: [Level Zero](#level-zero),
[CUDA](#cuda), and partial support for [OpenCL](#opencl).

### Level Zero

Expand Down Expand Up @@ -246,3 +246,104 @@ the executable CUDA Graph that represent this series of operations.
An executable CUDA Graph, which contains all commands and synchronization
information, is saved in the UR command-buffer to allow for efficient
graph resubmission.

### OpenCL

SYCL-Graph is only enabled for an OpenCL backend when the
[cl_khr_command_buffer](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_command_buffer)
extension is available, however this information isn't available until runtime
due to OpenCL implementations being loaded through an ICD.

The `ur_exp_command_buffer` string is conditionally returned from the OpenCL
command-buffer UR backend at runtime based on `cl_khr_command_buffer` support
to indicate that the graph extension should be enabled. This is information
is propagated to the SYCL user via the
`device.get_info<info::device::graph_support>()` query for graph extension
support.

#### Limitations

Due to the API mapping gaps documented in the following section, OpenCL as a
SYCL backend cannot fully support the graph API. Instead, there are
limitations in the types of nodes which a user can add to a graph, using
an unsupported node type will cause a sycl exception to be thrown in graph
finalization with error code `sycl::errc::feature_not_supported` and a message
mentioning the unsupported command. For example,

```
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): USM copy command not supported by graph backend
```

The types of commands which are unsupported, and lead to this exception are:
* `handler::copy(src, dest)` - Where `src` is an accessor and `dest` is a pointer.
This corresponds to a memory buffer read command.
* `handler::copy(src, dest)` - Where `src` is an pointer and `dest` is an accessor.
This corresponds to a memory buffer write command.
* `handler::copy(src, dest)` or `handler::memcpy(dest, src)` - Where both `src` and
`dest` are USM pointers. This corresponds to a USM copy command.

Note that `handler::copy(src, dest)` where both `src` and `dest` are an accessor
is supported, as a memory buffer copy command exists in the OpenCL extension.

#### UR API Mapping

There are some gaps in both the OpenCL and UR specifications for Command
Buffers shown in the list below. There are implementations in the UR OpenCL
adapter where there is matching support for each function in the list.

| UR | OpenCL | Supported |
| --- | --- | --- |
| urCommandBufferCreateExp | clCreateCommandBufferKHR | Yes |
| urCommandBufferRetainExp | clRetainCommandBufferKHR | Yes |
| urCommandBufferReleaseExp | clReleaseCommandBufferKHR | Yes |
| urCommandBufferFinalizeExp | clFinalizeCommandBufferKHR | Yes |
| urCommandBufferAppendKernelLaunchExp | clCommandNDRangeKernelKHR | Yes |
| urCommandBufferAppendUSMMemcpyExp | | No |
| urCommandBufferAppendUSMFillExp | | No |
| urCommandBufferAppendMembufferCopyExp | clCommandCopyBufferKHR | Yes |
| urCommandBufferAppendMemBufferWriteExp | | No |
| urCommandBufferAppendMemBufferReadExp | | No |
| urCommandBufferAppendMembufferCopyRectExp | clCommandCopyBufferRectKHR | Yes |
| urCommandBufferAppendMemBufferWriteRectExp | | No |
| urCommandBufferAppendMemBufferReadRectExp | | No |
| urCommandBufferAppendMemBufferFillExp | clCommandFillBufferKHR | Yes |
| urCommandBufferEnqueueExp | clEnqueueCommandBufferKHR | Yes |
| | clCommandBarrierWithWaitListKHR | No |
| | clCommandCopyImageKHR | No |
| | clCommandCopyImageToBufferKHR | No |
| | clCommandFillImageKHR | No |
| | clGetCommandBufferInfoKHR | No |
martygrant marked this conversation as resolved.
Show resolved Hide resolved
| | clCommandSVMMemcpyKHR | No |
| | clCommandSVMMemFillKHR | No |

We are looking to address these gaps in the future so that SYCL-Graph can be
fully supported on a `cl_khr_command_buffer` backend.

#### UR Command-Buffer Implementation

EwanC marked this conversation as resolved.
Show resolved Hide resolved
Many of the OpenCL functions take a `cl_command_queue` parameter which is not
present in most of the UR functions. Instead, when a new command buffer is
created in `urCommandBufferCreateExp` we also create and maintain a new
internal `ur_queue_handle_t` with a reference stored inside of the
`ur_exp_command_buffer_handle_t_` struct. The internal queue is retained and
released whenever the owning command buffer is retained or released.

With command buffers being an OpenCL extension, each function is accessed by
loading a function pointer to its implementation. These are defined in a common
header file in the UR OpenCL adapter. The symbols for the functions are however
defined in [OpenCL-Headers](https://github.com/KhronosGroup/OpenCL-Headers/blob/main/CL/cl_ext.h)
but it is not known at this time what version of the headers will be used in
the UR GitHub CI configuration, so loading the function pointers will be used
until this can be verified. A future piece of work would be replacing the
custom defined symbols with the ones from OpenCL-Headers.

#### Available OpenCL Command-Buffer Implementations

martygrant marked this conversation as resolved.
Show resolved Hide resolved
Publicly available implementations of `cl_khr_command_buffer` that can be used
to enable the graph extension in OpenCL:

- [OneAPI Construction Kit](https://github.com/codeplaysoftware/oneapi-construction-kit) (must enable `OCL_EXTENSION_cl_khr_command_buffer` when building)
- [PoCL](http://portablecl.org/)
- [Command-Buffer Emulation Layer](https://github.com/bashbaug/SimpleOpenCLSamples/tree/efeae73139ddf064fafce565cc39640af10d900f/layers/10_cmdbufemu)

2 changes: 1 addition & 1 deletion sycl/doc/design/images/SYCL-Graph-Architecture.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
12 changes: 6 additions & 6 deletions sycl/plugins/unified_runtime/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -57,13 +57,13 @@ if(SYCL_PI_UR_USE_FETCH_CONTENT)
include(FetchContent)

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
# commit 31b654f981f6098936e7f04c65803395a2ea343a
# Merge: 71957e84 3da21336
# commit 109ed46ee55f41317d35b2a9a20fa7a2029e9e64
# Merge: 31b654f9 23005313
# Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
# Date: Wed Nov 22 11:27:33 2023 +0000
# Merge pull request #1053 from jandres742/url0leakkey
# [UR][L0] Add UR_L0_LEAKS_DEBUG key
set(UNIFIED_RUNTIME_TAG 31b654f981f6098936e7f04c65803395a2ea343a)
# Date: Wed Nov 22 16:04:52 2023 +0000
# Merge pull request #966 from martygrant/martin/openclCommandBuffers
# [OpenCL] Add Command Buffer extension to OpenCL adapter.
set(UNIFIED_RUNTIME_TAG 109ed46ee55f41317d35b2a9a20fa7a2029e9e64)

if(SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO)
set(UNIFIED_RUNTIME_REPO "${SYCL_PI_UR_OVERRIDE_FETCH_CONTENT_REPO}")
Expand Down
6 changes: 5 additions & 1 deletion sycl/plugins/unified_runtime/pi2ur.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,7 @@ static pi_result ur2piResult(ur_result_t urResult) {
return PI_ERROR_LINK_PROGRAM_FAILURE;
case UR_RESULT_ERROR_UNSUPPORTED_VERSION:
case UR_RESULT_ERROR_UNSUPPORTED_FEATURE:
return PI_ERROR_INVALID_OPERATION;
case UR_RESULT_ERROR_INVALID_ARGUMENT:
case UR_RESULT_ERROR_INVALID_NULL_HANDLE:
case UR_RESULT_ERROR_HANDLE_OBJECT_IN_USE:
Expand All @@ -127,7 +128,6 @@ static pi_result ur2piResult(ur_result_t urResult) {
return PI_ERROR_INVALID_WORK_DIMENSION;
case UR_RESULT_ERROR_INVALID_GLOBAL_WIDTH_DIMENSION:
return PI_ERROR_INVALID_VALUE;

case UR_RESULT_ERROR_PROGRAM_UNLINKED:
return PI_ERROR_INVALID_PROGRAM_EXECUTABLE;
case UR_RESULT_ERROR_OVERLAPPING_REGIONS:
Expand All @@ -140,6 +140,10 @@ static pi_result ur2piResult(ur_result_t urResult) {
return PI_ERROR_OUT_OF_RESOURCES;
case UR_RESULT_ERROR_ADAPTER_SPECIFIC:
return PI_ERROR_PLUGIN_SPECIFIC_ERROR;
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_EXP:
return PI_ERROR_INVALID_COMMAND_BUFFER_KHR;
case UR_RESULT_ERROR_INVALID_COMMAND_BUFFER_SYNC_POINT_WAIT_LIST_EXP:
return PI_ERROR_INVALID_SYNC_POINT_WAIT_LIST_KHR;
case UR_RESULT_ERROR_UNKNOWN:
default:
return PI_ERROR_UNKNOWN;
Expand Down
89 changes: 68 additions & 21 deletions sycl/source/detail/memory_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1518,10 +1518,20 @@ void MemoryManager::ext_oneapi_copyD2H_cmd_buffer(
}

if (1 == DimDst && 1 == DimSrc) {
Plugin->call<PiApiKind::piextCommandBufferMemBufferRead>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
SrcXOffBytes, SrcAccessRangeWidthBytes, DstMem + DstXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferRead>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
SrcXOffBytes, SrcAccessRangeWidthBytes, DstMem + DstXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);

if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Device-to-host buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
} else {
size_t BufferRowPitch = (1 == DimSrc) ? 0 : SrcSzWidthBytes;
size_t BufferSlicePitch =
Expand All @@ -1538,11 +1548,20 @@ void MemoryManager::ext_oneapi_copyD2H_cmd_buffer(
SrcAccessRange[SrcPos.YTerm],
SrcAccessRange[SrcPos.ZTerm]};

Plugin->call<PiApiKind::piextCommandBufferMemBufferReadRect>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, DstMem, Deps.size(),
Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferReadRect>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(SrcMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, DstMem, Deps.size(),
Deps.data(), OutSyncPoint);
if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Device-to-host buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
}
}

Expand Down Expand Up @@ -1576,10 +1595,20 @@ void MemoryManager::ext_oneapi_copyH2D_cmd_buffer(
}

if (1 == DimDst && 1 == DimSrc) {
Plugin->call<PiApiKind::piextCommandBufferMemBufferWrite>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
DstXOffBytes, DstAccessRangeWidthBytes, SrcMem + SrcXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferWrite>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
DstXOffBytes, DstAccessRangeWidthBytes, SrcMem + SrcXOffBytes,
Deps.size(), Deps.data(), OutSyncPoint);

if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Host-to-device buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
} else {
size_t BufferRowPitch = (1 == DimDst) ? 0 : DstSzWidthBytes;
size_t BufferSlicePitch =
Expand All @@ -1596,11 +1625,21 @@ void MemoryManager::ext_oneapi_copyH2D_cmd_buffer(
DstAccessRange[DstPos.YTerm],
DstAccessRange[DstPos.ZTerm]};

Plugin->call<PiApiKind::piextCommandBufferMemBufferWriteRect>(
CommandBuffer, sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, SrcMem, Deps.size(),
Deps.data(), OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemBufferWriteRect>(
CommandBuffer,
sycl::detail::pi::cast<sycl::detail::pi::PiMem>(DstMem),
&BufferOffset, &HostOffset, &RectRegion, BufferRowPitch,
BufferSlicePitch, HostRowPitch, HostSlicePitch, SrcMem, Deps.size(),
Deps.data(), OutSyncPoint);

if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"Host-to-device buffer copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
}
}

Expand All @@ -1614,9 +1653,17 @@ void MemoryManager::ext_oneapi_copy_usm_cmd_buffer(
PI_ERROR_INVALID_VALUE);

const PluginPtr &Plugin = Context->getPlugin();
Plugin->call<PiApiKind::piextCommandBufferMemcpyUSM>(
CommandBuffer, DstMem, SrcMem, Len, Deps.size(), Deps.data(),
OutSyncPoint);
pi_result Result =
Plugin->call_nocheck<PiApiKind::piextCommandBufferMemcpyUSM>(
CommandBuffer, DstMem, SrcMem, Len, Deps.size(), Deps.data(),
OutSyncPoint);
if (Result == PI_ERROR_INVALID_OPERATION) {
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
"USM copy command not supported by graph backend");
} else {
Plugin->checkPiResult(Result);
}
}

void MemoryManager::copy_image_bindless(
Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_host2target.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_host2target_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_target2host.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Device to host copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_target2host_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Device to host copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Device to host copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
8 changes: 8 additions & 0 deletions sycl/test-e2e/Graph/Explicit/cycle_error.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,14 @@ void CreateGraphWithCyclesTest(bool DisableCycleChecks) {
}

int main() {
{
queue Queue;

if (!are_graphs_supported(Queue)) {
return 0;
}
}

// Test with cycle checks
CreateGraphWithCyclesTest(false);
// Test without cycle checks
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,4 +11,4 @@

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/executable_graph_update_ordering"
#include "../Inputs/executable_graph_update_ordering.cpp"
3 changes: 3 additions & 0 deletions sycl/test-e2e/Graph/Explicit/usm_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@
// RUN: %if ext_oneapi_level_zero %{env UR_L0_LEAKS_DEBUG=1 %{run} %t.out 2>&1 | FileCheck %s %}
//
// CHECK-NOT: LEAK
//
// USM copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_EXPLICIT

Expand Down
1 change: 0 additions & 1 deletion sycl/test-e2e/Graph/Inputs/buffer_ordering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@
#include "../graph_common.hpp"

int main() {

queue Queue{{sycl::ext::intel::property::queue::no_immediate_command_list{}}};

if (!are_graphs_supported(Queue)) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
//
// TODO enable cuda once buffer issue investigated and fixed
// UNSUPPORTED: cuda
//
// Host to device copy command not supported for OpenCL
// UNSUPPORTED: opencl

#define GRAPH_E2E_RECORD_REPLAY

Expand Down
Loading