Skip to content

Commit

Permalink
[SYCL][Graph] Blocking wait in finalize on scheduler dependencies
Browse files Browse the repository at this point in the history
When command-groups uses accessors in SYCL-Graph nodes, our implementation
uses the scheduler to wrangle dependencies and then add the node to the graph.
This happens at the application level during graph finalization. We currently
have a call to `waitForEvents()` when adding nodes, that waits on the device
event dependencies of that node command-group. For example, a memory copy command.

However, this corresponds to a call to `piEnqueueWaitForEvents`(), which is an
asynchronous call that itself returns an event that must be waited on later. The
blocking wait on this returned event is invoked by the scheduler on enqueue of
the executable graph.

This patch changes this behaviour to perform blocking waits earlier, on node
addition to the PI/UR command-buffer during graph finalization. We have designed
`finalize()` as the computationally expensive entry-point in the extension spec,
so blocking behaviour makes sense here rather than on executable graph submission.

There are two categories of blocking wait done when the scheduler adds a
node to the graph:

1. Blocking wait on host event dependencies, these correspond to scheduler
commands that don't have an associated PiEnqueue call. For example, memory
allocation commands in the scheduler. Adding this wait fixes the E2E failures
on DG2 devices, so the tests are re-enabled.

2. Blocking wait on device event dependencies, these correspond to scheduler
commands that do have an associated PiEnqueue call that returns an event. These
can occur when regular queue submissions are interleaved with adding graph nodes.
Introducing this wait fixes fails in the `buffer_ordering.cpp` E2E test on some
Level Zero devices.
  • Loading branch information
EwanC committed Oct 30, 2023
1 parent 6ba7b52 commit 6f5064d
Show file tree
Hide file tree
Showing 30 changed files with 29 additions and 88 deletions.
18 changes: 18 additions & 0 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,24 @@ the scheduler for adding to the UR command-buffer, otherwise the node can
be appended directly as a command in the UR command-buffer. This is in-keeping
with the existing behaviour of the handler with normal queue submissions.

Scheduler commands for adding graph nodes differ from typical command-group
submission in the scheduler, in that they do not launch any asynchronous work
which relies on their dependencies, and are considered complete immediately
after adding the command-group node to the graph.

This presents problems with device allocations which create both an allocation
command and a separate initial copy command of data to the new allocation.
Since future command-graph execution submissions will only receive
dependencies on the allocation command (since this is all the information
available), this could lead to situations where the device execution of the
initial copy command is delayed due to device occupancy, and the command-graph
and initial copy could execute on the device in an incorrect order.

To solve this issue, when the scheduler enqueues command-groups to add as nodes
in a command-graph, it will perform a blocking wait on the dependencies of the
command-group first. The user will experience this wait as part of graph
finalization.

## Memory handling: Buffer and Accessor

There is no extra support for graph-specific USM allocations in the current
Expand Down
15 changes: 10 additions & 5 deletions sycl/source/detail/scheduler/commands.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2657,14 +2657,19 @@ enqueueReadWriteHostPipe(const QueueImplPtr &Queue, const std::string &PipeName,
}

pi_int32 ExecCGCommand::enqueueImpCommandBuffer() {
std::vector<EventImplPtr> EventImpls = MPreparedDepsEvents;
auto RawEvents = getPiEvents(EventImpls);
flushCrossQueueDeps(EventImpls, getWorkerQueue());
// Wait on host command dependencies
waitForPreparedHostEvents();

// Any non-allocation dependencies need to be waited on here since subsequent
// Any device dependencies need to be waited on here since subsequent
// submissions of the command buffer itself will not receive dependencies on
// them, e.g. initial copies from host to device
waitForEvents(MQueue, MPreparedDepsEvents, MEvent->getHandleRef());
std::vector<EventImplPtr> EventImpls = MPreparedDepsEvents;
flushCrossQueueDeps(EventImpls, getWorkerQueue());
std::vector<sycl::detail::pi::PiEvent> RawEvents = getPiEvents(EventImpls);
if (!RawEvents.empty()) {
const PluginPtr &Plugin = MQueue->getPlugin();
Plugin->call<PiApiKind::piEventsWait>(RawEvents.size(), &RawEvents[0]);
}

sycl::detail::pi::PiEvent *Event =
(MQueue->has_discard_events_support() &&
Expand Down
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/basic_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/basic_buffer.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/buffer_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/buffer_copy.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/buffer_copy_2d.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_host2target.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/buffer_copy_host2target.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_host2target_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/buffer_copy_host2target_2d.cpp"
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/buffer_copy_host2target_offset.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_offsets.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/buffer_copy_offsets.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_target2host.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

// REQUIRES: level_zero, gpu
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
Expand Down
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/buffer_copy_target2host_2d.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

// REQUIRES: level_zero, gpu
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

// REQUIRES: level_zero, gpu
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
Expand Down
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/event_status_querying.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,6 @@
//
// CHECK: complete

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/event_status_querying.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/temp_buffer_reinterpret.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/temp_buffer_reinterpret.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/Explicit/usm_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/usm_copy.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/basic_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/basic_buffer.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/buffer_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/buffer_copy.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/buffer_copy_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/buffer_copy_2d.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/buffer_copy_host2target.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/buffer_copy_host2target.cpp"
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/buffer_copy_host2target_2d.cpp"
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/buffer_copy_host2target_offset.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/buffer_copy_offsets.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/buffer_copy_offsets.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/buffer_copy_target2host.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

// REQUIRES: level_zero, gpu
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

// REQUIRES: level_zero, gpu
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

// REQUIRES: level_zero, gpu
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
Expand Down
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/event_status_querying.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,6 @@
//
// CHECK: complete

// https://github.com/intel/llvm/issues/11434
// XFAIL: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/event_status_querying.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/temp_buffer_reinterpret.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/temp_buffer_reinterpret.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/usm_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/usm_copy.cpp"
3 changes: 0 additions & 3 deletions sycl/test-e2e/Graph/RecordReplay/usm_copy_in_order.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,6 @@
//
// CHECK-NOT: LEAK

// https://github.com/intel/llvm/issues/11434
// UNSUPPORTED: gpu-intel-dg2

// Tests memcpy operation using device USM and an in-order queue.

#include "../graph_common.hpp"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
//
// UNSUPPORTED: gpu-intel-pvc
// The test checks that invalid exception is thrown
// when trying to use sycl_ext_oneapi_device_global
// along with Graph.
Expand All @@ -19,7 +18,7 @@ sycl::ext::oneapi::experimental::device_global<int, TestProperties>
enum OperationPath { Explicit, RecordReplay, Shortcut };

template <OperationPath PathKind> void test() {
queue Q;
queue Q{{sycl::ext::intel::property::queue::no_immediate_command_list{}}};
int MemcpyWrite = 42, CopyWrite = 24, MemcpyRead = 1, CopyRead = 2;

exp_ext::command_graph Graph{Q.get_context(), Q.get_device()};
Expand Down

0 comments on commit 6f5064d

Please sign in to comment.