Skip to content

Commit

Permalink
[SYCL][Graph] Automatic dependency managment for multiple command-buf…
Browse files Browse the repository at this point in the history
…fer submissions

Synchronizations between multiple submission of a graph are managed by the runtime.
The dependency to previous submission is automatically added when a graph is resubmitted.
Updates spec and tests accordingly.
  • Loading branch information
mfrancepillois committed Dec 4, 2023
1 parent 88f1d0a commit cc87384
Show file tree
Hide file tree
Showing 24 changed files with 53 additions and 196 deletions.
7 changes: 7 additions & 0 deletions sycl/doc/design/CommandGraph.md
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,13 @@ Level Zero:
`waitForEvents` on the same command-list. Resulting in additional latency when
executing a UR command-buffer.

3. Dependencies between multiple submissions must be handled by the runtime.
Indeed, when a second submission is performed the signal conditions
of *WaitEvent* are redefined by this second submission.
Therefore, this can lead to an undefined behavior and potential
hangs especially if the conditions of the first submissions were not yet
satisfied and the event has not yet been signaled.

Future work will include exploring L0 API extensions to improve the mapping of
UR command-buffer to L0 command-list.

Expand Down
27 changes: 13 additions & 14 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -1059,9 +1059,10 @@ void
handler::ext_oneapi_graph(command_graph<graph_state::executable>& graph)
----

|Invokes the execution of a graph. Only one instance of `graph` may be executing,
or pending execution, at any time. Concurrent graph execution can be achieved by
finalizing a graph in modifiable state into multiple graphs in executable state.
|Invokes the execution of a graph. Only one instance of `graph` will
execute at any time. If `graph` is submitted multiple times, dependencies
are automatically added by the runtime to prevent concurrent executions of
an identical graph.

Parameters:

Expand All @@ -1073,8 +1074,6 @@ Exceptions:
to a queue which is associated with a device or context that is different
from the device and context used on creation of the graph.

* Throws synchronously with error code `invalid` if a previous submission of
`graph` has yet to complete execution.
|===

=== Thread Safety
Expand Down Expand Up @@ -1600,6 +1599,12 @@ outputs of the modifiable graph, a technique called _Whole Graph Update_. The
modifiable graph must have the same topology as the graph originally used to
create the executable graphs, with the nodes targeting the same devices and
added in the same order.
If a graph has been updated since its last submission, the sequential
execution constraint is no longer required.
The automatic addition of dependencies is disabled and updated graphs
can be submitted simultaneously.
Users are therefore responsible for explicitly managing potential dependencies
between these executions to avoid data races.

:sycl-kernel-function: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl-kernel-function

Expand Down Expand Up @@ -1729,15 +1734,6 @@ runtime.

== Issues

=== Simultaneous Graph Submission

Enable an instance of a graph in executable state to be submitted for execution
when a previous submission of the same graph has yet to complete execution.

**UNRESOLVED:** Trending "yes". Backend support for this is inconsistent, but
the runtime could schedule the submissions sequentially for backends which don't
support it.

=== Multi Device Graph

Allow an executable graph to contain nodes targeting different devices.
Expand Down Expand Up @@ -1802,6 +1798,9 @@ if used in application code.
`sycl::ext::intel::property::queue::no_immediate_command_list`
should be set on construction to any queues an executable
graph is submitted to.
. Synchronization between multiple executions of the same command-buffer
must be handled in the host for level-zero backend, which may involve
extra latency for subsequent submissions.

== Revision History

Expand Down
37 changes: 33 additions & 4 deletions sycl/source/detail/graph_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -596,11 +596,40 @@ exec_graph_impl::enqueue(const std::shared_ptr<sycl::detail::queue_impl> &Queue,
sycl::detail::EventImplPtr NewEvent;

if (CommandBuffer) {
if (!previousSubmissionCompleted()) {
throw sycl::exception(make_error_code(errc::invalid),
"This Graph cannot be submitted at the moment "
"because the previous run has not yet completed.");
// if previous submissions are incompleted, we automatically
// add completion events of previous submissions as dependencies.
// With Level-Zero backend we cannot resubmit a command-buffer until the
// previous one has already completed.
// Indeed, since a command-list does not accept a list a dependencies at
// submission, we circumvent this lack by adding a barrier that waits on a
// specific event and then define the conditions to signal this event in
// another command-list. Consequently, if a second submission is performed,
// the signal conditions of this single event are redefined by this second
// submission. Thus, this can lead to an undefined behaviour and potential
// hangs. We have therefore to expliclty wait in the host for
// previous submission to complete before resubmitting the command-buffer
// for level-zero backend.
// TODO : add a check to release this constraint and allow multiple
// concurrent submissions if the exec_graph has been updated since the last
// submission.
for (std::vector<sycl::detail::EventImplPtr>::iterator It =
MExecutionEvents.begin();
It != MExecutionEvents.end();) {
auto Event = *It;
if (!Event->isCompleted()) {
if (Queue->get_device().get_backend() ==
sycl::backend::ext_oneapi_level_zero) {
Event->wait(Event);
} else {
CGData.MEvents.push_back(Event);
}
++It;
} else {
// Remove completed events
It = MExecutionEvents.erase(It);
}
}

NewEvent = CreateNewEvent();
sycl::detail::pi::PiEvent *OutEvent = &NewEvent->getHandleRef();
// Merge requirements from the nodes into requirements (if any) from the
Expand Down
14 changes: 0 additions & 14 deletions sycl/source/detail/graph_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -917,20 +917,6 @@ class exec_graph_impl {
/// @return pointer to the graph_impl MGraphImpl
const std::shared_ptr<graph_impl> &getGraphImpl() const { return MGraphImpl; }

/// Checks if the previous submissions of this graph have been completed
/// This function checks the status of events associated to the previous graph
/// submissions.
/// @return true if all previous submissions have been completed, false
/// otherwise.
bool previousSubmissionCompleted() const {
for (auto Event : MExecutionEvents) {
if (!Event->isCompleted()) {
return false;
}
}
return true;
}

private:
/// Create a command-group for the node and add it to command-buffer by going
/// through the scheduler.
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Explicit/enqueue_ordering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -72,8 +72,6 @@ int main() {
});
});

E4.wait();

// Buffer elements set to 22
Queue.submit([&](handler &CGH) {
CGH.depends_on(E5);
Expand Down
4 changes: 0 additions & 4 deletions sycl/test-e2e/Graph/Inputs/add_nodes_after_finalize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -64,18 +64,14 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}

for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExecAdditional);
});
Event.wait();
}

Queue.wait_and_throw();
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/basic_buffer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -42,10 +42,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}
Queue.wait_and_throw();
}
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/basic_usm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -50,10 +50,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}

Queue.wait_and_throw();
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/basic_usm_host.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}

Queue.wait_and_throw();
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/basic_usm_mixed.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -48,10 +48,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}

Queue.wait_and_throw();
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/basic_usm_shared.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}

Queue.wait_and_throw();
Expand Down
1 change: 0 additions & 1 deletion sycl/test-e2e/Graph/Inputs/basic_usm_system.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,6 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
}
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/buffer_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -111,10 +111,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}
Queue.wait_and_throw();
}
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/buffer_copy_2d.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -106,10 +106,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}
Queue.wait_and_throw();
}
Expand Down
1 change: 0 additions & 1 deletion sycl/test-e2e/Graph/Inputs/buffer_ordering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,6 @@ int main() {
});
});

Event.wait();
// Buffer elements set to 22
Queue.submit([&](handler &CGH) { CGH.ext_oneapi_graph(ExecGraph); });

Expand Down
1 change: 0 additions & 1 deletion sycl/test-e2e/Graph/Inputs/host_task.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,6 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
}
Expand Down
1 change: 0 additions & 1 deletion sycl/test-e2e/Graph/Inputs/multiple_exec_graphs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,6 @@ int main() {
for (unsigned n = 0; n < Iterations; n++) {
auto GraphExec = Graph.finalize();
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
}
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/queue_shortcuts.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -42,14 +42,12 @@ int main() {

// Execute several iterations of the graph using the different shortcuts
event Event = Queue.ext_oneapi_graph(GraphExec);
Event.wait();

assert(Iterations > 2);
const size_t LoopIterations = Iterations - 2;
std::vector<event> Events(LoopIterations);
for (unsigned n = 0; n < LoopIterations; n++) {
Events[n] = Queue.ext_oneapi_graph(GraphExec, Event);
Events[n].wait();
}

Queue.ext_oneapi_graph(GraphExec, Events).wait();
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/sub_graph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -104,10 +104,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(MainGraphExec);
});
Event.wait();
}
Queue.wait_and_throw();

Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/temp_buffer_reinterpret.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,10 +49,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}

Queue.copy(BufferA.get_access(), DataA.data());
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/Inputs/usm_copy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -104,10 +104,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}

Queue.copy(PtrA, DataA.data(), Size, Event);
Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/RecordReplay/after_use.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -54,10 +54,8 @@ int main() {
// before graph recording)
for (unsigned n = 1; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}
Queue.wait_and_throw();

Expand Down
2 changes: 0 additions & 2 deletions sycl/test-e2e/Graph/RecordReplay/barrier_with_work.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -100,10 +100,8 @@ int main() {
event Event;
for (unsigned n = 0; n < Iterations; n++) {
Event = Queue.submit([&](handler &CGH) {
CGH.depends_on(Event);
CGH.ext_oneapi_graph(GraphExec);
});
Event.wait();
}
Queue.wait_and_throw();

Expand Down
Loading

0 comments on commit cc87384

Please sign in to comment.