[SYCL][Graph] Add spec wording for dynamic events #372
Conversation
Design looks good, mostly nitpicks - an example in "Examples" I think would be worthwhile too.
LGTM for review by stakeholders
@gmlueck I've drafted this PR based on your proposal if you're able to take a look at it
Looks good. Just a few comments.
> * Passing SYCL events will create runtime dependencies for execution of the
> graph node but will only create edges if those events are associated with other
> nodes in the same graph.
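For context, a minimal sketch of the two cases this wording distinguishes, assuming the record-and-replay API of this extension (the queues, kernels, and graph names are placeholders, not taken from the spec):

```c++
#include <sycl/sycl.hpp>
namespace sycl_ext = sycl::ext::oneapi::experimental;

sycl::queue Queue;
sycl_ext::command_graph Graph{Queue.get_context(), Queue.get_device()};

// An event from an eager submission on another queue: passing it below gives
// a runtime dependency only, not a graph edge.
sycl::queue OtherQueue;
sycl::event EagerEvent = OtherQueue.single_task([]() { /* ... */ });

Graph.begin_recording(Queue);
// EventA is associated with a node recorded into Graph.
sycl::event EventA = Queue.single_task([]() { /* ... */ });
Queue.submit([&](sycl::handler& CGH) {
  CGH.depends_on(EventA);     // same graph: becomes a graph edge
  CGH.depends_on(EagerEvent); // not a node in this graph: runtime dependency
  CGH.single_task([]() { /* ... */ });
});
Graph.end_recording();
```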
This might be a little clearer:
- Passing SYCL events. If an event represents a recorded node in the same graph, then a graph edge is created between this node and the other node. Otherwise, a runtime dependency is created between this node and the command that is associated with the event.
What happens if the event represents a node recorded into a different graph?
Thanks, I think that is clearer.

> What happens if the event represents a node recorded into a different graph?

In that case still only a runtime dependency is created. For easier reasoning about the structure of the graph I think it's important not to call these edges; otherwise this dependency would essentially make them into a single graph.

Edit: Actually perhaps your question is more about what happens if the event passed is from the recorded queue submission, rather than obtained via `get_event()`?
> Edit: Actually perhaps your question is more about what happens if the event passed is from the recorded queue submission, rather than obtained via `get_event()`?

Yes, that was my question.
I think right now this would have to still be an error, and we should advise using `get_event()` and `dynamic_event`s to establish these dependencies. This is more explicit, which I think is good.

It might be possible to support this in the runtime but there is a very high implementation burden there I think. Neither the graph runtime nor UR command-buffers are designed to accommodate that sort of sharing.
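A sketch of that explicit pattern, reusing the names from the example discussed later in this PR; note that the default-constructed `dynamic_event` and the exact `depends_on` overload are assumptions about the proposed API, not settled spec:

```c++
namespace sycl_ext = sycl::ext::oneapi::experimental;

// Dynamic event standing in for "completion of nodeA in execGraphA's next
// execution"; the constructor shown is illustrative only.
sycl_ext::dynamic_event dynEvent;

// While building graphB, depend on the dynamic event explicitly instead of
// reusing an event from graphA's recorded submissions.
sycl_ext::node nodeB = graphB.add([&](sycl::handler& CGH) {
  CGH.depends_on(dynEvent);
  CGH.single_task([]() { /* ... */ });
});

auto execGraphA = graphA.finalize();
auto execGraphB = graphB.finalize();

// Point the dynamic event at nodeA's completion, then propagate the change.
dynEvent.update(execGraphA.get_event(nodeA));
execGraphB.update(nodeB);
```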
I agree with this behavior, but the wording should be adjusted to make it clear that this is an error. Also, what is the behavior if this is violated? Does `node::add` throw an exception?
Agreed, I've clarified the wording here. Yes, `node::add` should throw an exception, but these were not defined so I've added them as well.
```c++
// next execution
dynEvent.update(execGraphA.get_event(nodeA));
// Update execGraphB with the affected node to reflect those changes
execGraphB.update(nodeB);
```
How optimized is the graph update? I don't have latency numbers at hand, but in CUDA we try to avoid updating the graph when not absolutely needed, so I am worried that having to update it every step might negate most of the performance benefits of more streamlined submission.
I don't have hard numbers, and graph update is a fairly new feature for us, so there is likely room for improvement performance-wise, but I imagine there will always be some amount of overhead compared to not doing an update. I can see how that is a concern for your use case though.

One of the primary drivers for this design is a different use-case where the user wants the ability to wait for an event in the middle of the graph, to be able to do some separate operations before the graph has fully finished executing. In that scenario it is important for us to be clear about what exactly each `sycl::event` represents, so it made sense to be more explicit about each event representing a single execution, which does require a graph update as you say (see the sketch below).

We had previously looked at designs where events would be reset automatically, but it is difficult for a user to reason about that in this primary use-case. I do think automatic resetting makes more sense for your stated use-case, since as I see it the exact semantics of how these events are handled doesn't matter to the user there, but it does present a challenge from an API design point of view in SYCL: trying to create something that makes sense for both use cases and the wider SYCL specification.
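Concretely, the per-submission flow that requires the update would look something like this sketch, reusing the names from the quoted example (the loop structure is illustrative, not from the spec):

```c++
for (size_t i = 0; i < Iterations; ++i) {
  // Re-point the dynamic event at nodeA's completion in the next execution
  dynEvent.update(execGraphA.get_event(nodeA));
  // Propagate the new dependency into the other executable graph
  execGraphB.update(nodeB);
  Queue.ext_oneapi_graph(execGraphA);
  Queue.ext_oneapi_graph(execGraphB);
}
```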
> I can see how that is a concern for your use case though.

I would argue that it's a concern for many use-cases. Graphs are primarily used for latency-sensitive workloads, right? And here we mandate that if we want to use dynamic events as a dependency, we must update the graph every time it's submitted.

> One of the primary drivers for this design is a different use-case where the user wants the ability to wait for an event in the middle of the graph to be able to do some separate operations before the graph has fully finished executing. In that scenario it is important for us to be clear about what exactly each sycl::event represents, so it made sense to be more explicit about each event representing a single execution

I do like the design in this PR more than the auto-resetting events / semaphore approach. Being explicit about which instance of the graph we are referring to is quite helpful in reasoning about the code (even if it is a bit more verbose). The question is, can it be efficient?

In particular, perhaps there is a way to leverage the fact that native CUDA events are auto-resetting, and make `dynamic_event` somehow smart enough to make the graph update a no-op when the same CUDA event can be reused?
> In particular, perhaps there is a way to leverage the fact that native CUDA events are auto-resetting, and make dynamic_event somehow smart enough to make the graph update a no-op when the same CUDA event can be reused?

I think it is probably possible, but as I mentioned in another comment there's quite a high implementation burden for that, since we have to map to multiple, quite different APIs at the UR level, and both the graph runtime and the UR implementations are very much not set up to support that kind of information sharing between graphs/command-buffers.

Because of that burden, if we were going to do that it would need to be done as a separate piece of work, so I would be in favour of leaving this design as is for now. It should be easy to update the spec to a new behaviour in the future, if we can commit to the implementation.
> Graphs are primarily used for latency-sensitive workloads, right? And here we mandate that if we want to use dynamic events as a dependency, we must update the graph every time it's submitted.

A comment from a user perspective: we always prefer a static graph without changes, but if we are unable to keep the graph the same (due to the use case), we would have to use the dynamic features (parameters or events) to update the graph; such overhead is likely to be less than recording/creating the graph again.
> I would argue that it's a concern for many use-cases. Graphs are primarily used for latency-sensitive workloads, right? And here we mandate that if we want to use dynamic events as a dependency, we must update the graph every time it's submitted.

I was referring to your use-case in contrast to the example use-case that I gave, which may not use these kinds of dependencies between graphs and thus would not need update at all, rather than trying to imply that your use-case is the only one with such concerns.

I agree that this is a concern for most users and is the primary driver behind using graphs. Ideally, having to update a small number of nodes/events will not outweigh the gains in reduced host latency from using graphs, so there would still be a net performance increase. The effectiveness of graphs as a whole is workload-dependent though, so this may not be true for all cases.
@Bensuo In order to use `dynamic_event` for network dependencies, we need to be able to get a `ze_event_handle_t`. Are dynamic_events translatable using `sycl::get_native`?
As of right now there is no explicit conversion from a `dynamic_event` itself, but you can obtain the underlying `sycl::event` and convert that:

```c++
sycl::event SomeEvent = ExecGraph.get_event(SomeNode);
DynamicEvent.update(SomeEvent);
ze_event_handle_t NativeEvent =
    sycl::get_native<sycl::backend::ext_oneapi_level_zero>(SomeEvent);
// Do something with NativeEvent
```
Continuing discussion from here around barriers with dynamic events: intel#13253 (comment). I think it probably makes sense to extend `ext_oneapi_barrier()` to allow passing dynamic events.
Yes, that could make sense. Another option is to define an API that takes a parameter pack, where each parameter can be either a `sycl::event` or a `dynamic_event`.
Perfectly ok for us; we anyway submit barriers with only one event. But even if we expand, there would be no problem submitting two barriers for dynamic and "normal" events.
I've made this change and ended up using both approaches: the …
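For illustration only, a sketch of what the extended barrier API under discussion might look like; the `dynamic_event` overload is hypothetical and is exactly what this thread was deciding, so nothing here is final:

```c++
namespace sycl_ext = sycl::ext::oneapi::experimental;

sycl_ext::dynamic_event DynEvent;
Queue.submit([&](sycl::handler& CGH) {
  // Hypothetical overload of ext_oneapi_barrier() accepting dynamic events.
  CGH.ext_oneapi_barrier({DynEvent});
});
```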
> graph edges between those nodes and the node being added.
>
> * Passing SYCL events, including <<dynamic-event, Dynamic Events>>. If an event
> represents a recorded node in the same graph, then a graph edge is created
```diff
-represents a recorded node in the same graph, then a graph edge is created
+represents a node in the same graph, then a graph edge is created
```

"recorded" may give the wrong impression that this applies only to recorded graphs, not graphs created with the explicit APIs.
There currently isn't a way to get an event like this for nodes added explicitly, though you can go the other way and get a `node` for a given event from a recorded queue submission, so this is probably a gap that should be addressed.
Doesn't `command_graph::get_event` provide this ability?
> Doesn't command_graph::get_event provide this ability?

Since `get_event()` can only be called on executable graphs and returns an event specifically representing an execution point in a single graph execution (usable largely like a normal SYCL event), those events are conceptually quite different from the events returned from recorded queue submissions. The latter have no backend events associated with them, have many restrictions on where they can be used, and are only for use in creating dependencies between graph nodes.
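A sketch of the two kinds of event being contrasted, assuming `node::get_node_from_event()` as the extension's mechanism for mapping a recorded event back to its node (queue and kernel names are placeholders):

```c++
namespace sycl_ext = sycl::ext::oneapi::experimental;

// 1. Event from a recorded queue submission: no backend event, only usable
//    for creating dependencies between nodes of this graph.
Graph.begin_recording(Queue);
sycl::event RecordedEvent = Queue.single_task([]() { /* ... */ });
Graph.end_recording();
sycl_ext::node SomeNode =
    sycl_ext::node::get_node_from_event(RecordedEvent);

// 2. Event from an executable graph: behaves largely like a normal SYCL
//    event and represents the node's completion in the next execution.
auto ExecGraph = Graph.finalize();
sycl::event ExecEvent = ExecGraph.get_event(SomeNode);
```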
> event get_event(const node& node);
> ----
> Returns a SYCL event which represents the completion of node `node` which is
> valid only for the next execution of the graph. This event can be used as a
I'm curious why the event is valid only for the next graph execution. In graph.finalize(), the event is tied to a native event, and I think (at least for most cases) the event is fixed after graph.finalize(). Or is the actual purpose of this function to get a chance to reset the fixed event?
SYCL events don't have any concept of being reset, and if we introduced that, it becomes unclear when events are reset if that happens automatically every time you submit a graph for execution. This way it is explicit what an event represents. It is not necessarily true that the backend event will be reset by this function call, but that is the general idea.
> It is not necessarily true that the backend event will be reset by this function call, but that is the general idea.

Does this mean that "the backend event will be reset by this function call" is the general idea in practice, but it is not a must? (From a spec perspective, it does not define this function's implementation.)

Suppose one implementation chooses to reset the backend event in this function, and the event is not reused elsewhere. Based on that, just as a "thought experiment", the code below:

```c++
dynEvent.update(execGraphA.get_event(nodeA));
execGraphB.update(nodeB);
```

could be simplified to:

```c++
execGraphA.get_event(nodeA);
```

Is that correct for this particular implementation? The reasoning is that the backend event used by execGraphA does not change (just the event status is changed), and so the two "update" calls can be removed for such an implementation.
I think my previous reply was poorly worded; I did not mean to imply that that is how the implementation will behave, and I don't think an implementation would or should allow your simplified code.

This is basically the same idea being discussed here, if I'm not mistaken, and I would reiterate that it is something we can explore in the future, but not as part of this initial work.
What's the expected/typical implementation of this function that makes the returned event valid only for the next execution of the graph? Thanks.
I'm afraid I don't really have an answer for that immediately, but I think you are right that this needs some further consideration so we are looking into it.
Considering this more, I'm concerned that `get_event(node)` may not be implementable on a CUDA backend. Since the executable graph has already been finalized, we can't add nodes to it, but to implement this on top of CUDA-Graph I think we need to use `cudaGraphAddEventRecordNode()`, which would require some mechanism in the modifiable graph to say that a node is going to produce a SYCL event.

EDIT: I've moved this comment to its own thread here #372 (comment) to avoid branching the discussion from performant usage on backends that can reset events.
Wondering if defining the semantics of `dynamic_event` like a counter-based event would help give a more optimized flow for the node-to-node-only use case:

- If a `dynamic_event` is created from a vanilla SYCL queue submission, the counter is only incremented once, when the SYCL event backing it completes.
- If a `dynamic_event` is created for a node in a graph, then the counter is incremented on completion of each execution of the graph.
- When waiting on a `dynamic_event` in a vanilla queue submission, wait on the value to increment from the current value.

A user wouldn't need to manually call any reset methods or the `update()` calls currently needed for the node-to-node between-graph case.
Edit: I've actually realized that to implement `get_event(node)` we may not need to actually update the signal event in a L0 command, but can always use the underlying L0 event that already exists for sync-points (or create one if need be), wrapped in different UR events. This L0 event gets reset between executions anyway, so we may actually get this kind of counter-based `dynamic_event` semantics by default without having to go out of our way to achieve it.
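Purely as an illustration of those counter semantics, a host-side model (nothing here is a SYCL or UR API; all names are made up):

```c++
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Toy model of a counter-based event: each completion bumps the counter,
// and waiters wait for it to advance past a snapshot taken at submission.
class counter_event_model {
  std::uint64_t Counter = 0;
  std::mutex M;
  std::condition_variable CV;

public:
  // The runtime would call this each time the backing node/submission
  // completes (once for a queue submission, once per graph execution).
  void signal() {
    {
      std::lock_guard<std::mutex> Lock(M);
      ++Counter;
    }
    CV.notify_all();
  }

  // A waiter snapshots the counter when the dependent work is submitted...
  std::uint64_t snapshot() {
    std::lock_guard<std::mutex> Lock(M);
    return Counter;
  }

  // ...and waits for the counter to move past that snapshot, so each
  // execution satisfies exactly the waiters submitted before it.
  void wait_past(std::uint64_t Snapshot) {
    std::unique_lock<std::mutex> Lock(M);
    CV.wait(Lock, [&] { return Counter > Snapshot; });
  }
};
```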
I have an open question about a possible new use case for the interaction between two graphs, as below; the key is that workload_b() is executed on other hardware not managed by SYCL. We'll use an in-order SYCL queue and the record & replay APIs.

In workload_b(), it can check if q is in recording state; if yes, it can start its own recording for the workloads in workload_b, and it also has the ability to understand …
- Rename limited graph events to graph-limited events
- Clarify language of external dependencies in graphs
- Remove usage of "internal edges"
> These events represent only the most recent execution of a given executable
> graph. If an application executes the same graph multiple times before
> scheduling work or performing a host-side wait on the event then executions of
> the node in a previous execution other than the most recent one may be missed.
For the long sentence, does this talk about code like below?

```c++
sycl::event ExecutionEvent = ExecGraph.get_event(SomeNode);
for (int i = 0; i < 1000; ++i) // 1000 is just one value for example
  Queue.ext_oneapi_graph(ExecGraph);
Queue.submit([&](sycl::handler& CGH) {
  CGH.depends_on(ExecutionEvent);
  CGH.parallel_for(...);
});
```

What does it mean for the execution of `Queue.ext_oneapi_graph(ExecGraph)` with i==998 to be missed?
After having some internal discussions with regards to the implementability of this aspect of the spec, I've pushed some wording to clarify what these events represent, but I'll also address your specific example.

As it stands right now, if multiple SYCL graph submissions are in-flight as in your example, it is undefined which exact execution `CGH.depends_on(ExecutionEvent);` will depend on, as that depends on which iteration is currently executing on the device.

To depend on a specific execution, the application must enforce ordering such that multiple enqueues are not in-flight, like so (example I just added taken from the spec):

```c++
sycl::event NodeExecutionEvent;
sycl::event GraphCompletionEvent;
for (size_t i = 0; i < GraphIterations; i++) {
  // Obtain the node execution event for the graph
  NodeExecutionEvent = ExecGraph.get_event(SomeNode);
  // Enqueue the graph for execution
  GraphCompletionEvent = Queue.ext_oneapi_graph(ExecGraph);
  // Use the event to wait on the host and perform some intermediate
  // host-work once that node has completed
  NodeExecutionEvent.wait();
  DoSomethingOnHost();
  // Wait on the graph finishing to ensure it is complete before the next use
  // of a node-level execution event
  GraphCompletionEvent.wait_and_throw();
}
```

This is generally in keeping with how the backends handle this, but slightly more restrictive, because SYCL events do not function like CUDA/L0 events and the user has no direct visibility of device execution, which is abstracted away by the SYCL runtime.
> ==== Adding External Event Dependencies To Graphs [[external-event-dependencies]]
>
> <<event-terminology, Regular SYCL events>> can be passed as dependencies to
Regular SYCL events include:

1. events from normal submissions to a SYCL queue
2. events returned from submitting an executable command_graph for execution
3. events obtained via command_graph<graph_state::executable>::get_event()

As for item 3, the spec says these events "can then be waited on or used as dependencies for eager SYCL operations **outside** of graphs", so can item-3 events also be used as dependencies to graph nodes?
No, this is explicitly disallowed in the spec; dependencies between graph nodes should be established using graph-limited events.
If the answer is no, I think we cannot just say "Regular SYCL events" here, since not all three kinds are allowed.
That's fair, I've updated the wording here to refer to events from eager SYCL submissions only.
- Clarify what they represent with example
Expand the command-buffer experimental feature API so that it can be used to implement [SYCL-Graph dynamic events](reble/llvm#372). This involves extending each command append entry-point to include the following extra parameters:

* An output `ur_exp_command_buffer_command_handle_t`.
* An input `ur_event_handle_t` event wait-list of dependent events.
* An output `ur_event_handle_t` event that is signaled when the command completes its next execution.

New entry-points are also added to update the wait-list and signal event parameters of commands:

* `urCommandBufferUpdateSignalEventExp`
* `urCommandBufferUpdateWaitEventsExp`

APIs implemented for CUDA adapter with CTS tests.
- Inter-graph dependencies must have source graph finalized before creation
- Graphs with inter-graph dependencies can only be finalized once
- Update usage guide example on inter-graph dependencies
- Various updates to improve error coverage of method definitions
Closing this in favor of the upstream PR. Further discussion can happen there: intel#15056