
[SYCL][Graph] Add spec wording for dynamic events #372

Closed

Conversation

@Bensuo (Collaborator) commented May 20, 2024

  • Adds dynamic events, which are updatable
  • Removes limitations on events from outside a graph
  • The depends_on property can now take events and dynamic events
  • Adds get_event() to get the event for a node in a graph
  • Extends ext_oneapi_barrier() to allow passing dynamic events
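
A minimal sketch of how these pieces might fit together, based on the API names proposed in this PR (the graph, node, and queue variables are hypothetical, and the exact signatures are illustrative rather than normative):

namespace sycl_ext = sycl::ext::oneapi::experimental;

// A dynamic event that a node in GraphB will depend on
sycl_ext::dynamic_event DynEvent;

// depends_on can now accept a dynamic_event alongside nodes and events
auto NodeB = GraphB.add([&](sycl::handler& CGH) { /* kernel */ },
                        sycl_ext::property::node::depends_on(DynEvent));

auto ExecGraphA = GraphA.finalize();  // assume NodeA was added to GraphA
auto ExecGraphB = GraphB.finalize();

// get_event() returns an event for NodeA's next execution; point the
// dynamic event at it, then refresh the dependent node in ExecGraphB
DynEvent.update(ExecGraphA.get_event(NodeA));
ExecGraphB.update(NodeB);

// ext_oneapi_barrier() extended to accept dynamic events
Queue.ext_oneapi_barrier(std::vector<sycl_ext::dynamic_event>{DynEvent});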

@Bensuo added the "Graph Specification Extension" (Specification related) label May 20, 2024
@EwanC (Collaborator) left a comment:


Design looks good, mostly nitpicks. I think an example in the "Examples" section would be worthwhile too.

@EwanC (Collaborator) left a comment:


LGTM for review by stakeholders

@Bensuo (Collaborator Author) commented May 22, 2024

@gmlueck I've drafted this PR based on your proposal, if you're able to take a look at it.

@gmlueck left a comment:


Looks good. Just a few comments.


* Passing SYCL events will create runtime dependencies for execution of the
graph node but will only create edges if those events are associated with other
nodes in the same graph.
Reviewer comment:

This might be a little clearer:

  • Passing SYCL events. If an event represents a recorded node in the same graph, then a graph edge is created between this node and the other node. Otherwise, a runtime dependency is created between this node and the command that is associated with the event.

What happens if the event represents a node recorded into a different graph?
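
For illustration, a sketch of the distinction in the suggested wording, assuming record-and-replay usage (the queue, graph, and kernel names are hypothetical):

Graph.begin_recording(Queue);
sycl::event InGraph = Queue.submit([&](sycl::handler& CGH) { /* kernel A */ });
Graph.end_recording();

// An ordinary (eager) submission that is not part of any graph
sycl::event Outside = Queue.submit([&](sycl::handler& CGH) { /* kernel B */ });

Graph.begin_recording(Queue);
Queue.submit([&](sycl::handler& CGH) {
  CGH.depends_on(InGraph);  // node in the same graph -> graph edge
  CGH.depends_on(Outside);  // command outside the graph -> runtime dependency
  /* kernel C */
});
Graph.end_recording();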

@Bensuo (Collaborator Author) replied May 23, 2024:

Thanks, I think that is clearer.

What happens if the event represents a node recorded into a different graph?

In that case, still only a runtime dependency is created. To make it easier to reason about the structure of the graph, I think it's important not to call these edges; otherwise this dependency would essentially merge the two graphs into a single graph.

Edit: Actually, perhaps your question is more about what happens if the event passed is from the recorded queue submission, rather than obtained via get_event?

Reviewer reply:

Edit: Actually, perhaps your question is more about what happens if the event passed is from the recorded queue submission, rather than obtained via get_event?

Yes, that was my question.

@Bensuo (Collaborator Author):

I think right now this would still have to be an error, and we should advise using get_event() and dynamic_events to establish these dependencies. This is more explicit, which I think is good.

It might be possible to support this in the runtime, but I think there is a very high implementation burden there. Neither the graph runtime nor UR command-buffers are designed to accommodate that sort of sharing.

Reviewer reply:

I agree with this behavior, but the wording should be adjusted to make it clear that this is an error. Also, what is the behavior if this is violated? Does node::add throw an exception?

@Bensuo (Collaborator Author):

Agreed, I've clarified the wording here. Yes, node::add should throw an exception, but these errors were not defined, so I've added them as well.
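
For concreteness, a hedged sketch of that error behaviour (the graph, event, and error-code names here are assumptions; see the spec wording for what was actually defined):

try {
  // Using an event from another graph's recorded queue submission as a
  // dependency is disallowed, so add() should throw
  GraphB.add([&](sycl::handler& CGH) {
    CGH.depends_on(EventFromOtherGraphSubmission);  // hypothetical event
    /* kernel */
  });
} catch (const sycl::exception& E) {
  assert(E.code() == sycl::errc::invalid);  // assumed error code
}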

// next execution
dynEvent.update(execGraphA.get_event(nodeA));
// Update execGraphB with the affected node to reflect those changes
execGraphB.update(nodeB);
Reviewer comment:

How optimized is the graph update? I don't have latency numbers at hand, but in CUDA we try to avoid updating the graph when not absolutely needed, so I am worried that having to update it every step might negate most of the performance benefits of more streamlined submission.

@Bensuo (Collaborator Author):

I don't have hard numbers, and graph update is a fairly new feature for us, so there is likely room for improvement performance-wise, but I imagine there will always be some amount of overhead compared to not doing an update. I can see how that is a concern for your use case though.

One of the primary drivers for this design is a different use-case, where the user wants the ability to wait on an event in the middle of the graph so they can do some separate operations before the graph has fully finished executing. In that scenario it is important for us to be clear about what exactly each sycl::event represents, so it made sense to be more explicit about each event representing a single execution, which does require a graph update as you say.

We had previously looked at designs where events would be reset automatically, but that is difficult for a user to reason about in this primary use-case. I do think auto-resetting makes more sense for your stated use-case, since as I see it the exact semantics of how these events are handled don't matter to the user there, but it does present a challenge from an API design point of view: SYCL needs something that makes sense for both use cases and for the wider SYCL specification.

Reviewer reply:

I can see how that is a concern for your use case though.

I would argue that it's a concern for many use-cases. Graphs are primarily used for latency-sensitive workloads, right? And here we mandate that if we want to use dynamic events as a dependency, we must update the graph every time it's submitted.

One of the primary drivers for this design is a different use-case, where the user wants the ability to wait on an event in the middle of the graph so they can do some separate operations before the graph has fully finished executing. In that scenario it is important for us to be clear about what exactly each sycl::event represents, so it made sense to be more explicit about each event representing a single execution

I do like the design in this PR more than the auto-resetting events / semaphore approach. Being explicit about which instance of the graph we are referring to is quite helpful in reasoning about the code (even if it is a bit more verbose). The question is whether it can be efficient.

In particular, perhaps there is a way to leverage the fact that native CUDA events are auto-resetting, and make dynamic_event somehow smart enough to make the graph update a no-op when the same CUDA event can be reused?

@Bensuo (Collaborator Author):

In particular, perhaps there is a way to leverage the fact that native CUDA events are auto-resetting, and make dynamic_event somehow smart enough to make the graph update a no-op when the same CUDA event can be reused?

I think it is probably possible, but as I mentioned in another comment, there's quite a high implementation burden for that: we have to map to multiple, quite different APIs at the UR level, and both the graph runtime and the UR implementations are very much not set up to support that kind of information sharing between graphs/command-buffers.

Because of that burden, I think that if we were going to do this it would need to be done as a separate piece of work, so I would be in favour of leaving this design as-is for now. It should be easy to update the spec to a new behaviour later, if we can commit to the implementation.

@guoyejun May 30, 2024:

Graphs are primarily used for latency-sensitive workloads, right? And here we mandate that if we want to use dynamic events as a dependency, we must update the graph every time it's submitted.

My comment from a user perspective: we always prefer a static graph that doesn't change, but if we are unable to keep the graph the same (due to the use case), we have to use the dynamic features (parameters or events) to update the graph; such overhead is likely to be less than recording/creating the graph again.

@Bensuo (Collaborator Author):

I would argue that it's a concern for many use-cases. Graphs are primarily used for latency-sensitive workloads, right? And here we mandate that if we want to use dynamic events as a dependency, we must update the graph every time it's submitted.

I was referring to your use-case in contrast to the example use-case that I gave, which may not use these kinds of dependencies between graphs and thus would not need update, rather than trying to imply that your use-case is the only one with such concerns.

I agree that this is a concern for most use-cases and is the primary driver behind using graphs. Ideally, having to update (especially a small number of nodes/events) will not outweigh the gains in reduced host latency from using graphs, so there would still be a net performance increase. The effectiveness of graphs as a whole is workload-dependent though, so this may not be true in all cases.

@tgiladraz commented:
@Bensuo In order to use dynamic_event for network dependencies, we need to be able to get a 'ze_event_handle_t'. Are dynamic_events translatable using sycl::get_native?

@Bensuo (Collaborator Author) commented Jun 3, 2024

@Bensuo In order to use dynamic_event for network dependencies, we need to be able to get a 'ze_event_handle_t'. Are dynamic_events translatable using sycl::get_native?

As of right now there is no explicit conversion from a dynamic_event to the native event type, but you could obtain the native event from the same sycl::event that you update the dynamic_event with, for example:

sycl::event SomeEvent = ExecGraph.get_event(SomeNode);

DynamicEvent.update(SomeEvent);

// Note: get_native takes the backend as a template parameter
ze_event_handle_t NativeEvent =
    sycl::get_native<sycl::backend::ext_oneapi_level_zero>(SomeEvent);

// Do something with NativeEvent

@Bensuo (Collaborator Author) commented Jun 10, 2024

Continuing discussion from here around barriers with dynamic events: intel#13253 (comment)

I think it probably makes sense to extend ext_oneapi_barrier to also take dynamic_events, based on the use case described in those comments, as well as allowing depends_on to take dynamic events to achieve the same goal for the explicit API. This would mean that normal events and dynamic events have to be passed in separate calls, but I'm not sure there are any great options for allowing a mix of them in one call.

@gmlueck @al42and Does this sound reasonable to you?

@gmlueck commented Jun 10, 2024

This would mean that normal events and dynamic events have to be passed in separate calls, but I'm not sure there are any great options for allowing a mix of them in one call.

Yes, that could make sense. Another option is to define an API that takes a parameter pack, where each parameter can be either event or dynamic_event. The pack API, though, would not be as flexible. With the API that takes a std::vector<event>, you can dynamically create a vector, where the number of elements isn't known at compile time. I'm not sure how often applications need this ability, though.

@al42and commented Jun 11, 2024

This would mean that normal events and dynamic events have to be passed in separate calls, but I'm not sure there are any great options for allowing a mix of them in one call.

Perfectly OK for us; we submit barriers with only one event anyway. But even if we expand, there would be no problem submitting two barriers, one for dynamic and one for "normal" events.

@Bensuo (Collaborator Author) commented Jun 12, 2024

Yes, that could make sense. Another option is to define an API that takes a parameter pack, where each parameter can be either event or dynamic_event. The pack API, though, would not be as flexible. With the API that takes a std::vector<event>, you can dynamically create a vector, where the number of elements isn't known at compile time. I'm not sure how often applications need this ability, though.

I've made this change and ended up using both approaches: the depends_on property already uses a parameter pack, so it made sense to leave that as-is and just change the wording to also allow dynamic_event to be passed there, whereas ext_oneapi_barrier() already accepts events via std::vector, so I've stuck with that for the new overloads.
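
A sketch of the two resulting API shapes (signatures illustrative, not copied from the spec wording):

// depends_on property: parameter pack, extended so each argument may be a
// node, an event, or a dynamic_event
class depends_on {
 public:
  template <typename... DependencyT>
  depends_on(DependencyT... Deps);
};

// ext_oneapi_barrier(): the existing std::vector overload for events, plus
// a separate overload for dynamic events
sycl::event ext_oneapi_barrier(const std::vector<sycl::event>& WaitList);
sycl::event ext_oneapi_barrier(const std::vector<dynamic_event>& WaitList);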

graph edges between those nodes and the node being added.

* Passing SYCL events, including <<dynamic-event, Dynamic Events>>. If an event
represents a recorded node in the same graph, then a graph edge is created

Reviewer comment:

Suggested change:
- represents a recorded node in the same graph, then a graph edge is created
+ represents a node in the same graph, then a graph edge is created

"recorded" may give a wrong impression that only for the recorded graph, not the graph created with explicit APIs.

@Bensuo (Collaborator Author):

There currently isn't a way to get an event like this for nodes added explicitly, though you can go the other way and get a node for a given event from a recorded queue submission, so this is probably a gap that should be addressed.

Reviewer reply:

Doesn't command_graph::get_event provide this ability?

@Bensuo (Collaborator Author):

Doesn't command_graph::get_event provide this ability?

get_event() can only be called on executable graphs and returns an event specifically representing an execution point in a single graph execution (usable largely like a normal SYCL event), so these are conceptually quite different from the events returned from recorded queue submissions.

The latter have no backend events associated with them, have many restrictions on where they can be used, and exist only for creating dependencies between graph nodes.

event get_event(const node& node);
----
|Returns a SYCL event which represents the completion of node `node` which is
valid only for the next execution of the graph. This event can be used as a

Reviewer comment:

I'm curious why the event is valid only for the next graph execution. In graph.finalize(), the event is tied to a native event, and I think (at least in most cases) the event is fixed after graph.finalize().

Or is the actual purpose of this function to get a chance to reset the fixed event?

@Bensuo (Collaborator Author):

SYCL events don't have any concept of being reset, and if we introduced one it would become unclear when events are reset, particularly if it happens automatically every time you submit a graph for execution.

This way it is explicit what an event represents. It is not necessarily true that the backend event will be reset by this function call, but that is the general idea.

Reviewer reply:

It is not necessarily true that the backend event will be reset by this function call, but that is the general idea.

Does this mean that "the backend event will be reset by this function call" is the general idea in practice, but it is not a must? (From the spec perspective, it does not define this function's implementation.)

Suppose one implementation chooses to reset the backend event in this function, and the event is not reused elsewhere. Based on that, as a "thought experiment", the code below:

dynEvent.update(execGraphA.get_event(nodeA));
execGraphB.update(nodeB);

can be simplified to the code below:

execGraphA.get_event(nodeA);

Is that correct for this particular implementation? The reasoning is that the backend event used by execGraphA does not change (only the event status changes), so the two "update" calls could be removed for such an implementation.

@Bensuo (Collaborator Author):

I think my previous reply was poorly worded; I did not mean to imply that that is how the implementation will behave, and I don't think an implementation would or should allow your simplified code.

This is basically the same idea being discussed here, if I'm not mistaken, and I would reiterate that this is something we can explore in the future, but not as part of this initial work.

Reviewer reply:

What's the expected/typical implementation of this function that makes the returned event valid only for the next execution of the graph? Thanks.

@Bensuo (Collaborator Author):

I'm afraid I don't have an answer for that immediately, but I think you are right that this needs some further consideration, so we are looking into it.

@EwanC (Collaborator) Jun 20, 2024:

Considering this more, I'm concerned that get_event(node) may not be implementable on a CUDA backend. Since the executable graph has already been finalized, we can't add nodes to it, but to implement this on top of CUDA Graph I think we need to use cudaGraphAddEventRecordNode(), which would require some mechanism in the modifiable graph to say that a node is going to produce a SYCL event.

EDIT: I've moved this comment to its own thread here #372 (comment) to avoid branching the discussion away from performant usage on backends that can reset events.

@EwanC (Collaborator) Jun 24, 2024:

Wondering if defining the semantics of dynamic_event like a counter-based event would help give a more optimized flow for the node-to-node-only use case:

  • If a dynamic_event is created from a vanilla SYCL queue submission, the counter is incremented only once, when the SYCL event backing it completes.

  • If a dynamic_event is created for a node in a graph, the counter is incremented on completion of each execution of the graph.

  • When waiting on a dynamic_event in a vanilla queue submission, wait for the value to increment from its current value.

A user wouldn't need to manually call any reset methods or the update() calls currently needed for the node-to-node between-graphs case.
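
A minimal host-side sketch of the counter-based semantics described above (everything here is hypothetical, not part of the proposed extension):

#include <atomic>
#include <cstdint>

struct counter_based_dynamic_event {
  std::atomic<uint64_t> Counter{0};

  // The runtime would bump the counter when the backing event completes
  // (once per graph execution for a graph node).
  void signal_completion() { Counter.fetch_add(1); }

  // A waiter snapshots the value at submission time...
  uint64_t snapshot() const { return Counter.load(); }

  // ...and is satisfied once the counter moves past that snapshot.
  bool satisfied(uint64_t Snapshot) const { return Counter.load() > Snapshot; }
};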

Edit: I've actually realized that to implement get_event(node) we may not need to actually update the signal event in an L0 command, but can always use the underlying L0 event that already exists for sync-points (or create one if need be), wrapped in different UR events.

This L0 event gets reset between executions anyway, so we may actually get this kind of counter-based dynamic_event semantics by default, without having to go out of our way to achieve it.

@guoyejun commented Jun 14, 2024

I have an open question about a possible new use case for the interaction between two graphs, shown below; the key point is that workload_b() is executed on other hardware not managed by SYCL. We'll use an in-order SYCL queue and the record-and-replay APIs.

workload(sycl::queue q)
{
    // the events below represent a possible workload dependency
    event1 = workload_a(q);
    event2 = workload_b(q, depend: event1);       // workload_b() is executed on other hardware
    event3 = workload_c(q, depend: event1);
    workload_d(q, depend: {event2, event3});
}

main()
{
    syclgraph g;
    sycl::queue q(sycl::property::queue::in_order{});
    g.begin_recording(q);
    workload(q);
    g.end_recording(q);
    execg = g.finalize();
    execg.replay();
}

In workload_b(), the implementation can check whether q is in recording state; if so, it can start its own recording for the workloads in workload_b(), and it also has the ability to understand event1 and create/update event2 (I'm not sure about the exact details). So we would get two graphs: the main SYCL graph, and the embedded non-SYCL graph. I have no idea how to record and replay such a case (with a dynamic event?), any comments? Thanks.

EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Aug 22, 2024
Expand the command-buffer experimental feature API so that
it can be used to implement [SYCL-Graph dynamic
events](reble/llvm#372).

This involves extending each command append entry-point to
include the following extra parameters:
  * An output `ur_exp_command_buffer_command_handle_t`.
  * An input `ur_event_handle_t` event wait-list of dependent events.
  * An output `ur_event_handle_t` event that is signaled when the command
    completes its next execution.

New entry-points are also added to update the wait-list and signal event
parameters of commands:
  * `urCommandBufferUpdateSignalEventExp`
  * `urCommandBufferUpdateWaitEventsExp`

APIs implemented for CUDA adapter with CTS tests.
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Aug 22, 2024 (same commit message as above)
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Aug 22, 2024 (same commit message as above)
- Rename limited graph events to graph-limited events
- Clarify language of external dependencies in graphs
- Remove usage of "internal edges"
These events represent only the most recent execution of a given executable
graph. If an application executes the same graph multiple times before
scheduling work or performing a host-side wait on the event then executions of
the node in a previous execution other than the most recent one may be missed.

Reviewer comment:

Regarding the long sentence: does it describe code like the example below?

sycl::event ExecutionEvent = ExecGraph.get_event(SomeNode);

for (int i = 0; i < 1000; ++i)  // 1000 is just one value for example
    Queue.ext_oneapi_graph(ExecGraph);

Queue.submit([&](sycl::handler& CGH) {
    CGH.depends_on(ExecutionEvent);
    CGH.parallel_for(...);
});

what is "Queue.ext_oneapi_graph(ExecGraph); with i==998" missed?

@Bensuo (Collaborator Author):

After some internal discussions regarding the implementability of this aspect of the spec, I've pushed some wording to clarify what these events represent, but I'll also address your specific example.

As it stands right now, if multiple SYCL graph submissions are in-flight as in your example, it is undefined at the point where you call CGH.depends_on(ExecutionEvent); which exact execution it will depend on, as that depends on which iteration is currently executing on the device.

To depend on a specific execution, the application must enforce ordering such that multiple enqueues are not in-flight, like so (an example I just added to the spec):

sycl::event NodeExecutionEvent;
sycl::event GraphCompletionEvent;

for (size_t i = 0; i < GraphIterations; i++) {
  // Obtain the node execution event for the graph
  NodeExecutionEvent = ExecGraph.get_event(SomeNode);

  // Enqueue the graph for execution
  GraphCompletionEvent = Queue.ext_oneapi_graph(ExecGraph);
  // Use the event to wait on the host and perform some intermediate
  // host-work once that node has completed
  NodeExecutionEvent.wait();
  DoSomethingOnHost();
  // Wait on the graph finishing to ensure it is complete before the next
  // use of a node-level execution event
  GraphCompletionEvent.wait_and_throw();
}

This is generally in keeping with how the backends handle this, but slightly more restrictive, because SYCL events do not function like CUDA/L0 events and the user has no real visibility of device execution, since that is abstracted by the SYCL runtime.


==== Adding External Event Dependencies To Graphs [[external-event-dependencies]]

<<event-terminology, Regular SYCL events>> can be passed as dependencies to

Reviewer comment:

Regular SYCL events include:

  1. events from normal submissions to a SYCL queue
  2. events returned from submitting an executable command_graph for execution
  3. events obtained via command_graph<graph_state::executable>::get_event()

As for item 3, these events can be waited on or used as dependencies for eager SYCL operations **outside** of graphs, so can item-3 events also be used as dependencies for graph nodes?

@Bensuo (Collaborator Author):

No, this is explicitly disallowed in the spec; dependencies between graph nodes should be established using graph-limited events.

Reviewer reply:

If the answer is no, I think we cannot just say "Regular SYCL events" here, since not all three kinds are allowed.

@Bensuo (Collaborator Author):

That's fair, I've updated the wording here to refer to events from eager SYCL submissions only.

EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Aug 29, 2024 (same commit message as above)
- Inter-graph dependencies must have source graph finalized before creation
- Graphs with inter-graph dependencies can only be finalized once
- Update usage guide example on inter-graph dependencies
- Various updates to improve error coverage of method definitions.
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Aug 30, 2024 (same commit message as above)
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Sep 2, 2024 (same commit message as above)
@Bensuo (Collaborator Author) commented Sep 17, 2024

Closing this in favor of the upstream PR. Further discussion can happen there: intel#15056

@Bensuo closed this Sep 17, 2024
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Sep 17, 2024 (same commit message as above)
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Sep 18, 2024 (same commit message as above)
EwanC added a commit to Bensuo/unified-runtime that referenced this pull request Sep 18, 2024 (same commit message as above)