Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Blocking wait in finalize on scheduler node dependencies #335

Closed
wants to merge 1 commit into from

Conversation

EwanC
Copy link
Collaborator

@EwanC EwanC commented Oct 20, 2023

When command-groups uses accessors our SYCL-Graph implementation uses the scheduler to wrangle dependencies and add nodes to the graph. This happens during graph finalization. We currently have a call to waitForEvents when adding nodes, to wait on the device events dependencies of that node command-group. For example, a memory copy command.

However, this corresponds to piEnqueueWaitForEvents() which is an asynchronous call, and itself returns an event that must be waited on later. The blocking wait on this returned event is invoked by the scheduler on enqueue of the executable graph.

This patch changes this behaviour to perform blocking waits earlier, on node enqueue during graph finalization. We have specified finalize() as the computationally expensive entry-point, so blocking behaviour makes sense here rather than on graph submission.

There are two categories of blocking wait done here:

  1. Blocking wait on host event dependencies, these correspond to scheduler commands that don't have an associated PiEnqueue call. For example, memory allocation commands in the scheduler. Adding this wait fixes the E2E failures on DG2 devices, so the tests are re-enabled.

  2. Blocking wait on device event dependencies, these correspond to scheduler commands that do have an associated PiEnqueue call that returns an event. These can occur when regular queue submissions are interleaved with adding graph nodes. Introducing this wait fixes fails in the buffer_ordering.cpp E2E test on some Level Zero devices.

Note I'd open this PR upstream once the standalone UR bump PR is merged

@EwanC
Copy link
Collaborator Author

EwanC commented Oct 23, 2023

@Bensuo Could you check if this patch works on CUDA, which we added the original asynchronous wait for

@EwanC EwanC marked this pull request as draft October 23, 2023 08:48
@EwanC
Copy link
Collaborator Author

EwanC commented Oct 24, 2023

@Bensuo Could you check if this patch works on CUDA, which we added the original asynchronous wait for

Ben pulled this change into our open upstream CUDA PR intel#11133 and the CI passed

@EwanC EwanC marked this pull request as ready for review October 24, 2023 08:25
@EwanC EwanC added the Graph Implementation Related to DPC++ implementation and testing label Oct 24, 2023
@EwanC EwanC marked this pull request as draft October 24, 2023 14:32
@EwanC EwanC force-pushed the ewan/graph_scheduling_blocking_wait branch from 17e89c6 to 35456ac Compare October 24, 2023 20:43
@EwanC EwanC changed the title [SYCL][Graph] Blocking wait on scheduler node enqueue [SYCL][Graph] Blocking wait in finalize on scheduler node dependencies Oct 25, 2023
@EwanC EwanC force-pushed the ewan/graph_scheduling_blocking_wait branch from 35456ac to b0333a1 Compare October 25, 2023 12:53
@EwanC EwanC marked this pull request as ready for review October 25, 2023 15:38
Copy link
Owner

@reble reble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@mfrancepillois mfrancepillois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@EwanC EwanC force-pushed the ewan/graph_scheduling_blocking_wait branch from b0333a1 to 0f92801 Compare October 26, 2023 11:02
When command-groups uses accessors in SYCL-Graph nodes, our implementation
uses the scheduler to wrangle dependencies and then add the node to the graph.
This happens at the application level during graph finalization. We currently
have a call to `waitForEvents()` when adding nodes, that waits on the device
event dependencies of that node command-group. For example, a memory copy command.

However, this corresponds to a call to `piEnqueueWaitForEvents`(), which is an
asynchronous call that itself returns an event that must be waited on later. The
blocking wait on this returned event is invoked by the scheduler on enqueue of
the executable graph.

This patch changes this behaviour to perform blocking waits earlier, on node
addition to the PI/UR command-buffer during graph finalization. We have designed
`finalize()` as the computationally expensive entry-point in the extension spec,
so blocking behaviour makes sense here rather than on executable graph submission.

There are two categories of blocking wait done when the scheduler adds a
node to the graph:

1. Blocking wait on host event dependencies, these correspond to scheduler
commands that don't have an associated PiEnqueue call. For example, memory
allocation commands in the scheduler. Adding this wait fixes the E2E failures
on DG2 devices, so the tests are re-enabled.

2. Blocking wait on device event dependencies, these correspond to scheduler
commands that do have an associated PiEnqueue call that returns an event. These
can occur when regular queue submissions are interleaved with adding graph nodes.
Introducing this wait fixes fails in the `buffer_ordering.cpp` E2E test on some
Level Zero devices.
@EwanC EwanC force-pushed the ewan/graph_scheduling_blocking_wait branch from 0f92801 to 6f5064d Compare October 30, 2023 10:52
@EwanC
Copy link
Collaborator Author

EwanC commented Oct 30, 2023

Closing this, and moving it to an upstream PR intel#11706 I also checked if these changes are still required after intel#11691, and looks like they are.

@EwanC EwanC closed this Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Graph Implementation Related to DPC++ implementation and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants