-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Blocking wait in finalize on scheduler node dependencies #335
Conversation
@Bensuo Could you check if this patch works on CUDA, which we added the original asynchronous wait for |
Ben pulled this change into our open upstream CUDA PR intel#11133 and the CI passed |
17e89c6
to
35456ac
Compare
35456ac
to
b0333a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
b0333a1
to
0f92801
Compare
When command-groups uses accessors in SYCL-Graph nodes, our implementation uses the scheduler to wrangle dependencies and then add the node to the graph. This happens at the application level during graph finalization. We currently have a call to `waitForEvents()` when adding nodes, that waits on the device event dependencies of that node command-group. For example, a memory copy command. However, this corresponds to a call to `piEnqueueWaitForEvents`(), which is an asynchronous call that itself returns an event that must be waited on later. The blocking wait on this returned event is invoked by the scheduler on enqueue of the executable graph. This patch changes this behaviour to perform blocking waits earlier, on node addition to the PI/UR command-buffer during graph finalization. We have designed `finalize()` as the computationally expensive entry-point in the extension spec, so blocking behaviour makes sense here rather than on executable graph submission. There are two categories of blocking wait done when the scheduler adds a node to the graph: 1. Blocking wait on host event dependencies, these correspond to scheduler commands that don't have an associated PiEnqueue call. For example, memory allocation commands in the scheduler. Adding this wait fixes the E2E failures on DG2 devices, so the tests are re-enabled. 2. Blocking wait on device event dependencies, these correspond to scheduler commands that do have an associated PiEnqueue call that returns an event. These can occur when regular queue submissions are interleaved with adding graph nodes. Introducing this wait fixes fails in the `buffer_ordering.cpp` E2E test on some Level Zero devices.
0f92801
to
6f5064d
Compare
Closing this, and moving it to an upstream PR intel#11706 I also checked if these changes are still required after intel#11691, and looks like they are. |
When command-groups uses accessors our SYCL-Graph implementation uses the scheduler to wrangle dependencies and add nodes to the graph. This happens during graph finalization. We currently have a call to
waitForEvents
when adding nodes, to wait on the device events dependencies of that node command-group. For example, a memory copy command.However, this corresponds to
piEnqueueWaitForEvents()
which is an asynchronous call, and itself returns an event that must be waited on later. The blocking wait on this returned event is invoked by the scheduler on enqueue of the executable graph.This patch changes this behaviour to perform blocking waits earlier, on node enqueue during graph finalization. We have specified
finalize()
as the computationally expensive entry-point, so blocking behaviour makes sense here rather than on graph submission.There are two categories of blocking wait done here:
Blocking wait on host event dependencies, these correspond to scheduler commands that don't have an associated PiEnqueue call. For example, memory allocation commands in the scheduler. Adding this wait fixes the E2E failures on DG2 devices, so the tests are re-enabled.
Blocking wait on device event dependencies, these correspond to scheduler commands that do have an associated PiEnqueue call that returns an event. These can occur when regular queue submissions are interleaved with adding graph nodes. Introducing this wait fixes fails in the
buffer_ordering.cpp
E2E test on some Level Zero devices.Note I'd open this PR upstream once the standalone UR bump PR is merged