Minor wording changes in design doc
Bensuo committed Jun 4, 2024
1 parent cfeac1d commit 4a47e54
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions sycl/doc/design/CommandGraph.md
@@ -438,23 +438,23 @@ Level Zero:
 Future work will include exploring L0 API extensions to improve the mapping of
 UR command-buffer to L0 command-list.

-#### Copy engine
+#### Copy Engine

-For performance considerations, Unified-Runtime uses different Level-zero
-command-queues to submit compute kernels and memory operations when the device
-has a dedicated copy engine. To take advantage of the copy engine when
-available, the Graph workload can also be split between memory operations and
-compute kernels. To achieve this, two Graph workload command-lists live
-simultaneously in a command-buffer.
+For performance considerations, the Unified Runtime Level Zero adapter uses
+different Level Zero command-queues to submit compute kernels and memory
+operations when the device has a dedicated copy engine. To take advantage of the
+copy engine when available, the graph workload can also be split between memory
+operations and compute kernels. To achieve this, two graph workload
+command-lists live simultaneously in a command-buffer.

-When the command-buffer is finalized, memory operations (e.g. Buffer copy,
+When the command-buffer is finalized, memory operations (e.g. buffer copy,
 buffer fill, ...) are enqueued in the *copy* command-list while the other
-commands are enqueued in the general command-list. On submission, if not empty,
-the *copy* command-list is sent to the main copy command-queue while the general
+commands are enqueued in the compute command-list. On submission, if not empty,
+the *copy* command-list is sent to the main copy command-queue while the compute
 command-list is sent to the compute command-queue.

 Both are executed concurrently. Synchronization between the command-lists is
-handled by Level-Zero events.
+handled by Level Zero events.

 ### CUDA
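
The split described in the Copy Engine section can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Command`, `CommandBuffer`, `finalize`, and `submit` are hypothetical names invented for illustration and are not part of the Unified Runtime or Level Zero APIs. The sketch also assumes that, without a dedicated copy engine, every command stays in the compute command-list, and it does not model the Level Zero events that synchronize the two lists.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model types; NOT the Unified Runtime or Level Zero API.
enum class CommandKind { Kernel, BufferCopy, BufferFill };

struct Command {
  CommandKind kind;
  std::string name;
};

// Memory operations are the commands eligible for the copy engine.
inline bool isMemoryOp(CommandKind kind) {
  return kind == CommandKind::BufferCopy || kind == CommandKind::BufferFill;
}

struct CommandBuffer {
  std::vector<Command> recorded;    // commands in recording order
  std::vector<Command> copyList;    // stands in for the *copy* command-list
  std::vector<Command> computeList; // stands in for the compute command-list
  bool finalized = false;
};

// On finalization, route each recorded command to one of the two lists.
// Assumption: with no copy engine, everything goes to the compute list.
inline void finalize(CommandBuffer &cb, bool hasCopyEngine) {
  assert(!cb.finalized && "command-buffer already finalized");
  for (const Command &cmd : cb.recorded) {
    if (hasCopyEngine && isMemoryOp(cmd.kind))
      cb.copyList.push_back(cmd);
    else
      cb.computeList.push_back(cmd);
  }
  cb.finalized = true;
}

// Records which (modelled) command-queues received work on submission.
struct Submission {
  bool usedCopyQueue;
  bool usedComputeQueue;
};

// On submission, the copy list is sent to the copy queue only if non-empty;
// the compute list is always sent to the compute queue.
inline Submission submit(const CommandBuffer &cb) {
  assert(cb.finalized && "command-buffer must be finalized first");
  return Submission{!cb.copyList.empty(), true};
}
```

Recording a buffer copy, a kernel, and a buffer fill, then calling `finalize(cb, true)`, leaves two entries in `copyList` and one in `computeList`; calling `finalize` with `false` on a fresh buffer leaves `copyList` empty, so `submit` skips the copy queue.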
