diff --git a/sycl/doc/design/CommandGraph.md b/sycl/doc/design/CommandGraph.md
index 3e7885ab6022..f36c40af0740 100644
--- a/sycl/doc/design/CommandGraph.md
+++ b/sycl/doc/design/CommandGraph.md
@@ -438,23 +438,23 @@ Level Zero:
 Future work will include exploring L0 API extensions to improve the mapping of
 UR command-buffer to L0 command-list.
 
-#### Copy engine
+#### Copy Engine
 
-For performance considerations, Unified-Runtime uses different Level-zero
-command-queues to submit compute kernels and memory operations when the device
-has a dedicated copy engine. To take advantage of the copy engine when
-available, the Graph workload can also be split between memory operations and
-compute kernels. To achieve this, two Graph workload command-lists live
-simultaneously in a command-buffer.
+For performance considerations, the Unified Runtime Level Zero adapter uses
+different Level Zero command-queues to submit compute kernels and memory
+operations when the device has a dedicated copy engine. To take advantage of the
+copy engine when available, the graph workload can also be split between memory
+operations and compute kernels. To achieve this, two graph workload
+command-lists live simultaneously in a command-buffer.
 
-When the command-buffer is finalized, memory operations (e.g. Buffer copy,
+When the command-buffer is finalized, memory operations (e.g. buffer copy,
 buffer fill, ...) are enqueued in the *copy* command-list while the other
-commands are enqueued in the general command-list. On submission, if not empty,
-the *copy* command-list is sent to the main copy command-queue while the general
+commands are enqueued in the compute command-list. On submission, if not empty,
+the *copy* command-list is sent to the main copy command-queue while the compute
 command-list is sent to the compute command-queue. Both are executed
 concurrently. Synchronization between the command-lists is
-handled by Level-Zero events.
+handled by Level Zero events.
 
 ### CUDA