[SYCL][Graph] Update design doc for copy queue #362

mfrancepillois · 2024-03-15T15:24:43Z

Update the design doc as command-buffer now uses the copy queue if a copy engine is available.

EwanC

nitpicked the design doc but LGTM

sycl/doc/design/CommandGraph.md

EwanC

Can we add an E2E test for a linear graph that interleaves memory and kernel commands? I'd like to verify that the interaction between our in-order command-list optimization and compute/copy command-lists works correctly.

Bensuo · 2024-06-03T14:15:19Z

> Can we add an E2E test for a linear graph that interleaves memory and kernel commands? I'd like to verify that the interaction between our in-order command-list optimization and compute/copy command-lists works correctly.

I have added a test locally, but as things stand the copy command-list optimization only happens for out-of-order command buffers. It might make sense to leave that as it is for now and, if it proves worthwhile, combine the two optimizations as a follow-up piece of work?
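To make the behaviour described above concrete, here is a toy model of the routing decision. It is only a sketch under stated assumptions: `CommandKind`, `ListKind`, `CommandBufferDesc`, `selectList` and `routeAll` are hypothetical names invented for illustration and are not the Unified Runtime or Level Zero adapter API. It encodes just the two facts stated in this thread: copy commands can use a copy command-list when a copy engine is available, and that currently happens only for out-of-order command buffers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical command kinds recorded into a command-buffer.
enum class CommandKind { Kernel, MemCopy };

// Hypothetical command-list kinds a command can be appended to.
enum class ListKind { Compute, Copy };

// Hypothetical description of a command-buffer and its device.
struct CommandBufferDesc {
  bool hasCopyEngine; // the device exposes a copy engine
  bool isInOrder;     // the command-buffer was created as in-order
};

// Pick a list for a single command. Assumption taken from the discussion:
// copy commands go to the copy list only when a copy engine exists and the
// command-buffer is out-of-order; everything else stays on the compute list.
ListKind selectList(const CommandBufferDesc &desc, CommandKind kind) {
  const bool useCopyList = kind == CommandKind::MemCopy &&
                           desc.hasCopyEngine && !desc.isInOrder;
  return useCopyList ? ListKind::Copy : ListKind::Compute;
}

// Route a linear sequence of commands, preserving their order.
std::vector<ListKind> routeAll(const CommandBufferDesc &desc,
                               const std::vector<CommandKind> &commands) {
  std::vector<ListKind> lists;
  lists.reserve(commands.size());
  for (CommandKind kind : commands)
    lists.push_back(selectList(desc, kind));
  assert(lists.size() == commands.size());
  return lists;
}
```

In this model an interleaved copy/kernel/copy sequence alternates between the two lists for an out-of-order command buffer with a copy engine, and stays entirely on the compute list for an in-order one, which is the interaction the requested E2E test is meant to exercise.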

sycl/test-e2e/Graph/ValidUsage/linear_graph_l0_copy.cpp

…4143) When passing along multiple targets in the form of `-fsycl-targets=intel_gpu_dg1,intel_gpu_pvc`, the number of device compilations was n*n as opposed to just n. Because of how duplicate entries were handled during toolchain generation, the different names were being considered unique even though they had the same target triple (spir64_gen), causing the multiple entries. This is the second attempt to push this one in; the sycl-offload-new-driver.c test was updated to reflect the ordering issues encountered.
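The deduplication idea in this commit message can be sketched as follows. This is an illustrative sketch only, not the clang driver code: `tripleFor` and `uniqueToolchainTriples` are hypothetical helpers, and the rule that every `intel_gpu_*` name maps to `spir64_gen` is an assumption generalised from the two targets named above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical mapping from a -fsycl-targets name to its target triple.
// Assumption: every "intel_gpu_*" name shares the spir64_gen triple; any
// other name is treated as already being a triple.
std::string tripleFor(const std::string &target) {
  const std::string prefix = "intel_gpu_";
  if (target.compare(0, prefix.size(), prefix) == 0)
    return "spir64_gen";
  return target;
}

// Collect one entry per distinct triple, in first-seen order, so that two
// names with the same triple produce a single toolchain entry.
std::vector<std::string>
uniqueToolchainTriples(const std::vector<std::string> &targets) {
  std::set<std::string> seen;
  std::vector<std::string> triples;
  for (const std::string &target : targets) {
    std::string triple = tripleFor(target);
    if (seen.insert(triple).second)
      triples.push_back(triple);
  }
  return triples;
}
```

Keying on the triple rather than on the user-facing name is the design point: names are many-to-one with respect to triples, so uniqueness has to be decided after the mapping.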

…l targets (intel#14102) clang-linker-wrapper is not target-specific. i.e. it is not called for a single target device. It is called only once. Currently, clang-linker-wrapper is called only with device images with spir64 targets. So, the existing approach to capture the first target triple in the list of triples and use it for gathering sycl-device-library files is valid. As we plan to add support for more targets (AOT), we need to gather sycl-device-libraries for all targets. This PR addresses this change. Also, the triple should not be passed to the linker wrapper. The linker wrapper should get the triples from device images. Thanks --------- Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>

The `Graph/UnsupportedDevice/device_query.cpp` test asserts that L0 devices will never have full graph support. This is not the case: depending on the L0 device and driver version, full graph support is possible. Update the test to remove this assertion, as diving into these details is out of the scope of the test. This was previously decided when discussing how to check the OpenCL backend for similar possible variances in aspect support.

`cuda_dev_kit` is not set properly in [test-e2e/lit.cfg.py](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/lit.cfg.py) due to invalid CUDA paths. Fixing the paths showed errors in [14115](intel#14115) and [14116](intel#14116) which are XFAILed. The patch fixes the failure of [cuda_queue_priority.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda_queue_priority.cpp) on Windows / CUDA.

This PR adds math `extend_v*4` operators (18 in total) along with unit-tests for signed and unsigned int32 cases. Some changes overlap with the previous `extend_v*2` PR intel#13953, which should therefore be reviewed and merged first. --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com> Co-authored-by: Joe Todd <joe.todd@codeplay.com> Co-authored-by: Yihan Wang <yihan.wang@intel.com>

…l#13807) It was propagated to `getOrCreateAllocaForReq` when creating a new record, but then no commands are expected to be enqueued there since the first alloca for a record cannot exceed its leaf limit or be linked to another alloca.

…l#14120) - `InorderQueue/in_order_get_property.cpp` -> Use non-deprecated `sycl::exception`, add check for errc to ensure we are still catching the correct exception - `InorderQueue/in_order_kernels.cpp` -> Use group `get_group_id` function instead of deprecated `get_id` - `InorderQueue/in_order_usm_implicit.cpp` -> Use queue `mem_advice` function that uses `int` instead of `pi_mem_advice`

Currently, the Level Zero plugin uses the loader and headers fetched by the Level Zero adapter (`LevelZeroLoader-Headers` and `LevelZeroLoader` targets). The downloaded loader code is not used; only the headers are used, for xpti. So, get the headers location from the `LevelZeroLoader-Headers` target instead and remove the unnecessary code.

Add group_key_value_sorter sorters and sort_key_value_over_group APIs based on https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_group_sort.asciidoc extension. This PR was split out from larger PR: intel#13713 Co-authored-by: "Andrei Fedorov [andrey.fedorov@intel.com](mailto:andrey.fedorov@intel.com)" Co-authored-by: "Romanov Vlad [vlad.romanov@intel.com](mailto:vlad.romanov@intel.com)"

…d ConvertFToBFloat16INTEL (intel#14085) This PR adds vector overloads of `ConvertBFloat16ToFINTEL` and `ConvertFToBFloat16INTEL` to libdevice (SPEC: https://spec.oneapi.io/level-zero/latest/core/SPIRV.html#bfloat16-conversions) and a wrapper around it (`BF16VecToFloatVec` and `FloatVecToBF16Vec`) in `ext::oneapi::detail`. These overloads are intended to optimize BFloat16 `marray`, `vec` operations, for which we currently do element-by-element `bfloat16 -> float -> bfloat16` conversions.

…ntel#14123) Current implementation of profiling info for NOP barriers is inconsistent with other events from the same queue (e.g., if the previous event started after the barrier was submitted). To make them consistent while keeping the optimization, we would need to duplicate the event on our side and make the duplicate check and potentially use profiling info of its previous event. Instead, as the first step, disable the NOP optimization during profiling since profiling is known to incur a performance hit anyway. The proper duplicate event approach can be implemented as a follow up if this causes issues for users. Partially reverts intel#12949

The function is using the `operator=` before it's defined which can cause some build failures:
```
build/include/sycl/ext/oneapi/bfloat16.hpp:98:19: error: no match for ‘operator=’ (operand types are ‘sycl::_V1::ext::oneapi::bfloat16’ and ‘float’)
   98 | dst[i] = src[i];
      | ^
```
Moving it after the bfloat16 class definition fixes it.

Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.

Commits:

- [`74b2db2`](https://github.com/micromatch/braces/commit/74b2db2938fad48a2ea54a9c8bf27a37a62c350d) 3.0.3
- [`88f1429`](https://github.com/micromatch/braces/commit/88f1429a0f47e1dd3813de35211fc97ffda27f9e) update eslint. lint, fix unit tests.
- [`415d660`](https://github.com/micromatch/braces/commit/415d660c3002d1ab7e63dbf490c9851da80596ff) Snyk js braces 6838727 ([#40](https://redirect.github.com/micromatch/braces/issues/40))
- [`190510f`](https://github.com/micromatch/braces/commit/190510f79db1adf21d92798b0bb6fccc1f72c9d6) fix tests, skip 1 test in test/braces.expand
- [`716eb9f`](https://github.com/micromatch/braces/commit/716eb9f12d820b145a831ad678618731927e8856) readme bump
- [`a5851e5`](https://github.com/micromatch/braces/commit/a5851e57f45c3431a94d83fc565754bc10f5bbc3) Merge pull request [#37](https://redirect.github.com/micromatch/braces/issues/37) from coderaiser/fix/vulnerability
- [`2092bd1`](https://github.com/micromatch/braces/commit/2092bd1fb108d2c59bd62e243b70ad98db961538) feature: braces: add maxSymbols (https://github.com/micromatch/braces/issues/...
- [`9f5b4cf`](https://github.com/micromatch/braces/commit/9f5b4cf47329351bcb64287223ffb6ecc9a5e6d3) fix: vulnerability (https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727)
- [`98414f9`](https://github.com/micromatch/braces/commit/98414f9f1fabe021736e26836d8306d5de747e0d) remove funding file
- [`665ab5d`](https://github.com/micromatch/braces/commit/665ab5d561c017a38ba7aafd92cc6655b91d8c14) update keepEscaping doc ([#27](https://redirect.github.com/micromatch/braces/issues/27))
- Additional commits viewable in [compare view](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)

Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…tly (intel#12872) This PR refactors the builtin fence helper macro for AMDGPU to take in and process the order semantic explicitly because that is the only semantic argument accepted by the amdgcn builtin. Additionally, makes the `None` (Monotonic) order semantic which maps to C++/SYCL's `relaxed` to be a no-op instead of falling back to the previous `acq_rel` default order. --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>

…ntel#14151) One of the models that is used for specifying the device architecture for spir64_gen is to use the -Xsycl-target-backend "-device arg" syntax on the command line. Hook up the ability to scan the target backend values to embed the proper information in the packaged binary when using the new offload model.

This PR adds math `extend_vcompare[2/4]` operators (4 in total) along with unit-tests for signed and unsigned int32 cases. Also, unit-tests from previous `extend_v*4` intel#14078 and `extend_v*2` intel#13953 are moved to two different files. --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com> Co-authored-by: Joe Todd <joe.todd@codeplay.com> Co-authored-by: Yihan Wang <yihan.wang@intel.com>

mfrancepillois added the Graph Implementation Related to DPC++ implementation and testing label Mar 15, 2024

mfrancepillois requested review from EwanC, ori-sky, reble, Bensuo and julianmi March 15, 2024 15:24

mfrancepillois mentioned this pull request Mar 15, 2024

[EXP][Command-Buffer] Support for using copy queue Bensuo/unified-runtime#10

Closed

Bensuo force-pushed the maxime/update-doc-copy-queue branch 2 times, most recently from 7739b83 to d5773eb Compare May 28, 2024 14:49

EwanC approved these changes May 31, 2024

EwanC reviewed May 31, 2024

Bensuo force-pushed the maxime/update-doc-copy-queue branch from 73ba45c to 4a47e54 Compare June 4, 2024 16:21

EwanC reviewed Jun 4, 2024

sycl/test-e2e/Graph/ValidUsage/linear_graph_l0_copy.cpp

Bensuo force-pushed the maxime/update-doc-copy-queue branch 2 times, most recently from eb67857 to 020a5ff Compare June 11, 2024 16:41

mdtoguchi and others added 14 commits June 11, 2024 18:43

[SYCL] Enable CET for wqlibsycl-devicelib-host.a (intel#14135)

c1b17e0

Signed-off-by: jinge90 <ge.jin@intel.com>

[UR] Fix size confusion for several device property queries (intel#12488

c168f21

) For testing oneapi-src/unified-runtime#1282 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

[SYCL][COMPAT] Added non-const image2d_max and image3d_max getters (i…

bdeb0ef

…ntel#14138) Required for specific use-cases in SYCLomatic. --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

[UR] Bump main tag to 78d02039 (intel#12269)

7c530e1

oneapi-src/unified-runtime#1128 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

[E2E] Modify commands to address running on Windows. (intel#13682)

bd33aaf

Running the test on Windows failed due to missing support of `ls`. Replacing `ls` with `cat` made the test pass on Windows.

[UR] Update UR tag to include L0 loader related changes (intel#14109)

1a885ec

Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

[UR] Bump main tag to b13c5e1f (intel#14042)

ae79b95

oneapi-src/unified-runtime#1711 --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

againull and others added 19 commits June 12, 2024 09:30

[SYCL][NewOffload][E2E] add a single test for --offload-new-driver (i…

fe8c284

…ntel#14129) A single basic file to compile and run to test functionality of --offload-new-driver --------- Co-authored-by: Marcos Maronas <maarquitos14@users.noreply.github.com>

[SYCL] Use std::array as storage for sycl::vec on device (intel#1…

e7defab

…4130) Replaces intel#13270 Changing the storage to std::array instead of Clang's extension fixes strict ansi-aliasing violation and simplifies device code.

[SYCL] Adding support for missing math ops (intel#14132)

9942378

[SYCL] Adding support for missing math ops: - truncf - sinpif - rsqrtf - exp10f

[Doc] Document Unified Runtime update process (intel#14097)

e34b7ff

Add section to the contribution guide detailing the current process for integrating Unified Runtime updates into DPC++.

[SYCL] Add atomic64 aspect decoration to atomic_ref<T *> (intel#14052)

da3b5df

atomic_ref<T *> uses 64-bit atomics and it should be decorated with the corresponding aspect. fixes: intel#12743

[SYCL] Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY (intel#14119)

c342a78

I'm observing cache overflow when running heavy tests on OCL backend with gpu. Clear cache in case of PI_ERROR_OUT_OF_HOST_MEMORY as well as for PI_ERROR_OUT_OF_RESOURCES. Using as reference: intel#11987

[CI] Turn on sycl-cts/test_accessor in Nightly (intel#14159)

a5a36f8

Fixed by KhronosGroup/SYCL-CTS#895

[GHA] Uplift Linux IGC Dev RT version to igc-dev-480f8b6 (intel#14155)

957f762

Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <actions@github.com>

[SYCL] Re-enable Basic/barrier_order.cpp (intel#14154)

c2e5529

Closes intel#7330.

[SYCL][ESIMD][E2E] Fix rotate.cpp on Windows (intel#14152)

4e41992

The rotate functions are technically c++20 and MSVC hasn't implemented them yet. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
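For context, the C++20 rotate functions are `std::rotl` and `std::rotr` from `<bit>`. Where they are unavailable, an equivalent can be written by hand. The sketch below is one such fallback; `rotlPortable` and `rotrPortable` are hypothetical names, and this is not necessarily how `rotate.cpp` was actually changed in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Rotate the bits of an unsigned value left by `count` positions.
// Works before C++20; the count is reduced modulo the bit width so that
// no shift is ever as wide as the type itself.
template <typename T> T rotlPortable(T value, unsigned count) {
  static_assert(std::is_unsigned<T>::value, "rotate requires an unsigned type");
  const unsigned bits = static_cast<unsigned>(std::numeric_limits<T>::digits);
  count %= bits;
  if (count == 0)
    return value;
  return static_cast<T>((value << count) | (value >> (bits - count)));
}

// Rotate right, with the same width handling as rotlPortable.
template <typename T> T rotrPortable(T value, unsigned count) {
  static_assert(std::is_unsigned<T>::value, "rotate requires an unsigned type");
  const unsigned bits = static_cast<unsigned>(std::numeric_limits<T>::digits);
  count %= bits;
  if (count == 0)
    return value;
  return static_cast<T>((value >> count) | (value << (bits - count)));
}
```

The early return for a zero count matters: without it the complementary shift would be by the full bit width, which is undefined behaviour for 32-bit and wider types.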

EwanC force-pushed the maxime/update-doc-copy-queue branch from 020a5ff to 0d13b58 Compare June 14, 2024 07:43

nrspruit and others added 4 commits June 14, 2024 14:45

[UR][L0] Maintain Lock of Queue while syncing the Last Command Event (i…

579484f

…ntel#14150) pre-commit PR for oneapi-src/unified-runtime#1749 --------- Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com> Co-authored-by: Kenneth Benzie (Benie) <k.benzie@codeplay.com>

[SYCL][E2E] Use callable device selector in FilterSelector e2e tests (

19052da

intel#14162) Instead of using old device selector objects, use SYCL 2020 device selector callables to construct devices in `FilterSelector` e2e tests.

[SYCL][Graph] Update design doc for copy optimization and add test

090c9aa

- Update UR tag to include L0 command-buffer copy engine optimization - Add test which mixes copy and kernel commands - Update design doc to detail copy engine optimization

Update sycl/plugins/unified_runtime/CMakeLists.txt

01b1582

Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>

EwanC force-pushed the maxime/update-doc-copy-queue branch from 5d44bdf to 01b1582 Compare June 14, 2024 13:30
