[SYCL][Graph] 3D kernel update regression test #373

EwanC · 2024-06-03T09:13:28Z

Add an E2E regression test for updating kernel nodes with 3 dimensions. Test contains a graph with two nodes, the first node with an NDRange containing a user specified local size, and the second node containing a Range with implementation determined local size.

This PR helps to avoid hardcoded path symbols and let OS dependent tools to work with paths. Original version produces `path/....\lib\xptifw.lib` which causes skip of library linkage on windows. `os.path.normpath` is the main change here that helps to handle path separators properly. Signed-off-by: Tikhomirova, Kseniya <kseniya.tikhomirova@intel.com>

…e CPU (intel#13469) Testing for oneapi-src/unified-runtime#1527

…el#14011) This PR migrates changes to `get_version`, `get_major_version` and `get_minor_version`, adds new free function equivalents which accept a `sycl::device` argument, and adds relevant tests and documentation. --------- Signed-off-by: Joe Todd <joe.todd@codeplay.com>

…#13987) Bump UR L0 commit to oneapi-src/unified-runtime#1694 so that the SYCL device aspect for supporting update in graphs is correctly reported for L0 devices. Currently, support can be incorrectly reported. --------- Co-authored-by: Kenneth Benzie (Benie) <k.benzie83@gmail.com>

Running the `Graph/Update` E2E tests on Level Zero with `UR_L0_LEAKS_DEBUG=1` shows that we are leaking a PI kernel and module. On investigation this was because we are retaining these objects in `getOrCreateKernel()` but not releasing them. Added release calls similar to how it is done in [enqueueImpCommandBufferKernel](https://github.com/intel/llvm/blob/b49303c7e13ca0a69454eaaaeb8c3d094916218d/sycl/source/detail/scheduler/commands.cpp#L2550) by the scheduler

This upgrades the docker to use the cuda 12.5 image. I've ran the test-e2e locally using cuda 12.5 and all is well. cuda 12.5 also fixed an issue introduced by the cuda 12.4 driver: see intel#13661 (comment) Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

…lock_load/slm_block_store APIs accepting simd_view (intel#13978) Co-authored-by: Nick Sarnie <sarnex@users.noreply.github.com>

… build libdevice with thinLTO (intel#14036) This is the first change in my work on thinLTO for SYCL. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <actions@github.com>

See intel#14053 Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

…mbiguity (intel#14018) This change avoids the ambiguity between the deprecated `sycl::ext::oneapi::sub_group` and `sycl::sub_group` when both namespaces are used. This fixes a failure on windows for cuda. --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

…ate for local_accessors API accepting simd_view (intel#14024)

intel#14000) Co-authored-by: Nick Sarnie <sarnex@users.noreply.github.com>

…el#14061) See intel#14060 --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

Supported matrix dimensions are queried from the device, and inform the tests which tile sizes one can use. This is a subset of all tests that are planned to be modified. Test manually tested on PVC and SPR --------- Co-authored-by: Yury Plyakhin <yury.plyakhin@intel.com>

Supported matrix dimensions are queried from the device, and inform the tests which tile sizes one can use. This is a subset of all tests that are planned to be modified. Test manually tested on PVC and SPR - no new regresssions The following tests have been marked as XFAIL on all platforms. I removed them from XMX8 folder. Once they are passing then they can be modified to query the supported matrix dimensions form the device. * joint_matrix_colA_rowB_colC.cpp * joint_matrix_out_bounds.cpp * joint_matrix_unaligned_k.cpp

…ntel#14067) Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>

By some reason, we used to only emit unused member functions if they are explicitly annotated with `sycl_device` attribute (through `SYCL_EXTERNAL` macro). This logic was introduced in 3baec18 and there is no clear indication as to why exactly we have a check that the attribute is explicit. SYCL extension for virtual functions introduces an alternative markup for specifying which function and that markup is SYCL compile-time properties that we turn into attributes implicitly under the hood. Essentially, we now have a situation where an implicit `sycl_device` attribute on a member function should be treated as an explicit one, because it could be a result of SYCL compile-time property being applied to that method. Considering our current codebase, it seems like we intend to have member function to be emitted in all cases where `sycl_device` is being implicitly added and therefore this patch removes the requirement for the attribute to be explicit.

This PR adds functionalities for: * Listing devices in stdout * Filtering devices Tests and docs updated accordingly. --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com> Co-authored-by: Joe Todd <joe.todd@codeplay.com>

…intel#14015) This PR adds a `wait_and_free` func. This makes it safer and less likely to release memory during or before it is used by enqueued commands. `async_free` is renamed `enqueue_free`, to make its behaviour clearer This PR updates the comments and tests accordingly

…d to run on Windows (intel#13957) [Windows doesn't support cudaMemPrefetchAsync()](bitsandbytes-foundation/bitsandbytes#453) which is used in the call to `prefetch` in the test. [urEnqueueUSMPrefetch](https://github.com/oneapi-src/unified-runtime/blob/c0c607c3a88933b4c5c20a0aca4539781c678411/source/adapters/cuda/enqueue.cpp#L1629) is also commented with a note for not having the support for CUDA on Windows.

temp fix for problems from cuda 12.5 uplift that were caused by intel#14049. Should fix intel#14071 --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

…e exception message (intel#14055) - C++ thrown exception message not shown when running from Windows terminal. - The patch fixes [cuda-max-local-mem-size.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda-max-local-mem-size.cpp) test failure.

… GPUs through OpenCL. (intel#14072) Extend the `sycl-ls-gpu-default.cpp` test to cover the support of Intel GPUs through OpenCL. The patch fixes the failure when running the test on a system with Intel and CUDA gpus.

Fixes ansi-alias violation and reads from uninitialized buffers. Fixes intel#13790.

…UDA (intel#14058) Fails in Nightly testing on the self-hosted CUDA runner: intel#12995.

…date ACC API accepting simd_view (intel#14065) Co-authored-by: Nick Sarnie <sarnex@users.noreply.github.com>

…date USM API accepting simd_view (intel#14043)

…#14087) See intel#14086 Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

…ntel#14088) CODEOWNERS seems to be missing a line attributing `sycl/test/check_device_code/matrix` tests to intel/sycl-matrix-reviewers (As per [this discussion](intel#14063 (comment))). This PR remedies this. Although, I noticed the current CODEOWNERS section for the matrix reviewers uses paths; Let me know if I should use `sycl/**/matrix` instead.

The runner seems to be broken, don't run the tests until it's fixed.

This PR 1. Updates extension to add math builtins for `vec<bfloat16>` and corresponding swizzles. 2. Implements **unoptimized** support for `vec<bloat16>`/swizzles in math builtins and adds a test case for the same. 3. Adds a test to check the device code generated for `vec<bloat16>` math builtins. I will make a follow-up PR to optimize `vec<bfloat16>` math builtins. I think we can use elementwise builtins for `ext_vector_type` (https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors) to optimize `vec<bfloat16>` math builtins. The device code test case will help visualizing/reviewing math builtin optimizations.

…l#13943) Changed CUDA sycl/test/check_device_code lit test cases to use SYCL_EXTERNAL functions instead of submitting kernels to the queue everytime.

Bindless Images should now properly work on Windows, with the exception of Vulkan interop, which requires extra work. Required a few fixes to non-conformant C++ code.

This was tracked down to a bug in ROCm that seems to be fixed with newer versions, and the CI is now on ROCm 6+ so these should be fine. ROCm ticket: ROCm/clr#13 The reduce over group test works on W6800 and MI210, but it seems for gfx1031 it reports not supporting shared USM, note that HIP for gfx1031 isn't officially supported by AMD. --------- Co-authored-by: Steffen Larsen <steffen.larsen@intel.com>

Added SYCL Module Splitting as a library. ESIMD splitting is not present in this patch and will be added in an upcoming patch. Added particular testing tool sycl-module-split that invokes added functionality. Not all `device-code-split` tests were updated in this patch because the rest of them (mostly) don't test module splitting itself. They will be migrated in an upcoming patch.

This PR adds math `extend_v*2` operators _(18 in total)_ along with unit-tests for signed and unsigned `int32` cases. --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com> Co-authored-by: Joe Todd <joe.todd@codeplay.com>

…intel#13536)

…ntel#14017) intel#13512 implemented the sycl_ext_oneapi_enqueue_functions extension. Following this, the corresponding extension document is moved to experimental and the feature test macro is defined. --------- Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>

…14081) We don't actually need cpp20, this was left over from a previous iteration of the test. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

(fixed issues with post commit testing) --------- Co-authored-by: jason1.li <jasonli@ubuntu-server.jf.intel.com>

…13974) After intel#13486, aspect name information is visible in `sycl-post-link` without the use of `!sycl_aspects`, so this PR updates `sycl-post-link` to use the aspect names that are now available within the `!sycl_used_aspects` metadata instead of `!sycl_aspects`. Aditionally, this PR also adds E2E related to optional kernel features for AOT enabled by these changes

Add a proposed extension specification for `work_group_memory`, a lighter weight API to allocate device local memory for an nd-range kernel. Also related, add a list of restrictions that, when followed, provide a guarantee that a kernel written in the free-function kernel syntax can be launched directly via Level Zero or OpenCL.

…intel#14084)

… in the SPEC (intel#13947) Follow-up of and blocked by: intel#13945 For `vec<std::byte>`, we only allow math operations that are valid on `std::byte` itself. (https://en.cppreference.com/w/cpp/types/byte)

…13975) Currently `std::byte*` scratch pointer is not aligned and `reinterpret_cast`ed as `T*` where type `T` may have alignment requirement different from `byte*`, this is UB. As a solution, use `std::align` to align the required buffer in the scratch and use placement `new` so that dynamic type of the buffer in the scratch will be `T*`.

…el#14098)

This change is intended to fix a CI failure on the OpenCL backend for the `memory_management_test3`. This is because the memory is allocated on a different q to the one it is released on.

the address pointer should be the USM pointer

Add an E2E regression test for updating kernel nodes with 3 dimensions. Test contains a graph with two nodes, the first node with an NDRange containing a user specified local size, and the second node containing a Range with implementation determined local size.

EwanC · 2024-06-10T08:08:54Z

Closing in favor of upstream PR intel#14110 since this test is a confirmed reproducer

KseniyaTikhomirova and others added 30 commits June 4, 2024 09:15

[SYCL] [NATIVECPU] Report correct memory order capabilities for Nativ…

18c4fb2

…e CPU (intel#13469) Testing for oneapi-src/unified-runtime#1527

[UR] Bump CUDA tag to 0e38fda0 (intel#14030)

20991b1

[UR] Bump HIP tag to 399430da (intel#14037)

0cec128

[ESIMD] Allow full autodeduction for block_load/block_store and slm_b…

643d21f

…lock_load/slm_block_store APIs accepting simd_view (intel#13978) Co-authored-by: Nick Sarnie <sarnex@users.noreply.github.com>

[SYCL] Allow specifying -foffload-lto with the new offload driver and…

7ff1a29

… build libdevice with thinLTO (intel#14036) This is the first change in my work on thinLTO for SYCL. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[GHA] Uplift Linux IGC Dev RT version to igc-dev-8b999ec (intel#13963)

fbeb7d4

Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <actions@github.com>

[SYCL][E2E] Disable flaky profiling_queue.cpp test on CUDA (intel#14054)

aa92b24

See intel#14053 Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[ESIMD] Allow full autodeduction for slm_atomic_update and atomic_upd…

2dc80c3

…ate for local_accessors API accepting simd_view (intel#14024)

[ESIMD] Allow full autodeduction for prefetch APIs accepting simd_view (

c7d627f

intel#14000) Co-authored-by: Nick Sarnie <sarnex@users.noreply.github.com>

[SYCL] Disable flaky test EnqueueNoMemObjTwoHostTasks on Windows (int…

037c67f

…el#14061) See intel#14060 --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[SYCL][Graph] Clarify graph enable_profiling property in finalize() (i…

09c9384

…ntel#14067) Co-authored-by: Ewan Crawford <ewan.cr@gmail.com>

[CI] pre-commit/aws pointed back to old image. (intel#14074)

f4829ab

temp fix for problems from cuda 12.5 uplift that were caused by intel#14049. Should fix intel#14071 --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

[SYCL][E2E] Refactor/fix bfloat16 test (intel#14062)

463f00c

Fixes ansi-alias violation and reads from uninitialized buffers. Fixes intel#13790.

[SYCL][E2E] Disable NonUniformGroups/ballot_group_algorithms.cpp on C…

0f796bc

…UDA (intel#14058) Fails in Nightly testing on the self-hosted CUDA runner: intel#12995.

[SYCL][E2E] Remove warnings in Basic e2e tests (intel#13994)

0cbc9a0

[Doc] Add Mar'24 Release Notes (intel#13879)

b8693eb

fineg74 and others added 25 commits June 6, 2024 20:38

[ESIMD] Allow full autodeduction of template parameters for atomic_up…

d66106c

…date ACC API accepting simd_view (intel#14065) Co-authored-by: Nick Sarnie <sarnex@users.noreply.github.com>

[ESIMD] Allow full autodeduction of template parameters for atomic_up…

353cc51

…date USM API accepting simd_view (intel#14043)

[SYCL][E2E] Disable memory_management_test3.cpp on Gen12 linux (intel…

f8552d4

…#14087) See intel#14086 Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[CI] Don't run E2E tests on self-hosted CUDA in Nightly (intel#14041)

e51a90a

The runner seems to be broken, don't run the tests until it's fixed.

[SYCL] Change check_device_code CUDA tests to use SYCL_EXTERNAL (inte…

3d9ded6

…l#13943) Changed CUDA sycl/test/check_device_code lit test cases to use SYCL_EXTERNAL functions instead of submitting kernels to the queue everytime.

[SYCL][Bindless] Enable non-Vulkan tests on Windows (intel#14045)

4c2cbc5

Bindless Images should now properly work on Windows, with the exception of Vulkan interop, which requires extra work. Required a few fixes to non-conformant C++ code.

[ESIMD]Replace use of vc intrinsic with spirv extension for rdtsc API (…

990b1d1

…intel#13536)

[SYCL][ESIMD][E2E] Fix bit shift vector test to not use c++20 (intel#…

cadd800

…14081) We don't actually need cpp20, this was left over from a previous iteration of the test. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

[SYCL][ESIMD] Instruction count performance test (intel#14033)

6d591f1

(fixed issues with post commit testing) --------- Co-authored-by: jason1.li <jasonli@ubuntu-server.jf.intel.com>

[SYCL][E2E] Remove uses of OpenCL primitives in Basic/image e2e tests (…

4fcc744

…intel#14084)

[SYCL] Fix post-commit issue with library dependencies (intel#14094)

0e587ed

[SYCL] Restrict sycl::vec and swizzle operations to types mentioned…

4222b4c

… in the SPEC (intel#13947) Follow-up of and blocked by: intel#13945 For `vec<std::byte>`, we only allow math operations that are valid on `std::byte` itself. (https://en.cppreference.com/w/cpp/types/byte)

[SYCL][E2E] Remove use of deprecated exceptions in USM e2e tests (int…

51a061d

…el#14098)

[SYCL][COMPAT] Fix memory_management_test3 (intel#14080)

d6a18e1

This change is intended to fix a CI failure on the OpenCL backend for the `memory_management_test3`. This is because the memory is allocated on a different q to the one it is released on.

[SYCL][Graph] fix the address pointer in graph print (intel#13595)

a35f862

the address pointer should be the USM pointer

EwanC force-pushed the ewan/update_3d_regression branch from 3d8edb2 to 4e7f2dc Compare June 10, 2024 08:07

EwanC closed this Jun 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL][Graph] 3D kernel update regression test #373

[SYCL][Graph] 3D kernel update regression test #373

EwanC commented Jun 3, 2024

EwanC commented Jun 10, 2024

[SYCL][Graph] 3D kernel update regression test #373

[SYCL][Graph] 3D kernel update regression test #373

Conversation

EwanC commented Jun 3, 2024

EwanC commented Jun 10, 2024