[CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 #12516

JackAKirk · 2024-01-29T14:48:24Z (edited by kbenzie)

Implement seq_cst RC11/ptx6.0 memory consistency for CUDA backend.

See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details. Requires sm_70 or above. There is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities where possible on supported architectures.

This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the CUDA backend.
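As a rough illustration of the "sm_70 or above" requirement: CUDA reports a device's compute capability as a major/minor pair, and sm_XY corresponds to major X, minor Y, so sm_75 and sm_80 devices qualify while sm_61 does not. The helpers below are hypothetical and are not part of SYCL, UR or libclc; portable SYCL code would instead check whether `sycl::memory_order::seq_cst` appears in the device's `sycl::info::device::atomic_memory_order_capabilities`.

```cpp
#include <cassert>

// Hypothetical helpers, for illustration only: they are not part of SYCL,
// Unified Runtime or libclc.

// Combine a CUDA compute capability (major, minor) into the NN of "sm_NN".
constexpr int sm_version(int major, int minor) { return major * 10 + minor; }

// The PTX 6.0 scoped memory model, and therefore the RC11 seq_cst mapping,
// needs sm_70 or newer.
constexpr bool supports_rc11_seq_cst(int major, int minor) {
  return sm_version(major, minor) >= 70;
}

// Compile-time sanity checks of the threshold.
static_assert(supports_rc11_seq_cst(7, 0), "sm_70 is the minimum");
static_assert(!supports_rc11_seq_cst(6, 1), "sm_61 is too old");
```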

Fixes #11208

Depends on #12907


JackAKirk · 2024-01-29T15:12:55Z

This is ready for review. I've marked it as a draft so that it doesn't get merged, since the UR tag is only temporary for testing; this PR requires oneapi-src/unified-runtime#1291.

Some further information:

The Repaired C++11 memory consistency model (RC11 https://pure.mpg.de/rest/items/item_2543045/component/file_3332084/content ) was adopted in the C++20 definition of seq_cst: see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html
NVIDIA explicitly states how its PTX instructions map to RC11 in Figure 11 of https://dl.acm.org/doi/pdf/10.1145/3297858.3304043

This PR implements that mapping.
Read section "4.2 A Mapping from Scoped C++ onto PTX" for an explanation of how RMW operations are mapped from RC11 onto PTX 6.0.
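To make the shape of this mapping concrete, here is a minimal host-side sketch that tabulates, for each C++ (RC11) memory order, the PTX 6.0 instruction sequence an atomic load, store, read-modify-write or fence lowers to. The key idea it shows is that a `seq_cst` access becomes a `fence.sc` at the same scope followed by an ordinary acquire/relaxed/acq_rel access. The `Op` enum, the `ptx_sequence` function and the string format (state-space and type suffixes omitted) are assumptions for illustration, not code from this PR. The `seq_cst` RMW row uses `atom.acq_rel`, which is at least as strong as required; check Figure 11 and section 4.2 of the paper for the exact form.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <string>

// Kind of atomic operation being lowered (hypothetical, for illustration).
enum class Op { Load, Store, RMW, Fence };

// Returns the PTX 6.0 sequence for an RC11 atomic with memory order `mo` at
// PTX scope `scope` ("cta", "gpu" or "sys"). State-space and type suffixes
// (e.g. ".global.u32") are omitted; multi-instruction sequences use "; ".
std::string ptx_sequence(Op op, std::memory_order mo, const std::string &scope) {
  const std::string s = "." + scope;
  const std::string sc_fence = "fence.sc" + s;
  switch (op) {
  case Op::Load:
    if (mo == std::memory_order_relaxed) return "ld.relaxed" + s;
    if (mo == std::memory_order_acquire) return "ld.acquire" + s;
    // seq_cst load: SC fence, then an acquire load.
    if (mo == std::memory_order_seq_cst) return sc_fence + "; ld.acquire" + s;
    break;
  case Op::Store:
    if (mo == std::memory_order_relaxed) return "st.relaxed" + s;
    if (mo == std::memory_order_release) return "st.release" + s;
    // seq_cst store: SC fence, then a relaxed store.
    if (mo == std::memory_order_seq_cst) return sc_fence + "; st.relaxed" + s;
    break;
  case Op::RMW:
    if (mo == std::memory_order_relaxed) return "atom.relaxed" + s;
    if (mo == std::memory_order_acquire) return "atom.acquire" + s;
    if (mo == std::memory_order_release) return "atom.release" + s;
    if (mo == std::memory_order_acq_rel) return "atom.acq_rel" + s;
    // seq_cst RMW: SC fence, then an acq_rel RMW (assumed form, see text).
    if (mo == std::memory_order_seq_cst) return sc_fence + "; atom.acq_rel" + s;
    break;
  case Op::Fence:
    if (mo == std::memory_order_relaxed) return "";  // relaxed fence: no-op
    if (mo == std::memory_order_seq_cst) return sc_fence;
    // acquire, release and acq_rel fences all lower to fence.acq_rel.
    if (mo == std::memory_order_acquire || mo == std::memory_order_release ||
        mo == std::memory_order_acq_rel)
      return "fence.acq_rel" + s;
    break;
  }
  // e.g. a release load, an acquire store, or memory_order_consume.
  throw std::invalid_argument("memory order not handled for this operation");
}
```

Because only `seq_cst` pays for the extra `fence.sc`, relaxed and acquire/release code keeps its cheaper lowering. SYCL's `memory_scope::work_group`, `device` and `system` correspond to the PTX `cta`, `gpu` and `sys` scopes respectively.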

The "PTX 6.0" memory model described in that paper is the one specified as the PTX memory consistency model in NVIDIA's official PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model

CUDA memory consistency patterns falling outside the scope of "PTX 6.0", apart from its "PTX 7.5" extension (https://dl.acm.org/doi/pdf/10.1145/3470496.3533045), are not properly defined anywhere.

libclc/ptx-nvidiacl/libspirv/atomic/atomic_helpers.h


Alcpz

SYCLcompat changes look good to me. Thank you for fixing those tests.

ldrumm

Your PR description will become the squashed commit message. Please reword it so it better reads as one:

The title should be clearer: "Implement RC11 seq_cst for PTX6"

The wording should use imperative mood:

s/this PR implements/Implement

s/With this PR//g

There's also a merge conflict that needs to be resolved.

sycl/plugins/unified_runtime/CMakeLists.txt

sycl/test-e2e/syclcompat/atomic/atomic_class.cpp

JackAKirk · 2024-02-02T12:18:43Z

The bindless images failure is unrelated. This PR just needs oneapi-src/unified-runtime#1291 merged and the UR tag updated; then it will be ready for merge. I'll unmark it as draft when that happens.


kbenzie

oneapi-src/unified-runtime#1291 has been merged; however, this PR depends on #12907 merging first. Once that has happened, pull in the latest sycl branch changes, resolve the conflict, then update the UR repo/tag as suggested. After that, mark this ready for review so that the UR reviewers can approve it.

sycl/plugins/unified_runtime/CMakeLists.txt


Implement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend.

See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details.

Requires sm_70 or above. With this PR there is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities when possible on supported arches.

This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the cuda backend.

Fixes intel#11208

Depends on intel#12907

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

…ces (#12974) AMD ~~and CUDA~~ devices still not supported. ~~CUDA to be supported in #12516~~ Edit: Since #12516 has been merged, CUDA is also `seq_cst` by default.

JackAKirk added 5 commits January 29, 2024 05:39

ptx 6.0 seq_cst memory consistency model Impl.

b605993

see https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 for all details Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Point to UR branch for testing.

a2321dc

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Merge branch 'sycl' into cuda-atom-seq-cst

e0aa6e1

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Added missing arch check.

5d7bfab

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Added missing arch check.

d8e0002

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk requested review from a team as code owners January 29, 2024 14:48

JackAKirk requested a review from npmiller January 29, 2024 14:48

JackAKirk marked this pull request as draft January 29, 2024 14:51

Updated draft UR tag for CI testing.

b161136

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk mentioned this pull request Jan 29, 2024

[CUDA] Report that devices with cc >= sm_70 support seq_cst oneapi-src/unified-runtime#1291

Merged


JackAKirk mentioned this pull request Jan 29, 2024

PI CUDA ERROR when using sycl::atomic_ref #11208

Closed

GeorgeWeb reviewed Jan 31, 2024

libclc/ptx-nvidiacl/libspirv/atomic/atomic_helpers.h

JackAKirk added 2 commits January 31, 2024 09:33

Merge branch 'sycl' into cuda-atom-seq-cst

164a0cc

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Fix syclcompat tests.

8e418e7

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

JackAKirk closed this Jan 31, 2024

JackAKirk reopened this Jan 31, 2024

JackAKirk marked this pull request as ready for review January 31, 2024 20:13

JackAKirk requested a review from a team as a code owner January 31, 2024 20:13

Alcpz approved these changes Jan 31, 2024

ldrumm requested changes Feb 1, 2024

sycl/plugins/unified_runtime/CMakeLists.txt (outdated)

sycl/test-e2e/syclcompat/atomic/atomic_class.cpp (outdated)

Merge branch 'sycl' into cuda-atom-seq-cst

a231611

JackAKirk marked this pull request as draft February 1, 2024 09:13

JackAKirk mentioned this pull request Feb 1, 2024

[syclcompat][CUDA] FIX UB in test / seq_cst requires sm_70 on CUDA. #12575

Merged

Alcpz mentioned this pull request Mar 11, 2024

[SYCL][COMPAT] nd_range barriers seq_cst by default in supported devices #12974

Merged

JackAKirk added 3 commits March 15, 2024 06:49

Point to testing branch.

f61f271

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Merge branch 'sycl' into cuda-atom-seq-cst

1ed016f

Fix merge conflict.

296ad12

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>


kbenzie reviewed Mar 18, 2024

sycl/plugins/unified_runtime/CMakeLists.txt (outdated)

Merge branch 'sycl' into cuda-atom-seq-cst

859ff2f

JackAKirk marked this pull request as ready for review March 18, 2024 14:23

Fix latest UR tag.

12b477c

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

kbenzie reviewed Mar 18, 2024

sycl/plugins/unified_runtime/CMakeLists.txt

Set UR repo.

58689a9

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

kbenzie approved these changes Mar 18, 2024

JackAKirk changed the title ~~[CUDA][LIBCLC] RC11/ptx6.0 memory consistency model seq_cst impl~~ [CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 Mar 18, 2024

ldrumm approved these changes Mar 18, 2024

ldrumm merged commit c1e2957 into intel:sycl Mar 18, 2024
11 checks passed

kbenzie mentioned this pull request Apr 15, 2024

sycl-rel_5_2_0: [CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 (#12516) #13403

Closed

JackAKirk mentioned this pull request Sep 25, 2024

[SYCL][CUDA]Massive runfail for CUDA SYCL CTS on unrelated commit #8556

Open
