PI CUDA ERROR when using `sycl::atomic_ref` #11208

guoci · 2023-09-18T17:33:26Z

Describe the bug
A clear and concise description of what the bug is.
I get an error when running the program below.

PI CUDA ERROR:
	Value:           719
	Name:            CUDA_ERROR_LAUNCH_FAILED
	Description:     unspecified launch failure
	Function:        wait
	Source Location: /root/intel-llvm-mirror/sycl/plugins/cuda/pi_cuda.cpp:653

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)

To Reproduce
Please describe the steps to reproduce the behavior:

Include code snippet as short as possible

#include <sycl/sycl.hpp>
int main() {
    int i = 0;
    sycl::queue q{sycl::gpu_selector_v};
    sycl::buffer<int> buf_mod{&i, 1};
    q.submit([&](sycl::handler &h) {
        sycl::accessor mod{buf_mod, h, sycl::write_only};
        h.single_task([=] {
            sycl::atomic_ref<int, sycl::memory_order::seq_cst,
                    sycl::memory_scope::device, sycl::access::address_space::global_space> a{mod[0]};
            ++a;
        });
    }).wait_and_throw();
    std::cout << i << std::endl;
}

Specify the command which should be used to compile the program
clang++ -O3 -DNDEBUG main.cpp -fsycl -fsycl-targets=nvptx64-nvidia-cuda
Specify the comment which should be used to launch the program
./a.out
Indicate what is wrong and what was expected
program should run without errors.
Environment (please complete the following information):

OS: [e.g Windows/Linux]
Linux
Target device and vendor: [e.g. Intel GPU]
Nvidia GPU
DPC++ version: [e.g. commit hash or output of clang++ --version]
Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230622)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.2.0/linux/bin-llvm
Dependencies version: [e.g. low-level runtime versions (like NEO 20.04)]

Additional context
Add any other context about the problem here.
If the sycl::atomic_ref is using sycl::memory_order::acq_rel or sycl::memory_order::relaxed, then it runs without errors.

The text was updated successfully, but these errors were encountered:

JackAKirk · 2023-10-02T14:59:05Z

seq_cst isn't implemented in the cuda backend atm. However there should at least be a proper error saying this.

Is this feature important to you?

guoci · 2023-10-02T17:12:51Z

I am fine without it.
On a side note, sycl::memory_order::acq_rel works although it seems that CUDA only has relaxed atomics.

JackAKirk · 2023-10-02T17:19:20Z

I am fine without it. On a side note, sycl::memory_order::acq_rel works although it seems that CUDA only has relaxed atomics.

cuda supports non relaxed atomics for sm70 and above: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model

The only difference is seq_cst, which doesn't have the same level of native support, but can apparently be implemented via fences.

Here is a summary I wrote:

The sycl::memory_order parameter corresponds with the ptx .sem qualifier that can be used on all atomic operations:

"The .sem qualifier requires sm_70 or higher. It specifies a memory synchronizing effect as described in the Memory Consistency Model. If the .sem qualifier is absent, .relaxed is assumed by default."

There is a memory-order correspondence between all the possible values of the .sem qualifier and those of sycl::memory_order except that sycl::memory_order::seq_cst is not supported by ptx:

sycl::memory_order::relaxed  → .sem = .relaxed
sycl::memory_order::acquire  →  .sem = .acquire
sycl::memory_order::release  →  .sem = .release
sycl::memory_order::acq_rel → .sem = .acq_rel

JackAKirk · 2024-01-29T16:37:46Z

I've mapped sycl seq_cst to cuda backend. Details here: #12516 (comment)

Implement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend. See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details. Requires sm_70 or above. With this PR there is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities when possible on supported arches. This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the cuda backend. Fixes #11208 Depends on #12907 --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Implement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend. See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model for full details. Requires sm_70 or above. With this PR there is now a complete mapping between SYCL memory consistency model capabilities and the official CUDA model, fully exploiting CUDA capabilities when possible on supported arches. This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the cuda backend. Fixes intel#11208 Depends on intel#12907 --------- Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

guoci added the bug Something isn't working label Sep 18, 2023

bader added the cuda CUDA back-end label Sep 18, 2023

JackAKirk added the confirmed label Oct 6, 2023

JackAKirk self-assigned this Jan 12, 2024

JackAKirk mentioned this issue Jan 29, 2024

[CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 #12516

Merged

ldrumm closed this as completed in #12516 Mar 18, 2024

kbenzie mentioned this issue Apr 15, 2024

sycl-rel_5_2_0: [CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 (#12516) #13403

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PI CUDA ERROR when using `sycl::atomic_ref` #11208

PI CUDA ERROR when using `sycl::atomic_ref` #11208

guoci commented Sep 18, 2023

JackAKirk commented Oct 2, 2023

guoci commented Oct 2, 2023

JackAKirk commented Oct 2, 2023

JackAKirk commented Jan 29, 2024

PI CUDA ERROR when using sycl::atomic_ref #11208

PI CUDA ERROR when using sycl::atomic_ref #11208

Comments

guoci commented Sep 18, 2023

JackAKirk commented Oct 2, 2023

guoci commented Oct 2, 2023

JackAKirk commented Oct 2, 2023

JackAKirk commented Jan 29, 2024

PI CUDA ERROR when using `sycl::atomic_ref` #11208

PI CUDA ERROR when using `sycl::atomic_ref` #11208