Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PI CUDA ERROR when using sycl::atomic_ref #11208

Closed
guoci opened this issue Sep 18, 2023 · 4 comments · Fixed by #12516
Closed

PI CUDA ERROR when using sycl::atomic_ref #11208

guoci opened this issue Sep 18, 2023 · 4 comments · Fixed by #12516
Assignees
Labels
bug Something isn't working confirmed cuda CUDA back-end

Comments

@guoci
Copy link

guoci commented Sep 18, 2023

Describe the bug
A clear and concise description of what the bug is.
I get an error when running the program below.

PI CUDA ERROR:
	Value:           719
	Name:            CUDA_ERROR_LAUNCH_FAILED
	Description:     unspecified launch failure
	Function:        wait
	Source Location: /root/intel-llvm-mirror/sycl/plugins/cuda/pi_cuda.cpp:653

terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what():  Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)

To Reproduce
Please describe the steps to reproduce the behavior:

  1. Include code snippet as short as possible
#include <sycl/sycl.hpp>
int main() {
    int i = 0;
    sycl::queue q{sycl::gpu_selector_v};
    sycl::buffer<int> buf_mod{&i, 1};
    q.submit([&](sycl::handler &h) {
        sycl::accessor mod{buf_mod, h, sycl::write_only};
        h.single_task([=] {
            sycl::atomic_ref<int, sycl::memory_order::seq_cst,
                    sycl::memory_scope::device, sycl::access::address_space::global_space> a{mod[0]};
            ++a;
        });
    }).wait_and_throw();
    std::cout << i << std::endl;
}
  1. Specify the command which should be used to compile the program
    clang++ -O3 -DNDEBUG main.cpp -fsycl -fsycl-targets=nvptx64-nvidia-cuda
  2. Specify the comment which should be used to launch the program
    ./a.out
  3. Indicate what is wrong and what was expected
    program should run without errors.
    Environment (please complete the following information):
  • OS: [e.g Windows/Linux]
    Linux

  • Target device and vendor: [e.g. Intel GPU]
    Nvidia GPU

  • DPC++ version: [e.g. commit hash or output of clang++ --version]
    Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230622)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /opt/intel/oneapi/compiler/2023.2.0/linux/bin-llvm

  • Dependencies version: [e.g. low-level runtime versions (like NEO 20.04)]

Additional context
Add any other context about the problem here.
If the sycl::atomic_ref is using sycl::memory_order::acq_rel or sycl::memory_order::relaxed, then it runs without errors.

@guoci guoci added the bug Something isn't working label Sep 18, 2023
@bader bader added the cuda CUDA back-end label Sep 18, 2023
@JackAKirk
Copy link
Contributor

seq_cst isn't implemented in the cuda backend atm. However there should at least be a proper error saying this.

Is this feature important to you?

@guoci
Copy link
Author

guoci commented Oct 2, 2023

I am fine without it.
On a side note, sycl::memory_order::acq_rel works although it seems that CUDA only has relaxed atomics.

@JackAKirk
Copy link
Contributor

I am fine without it. On a side note, sycl::memory_order::acq_rel works although it seems that CUDA only has relaxed atomics.

cuda supports non relaxed atomics for sm70 and above: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model

The only difference is seq_cst, which doesn't have the same level of native support, but can apparently be implemented via fences.

Here is a summary I wrote:

The sycl::memory_order parameter corresponds with the ptx .sem qualifier that can be used on all atomic operations:

"The .sem qualifier requires sm_70 or higher. It specifies a memory synchronizing effect as described in the Memory Consistency Model. If the .sem qualifier is absent, .relaxed is assumed by default."

There is a memory-order correspondence between all the possible values of the .sem qualifier and those of sycl::memory_order except that sycl::memory_order::seq_cst is not supported by ptx:

sycl::memory_order::relaxed  → .sem = .relaxed
sycl::memory_order::acquire  →  .sem = .acquire
sycl::memory_order::release  →  .sem = .release
sycl::memory_order::acq_rel → .sem = .acq_rel

@JackAKirk
Copy link
Contributor

I've mapped sycl seq_cst to cuda backend. Details here: #12516 (comment)

ldrumm pushed a commit that referenced this issue Mar 18, 2024
Implement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend.

See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model
for full details. Requires sm_70 or above. With this PR there is now a
complete mapping between SYCL memory consistency model capabilities and
the official CUDA model, fully exploiting CUDA capabilities when
possible on supported arches.

This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the
cuda backend.

Fixes #11208

Depends on #12907

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
kbenzie pushed a commit to kbenzie/llvm that referenced this issue Apr 15, 2024
Implement `seq_cst` RC11/ptx6.0 memory consistency for CUDA backend.

See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model
for full details. Requires sm_70 or above. With this PR there is now a
complete mapping between SYCL memory consistency model capabilities and
the official CUDA model, fully exploiting CUDA capabilities when
possible on supported arches.

This makes the SYCL-CTS atomic_ref tests fully pass for sm_70 on the
cuda backend.

Fixes intel#11208

Depends on intel#12907

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working confirmed cuda CUDA back-end
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants