Not possible to use some math functions with the CUDA and HIP backends #5326

Open
krasznaa opened this issue Jan 17, 2022 · 10 comments
Labels
bug (Something isn't working) · cuda (CUDA back-end) · enhancement (New feature or request) · hip (Issues related to execution on HIP backend)

Comments

@krasznaa
Contributor

Describe the bug

This may have been reported already; I just couldn't easily find an existing ticket about it. When I try to use the atan2 and std::hypot functions in a kernel, I'm not able to build that kernel into "CUDA or HIP binaries".

To Reproduce

Take the following trivial example:

#include <CL/sycl.hpp>

#include <cmath>
#include <iostream>

int main() {

  sycl::queue queue;
  std::cout << "Running on device: "
            << queue.get_device().get_info<sycl::info::device::name>()
            << std::endl;

  static constexpr int ARRAY_SIZES = 100;
  sycl::buffer<float> a(ARRAY_SIZES), b(ARRAY_SIZES), c(ARRAY_SIZES);

  {
    auto acc_a = a.get_access<sycl::access::mode::write>();
    auto acc_b = b.get_access<sycl::access::mode::write>();
    for (int i = 0; i < ARRAY_SIZES; ++i) {
      acc_a[i] = 1.23f;
      acc_b[i] = 2.34f;
    }
  }

  queue.submit([&](::sycl::handler& h) {
    auto acc_a = a.get_access<sycl::access::mode::read>(h);
    auto acc_b = b.get_access<sycl::access::mode::read>(h);
    auto acc_c = c.get_access<sycl::access::mode::write>(h);
    h.parallel_for<class ATan2Test>(sycl::range<1>(ARRAY_SIZES),
                                    [=](sycl::item<1> i) {
                                      acc_c[i] = atan2f(acc_a[i], acc_b[i]);
                                    });
  });

  {
    static const float RESULT = atan2f(1.23f, 2.34f);
    auto acc_c = c.get_access<sycl::access::mode::read>();
    for (int i = 0; i < ARRAY_SIZES; ++i) {
      if (std::abs(acc_c[i] - RESULT) > 0.001) {
        std::cerr << "Result at index " << i << " is " << acc_c[i]
                  << " instead of " << RESULT << std::endl;
        return 1;
      }
    }
  }

  std::cout << "All OK!" << std::endl;
  return 0;
}

On Intel backends it seems to work fine.

[bash][atspot01]:oneAPICUDA > dpcpp -v
Intel(R) oneAPI DPC++/C++ Compiler 2022.0.0 (2022.0.0.20211123)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /atlas/software/intel/oneapi-2022.1.1/compiler/2022.0.1/linux/bin-llvm
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Candidate multilib: .;@m64
Selected multilib: .;@m64
[bash][atspot01]:oneAPICUDA > dpcpp ./atan2f.cpp 
[bash][atspot01]:oneAPICUDA > ./a.out 
Running on device: Intel(R) UHD Graphics 630 [0x3e98]
All OK!
[bash][atspot01]:oneAPICUDA > SYCL_DEVICE_FILTER=cpu ./a.out 
Running on device: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
All OK!
[bash][atspot01]:oneAPICUDA > SYCL_DEVICE_FILTER=host ./a.out 
Running on device: SYCL host device
All OK!
[bash][atspot01]:oneAPICUDA >

But if I try to compile this code with the CUDA or HIP backends, those just really don't want to play ball... 😦

[bash][Legolas]:sycl > clang++ -v
clang version 14.0.0 (https://github.com/intel/llvm.git bf5d9d58a4f650855ee14adda8b46a72fb17779d)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
Found CUDA installation: /home/krasznaa/software/cuda/11.4.3/x86_64-ubuntu2004, version 11.4
Found HIP installation: /opt/rocm, version 4.2.21155-37cb3a34
[bash][Legolas]:sycl > clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda atan2f.cpp
warning: linking module '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'atan2f.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
1 warning generated.
ptxas fatal   : Unresolved extern function 'atan2f'
llvm-foreach: 
clang-14: error: ptxas command failed with exit code 255 (use -v to see invocation)
clang version 14.0.0 (https://github.com/intel/llvm.git bf5d9d58a4f650855ee14adda8b46a72fb17779d)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/bin
clang-14: note: diagnostic msg: Error generating preprocessed source(s).
[bash][Legolas]:sycl > clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx803 atan2f.cpp
warning: linking module '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc': Linking two modules of different target triples: '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc' is 'amdgcn-unknown-amdhsa' whereas 'atan2f.cpp' is 'amdgcn-amd-amdhsa'
 [-Wlinker-warnings]
1 warning generated.
lld: error: undefined hidden symbol: atan2f
>>> referenced by lto.tmp:(typeinfo name for cl::sycl::detail::__pf_kernel_wrapper<main::'lambda'(cl::sycl::handler&)::operator()(cl::sycl::handler&) const::ATan2Test>)
>>> referenced by lto.tmp:(typeinfo name for cl::sycl::detail::__pf_kernel_wrapper<main::'lambda'(cl::sycl::handler&)::operator()(cl::sycl::handler&) const::ATan2Test>)
>>> referenced by lto.tmp:(typeinfo name for main::'lambda'(cl::sycl::handler&)::operator()(cl::sycl::handler&) const::ATan2Test)
>>> referenced 1 more times
llvm-foreach: 
clang-14: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)
[bash][Legolas]:sycl >

Environment (please complete the following information):

  • OS: Ubuntu 20.04/18.04
  • Target device and vendor: NVIDIA and AMD GPUs
  • DPC++ version: see above
  • Dependencies version: N/A

Additional context

I guess the answer may very well be that these two functions (atan2 and std::hypot) just have not been implemented for these backends yet. Which is fair. In that case this would be a feature request to implement them. 😄

Pinging @ivorobts, @konradkusiak97 and @beomki-yeo.

@krasznaa krasznaa added the bug Something isn't working label Jan 17, 2022
@alexbatashev alexbatashev added cuda CUDA back-end hip Issues related to execution on HIP backend. labels Jan 17, 2022
@zjin-lcf
Contributor

Could you call sycl::atan2 and sycl::hypot?
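
(For reference, a minimal sketch of that change against the reproducer above; only the kernel body changes, the rest of the program stays the same:)

    h.parallel_for<class ATan2Test>(sycl::range<1>(ARRAY_SIZES),
                                    [=](sycl::item<1> i) {
                                      // SYCL built-in instead of the C library call:
                                      acc_c[i] = sycl::atan2(acc_a[i], acc_b[i]);
                                    });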

@krasznaa
Contributor Author

I don't believe that sycl::atan2 is declared... 🤔

[bash][Legolas]:sycl > clang++ -fsycl atan2f.cpp
atan2f.cpp:32:21: error: no member named 'atan2f' in namespace 'sycl'; did you mean simply 'atan2f'?
                                                   acc_c[i] = sycl::atan2f(acc_a[i], acc_b[i]);
                                                              ^~~~~~~~~~~~
                                                              atan2f
/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/builtins.hpp:1627:28: note: 'atan2f' declared here
extern SYCL_EXTERNAL float atan2f(float x, float y);
                           ^
atan2f.cpp:38:33: error: no member named 'atan2f' in namespace 'sycl'; did you mean simply 'atan2f'?
    static const float RESULT = sycl::atan2f(1.23f, 2.34f);
                                ^~~~~~~~~~~~
                                atan2f
/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/bin/../include/sycl/CL/sycl/builtins.hpp:1627:28: note: 'atan2f' declared here
extern SYCL_EXTERNAL float atan2f(float x, float y);
                           ^
2 errors generated.
[bash][Legolas]:sycl >

The atan2f function is declared explicitly, as you can see. That's why it also works fine on the Intel backends. I believe it is "just" a matter of missing support for that function on the CUDA/HIP backends. (Though I don't know what that means technically...)

@krasznaa
Contributor Author

Having said that, sycl::hypot does exist. And the following code even seems to work:

#include <CL/sycl.hpp>

#include <cmath>
#include <iostream>

int main() {

  sycl::queue queue;
  std::cout << "Running on device: "
            << queue.get_device().get_info<sycl::info::device::name>()
            << std::endl;

  static constexpr int ARRAY_SIZES = 100;
  sycl::buffer<float> a(ARRAY_SIZES), b(ARRAY_SIZES), c(ARRAY_SIZES);

  {
    auto acc_a = a.get_access<sycl::access::mode::write>();
    auto acc_b = b.get_access<sycl::access::mode::write>();
    for (int i = 0; i < ARRAY_SIZES; ++i) {
      acc_a[i] = 1.23f;
      acc_b[i] = 2.34f;
    }
  }

  queue.submit([&](::sycl::handler& h) {
    auto acc_a = a.get_access<sycl::access::mode::read>(h);
    auto acc_b = b.get_access<sycl::access::mode::read>(h);
    auto acc_c = c.get_access<sycl::access::mode::write>(h);
    h.parallel_for<class ATan2Test>(sycl::range<1>(ARRAY_SIZES),
                                    [=](sycl::item<1> i) {
                                      acc_c[i] = sycl::hypot(acc_a[i], acc_b[i]);
                                    });
  });

  {
    static const float RESULT = sycl::hypot(1.23f, 2.34f);
    auto acc_c = c.get_access<sycl::access::mode::read>();
    for (int i = 0; i < ARRAY_SIZES; ++i) {
      if (std::abs(acc_c[i] - RESULT) > 0.001) {
        std::cerr << "Result at index " << i << " is " << acc_c[i]
                  << " instead of " << RESULT << std::endl;
        return 1;
      }
    }
  }

  std::cout << "All OK!" << std::endl;
  return 0;
}

[bash][Legolas]:sycl > clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda atan2f.cpp
warning: linking module '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'atan2f.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
1 warning generated.
[bash][Legolas]:sycl > SYCL_DEVICE_FILTER=cuda ./a.out 
Running on device: NVIDIA GeForce RTX 3080
All OK!
[bash][Legolas]:sycl >

And:

[bash][Legolas]:sycl > clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx803 atan2f.cpp
warning: linking module '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc': Linking two modules of different target triples: '/home/krasznaa/software/intel/clang/nightly-20220115/x86_64-ubuntu2004-gcc9-opt/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-amdgcn--amdhsa.bc' is 'amdgcn-unknown-amdhsa' whereas 'atan2f.cpp' is 'amdgcn-amd-amdhsa'
 [-Wlinker-warnings]
1 warning generated.
[bash][Legolas]:sycl > SYCL_DEVICE_FILTER=hip ./a.out 
Running on device: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
All OK!
[bash][Legolas]:sycl >

So while being able to use std::hypot would be more convenient (as we declare these trigonometric calculations in code that doesn't know about SYCL), with sycl::hypot we can at least work around the issue for now. 🤔

@krasznaa
Contributor Author

🤦 I need to drink some coffee... You're right, sycl::atan2 is there...

So we can at least hack around the issue for now. But being able to use std::atan2 and std::hypot in our code instead of sycl::atan2 and sycl::hypot would still make our lives a whole lot easier. (And we can already use those "standard" functions on an Intel backend. 😉)

@bader
Contributor

bader commented Jan 18, 2022

C++ standard functions are not supported by the SYCL standard. Instead, SYCL provides a set of "built-in" functions in the sycl:: namespace; see https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl:builtins. Using the sycl:: math functions may be less convenient, but it is the right way to write SYCL code.

There is an extension to support C++ standard functions, which is currently supported on Intel devices only.

So, I think the right classification for this issue is "enhancement": enable the C++ standard library extension on the HIP/CUDA backends.
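
(In practice that extension means the plain atan2f call in the original reproducer is already valid, and a std::-qualified call works the same way; a minimal sketch of the kernel body under the extension, with an illustrative kernel name:)

    h.parallel_for<class StdATan2Test>(sycl::range<1>(ARRAY_SIZES),
                                       [=](sycl::item<1> i) {
                                         // Relies on the C++ standard library
                                         // extension (Intel devices only, at
                                         // the time of this comment):
                                         acc_c[i] = std::atan2(acc_a[i], acc_b[i]);
                                       });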

@krasznaa
Contributor Author

Hi Alexey,

I have to say I was always confused by the trigonometric functions in the sycl:: namespace. 😕 Yes, if you have large arrays in your host code on which you need to execute the same calculation in bulk, then the "host versions" of these functions can come in handy. (We never have such a "simple" problem ourselves though.) But in device code they just cause inconvenience, at least in my experience.

In our current development we are experimenting with declaring certain calculations in "accelerator agnostic" code, in a way that they can be used either on the host or on a device. Being able to use "standard" trigonometric functions in such code is a big simplification. Right now I'm working around this issue in our own code like this:

// SYCL include(s).
#if defined(CL_SYCL_LANGUAGE_VERSION) || defined(SYCL_LANGUAGE_VERSION)
#include <CL/sycl.hpp>
#endif
...
    TRACCC_HOST_DEVICE
    scalar phi() const {
#if defined(CL_SYCL_LANGUAGE_VERSION) || defined(SYCL_LANGUAGE_VERSION)
        return cl::sycl::atan2(m_y, m_x);
#else
        return std::atan2(m_y, m_x);
#endif // SYCL
    }
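
(A hypothetical way to avoid repeating these #if blocks at every call site would be to centralize the dispatch in a single wrapper header; the math namespace alias below is purely illustrative, not our actual code:)

// Hypothetical wrapper header centralizing the SYCL / std dispatch.
#if defined(CL_SYCL_LANGUAGE_VERSION) || defined(SYCL_LANGUAGE_VERSION)
#include <CL/sycl.hpp>
namespace math = cl::sycl;
#else
#include <cmath>
namespace math = std;
#endif

// Call sites then stay accelerator agnostic:
//   scalar phi() const { return math::atan2(m_y, m_x); }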

But what's worse is that even after fixing all of these function calls in this particular project, it turns out that we are still pulling in at least one std::atan2 call from one of our "upstream" projects. So it seems we'll need to teach that project about SYCL as well. 😦

Long story short, future SYCL standards should really support using the math functions from the C++ standard library in device code officially, much like CUDA supports the C <math.h> functions out of the box...

Cheers,
Attila

@AerialMantis AerialMantis added compiler Compiler related issue enhancement New feature or request and removed compiler Compiler related issue labels May 8, 2022
@hdelan
Contributor

hdelan commented Jan 18, 2023

Hi @krasznaa, this PR should solve the problem for CUDA: #6482. Note that there is currently an issue with this when -ffast-math is used (see #7954), but it can be avoided if you provide the flag -fmath-errno whenever you use stdlib functions together with -ffast-math.
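
(For reference, combining those flags with the CUDA compile command from earlier in this thread would look something like this; untested sketch:)

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -ffast-math -fmath-errno atan2f.cpp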

@krasznaa
Contributor Author

Thanks for the update!

@hdelan
Contributor

hdelan commented Jan 19, 2023

No prob. I am advocating for us to add support for the C++ standard library on the HIP backend as well, so hopefully that is something that happens in the next few months or so. I will update this ticket if/when it happens.

@JackAKirk
Contributor

> No prob. I am advocating for us to add support for the C++ standard library on the HIP backend as well, so hopefully that is something that happens in the next few months or so. I will update this ticket if/when it happens.

This is now implemented by #15055. Hopefully all of the std:: math functions now work on AMD too. I think a lot of it is covered by the tests that were enabled by #15055.
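
(For reference, with that change the original reproducer, using plain atan2f, should build with the same HIP command used earlier in this thread; gfx803 is just the offload arch from the example above, adjust it for your GPU:)

clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx803 atan2f.cpp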
