Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception raised when trying print float values on A770 #12880

Closed
DDEle opened this issue Mar 1, 2024 · 3 comments
Closed

Exception raised when trying print float values on A770 #12880

DDEle opened this issue Mar 1, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@DDEle
Copy link
Contributor

DDEle commented Mar 1, 2024

Describe the bug

I was trying to print some floating-point values from device kernel using sycl::ext::oneapi::experimental::printf. The following error will be raised then I trying to do this on a A770. (But it works fine on the Intel(R) UHD Graphics 630).

gta@DUT1395DG2FRD:~/test$ icpx -fsycl test_printf.cpp
gta@DUT1395DG2FRD:~/test$ ONEAPI_DEVICE_SELECTOR=level_zero1:0 ./a.out
Running on Intel(R) Arc(TM) A770 Graphics
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  Required aspect fp64 is not supported on the device
Aborted (core dumped)

A minimal cast of test_printf.cpp could be as below:

#include <iostream>
#include <CL/sycl.hpp>

constexpr int num = 16;
using namespace sycl;

int main() {
    auto queue = sycl::queue{};
    auto device = queue.get_info<info::queue::device>();
    std::cout << "Running on " << device.get_info<info::device::name>() << "\n";
    auto r = range {num};
    buffer<int> a {r};

    queue.submit([&](handler &h) {
        accessor out {a, h};
        h.parallel_for(r, [=](item<1> idx) {
            out[idx] = idx;
            sycl::ext::oneapi::experimental::printf(
                    "lid %f\n", (float)idx.get_linear_id());
        });
    });
}

To reproduce

Steps to reproduce are described in the previous section.

Environment

  • OS: Ubuntu 22.04 with 6.7.2-060702-generic (Also tried on 5.19 but the case also failed on that)

  • Target: Intel GPU

  • DPCPP version:

    $ icpx --version
    Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240117)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /opt/intel/oneapi/compiler/2024.1/bin/compiler
    Configuration file: /opt/intel/oneapi/compiler/2024.1/bin/compiler/../icpx.cfg
    
  • sycl-ls --verbose
    gta@DUT1395DG2FRD:~$ sycl-ls --verbose
    [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.17.1.0.17_160000]
    [opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz OpenCL 3.0 (Build 0) [2024.17.1.0.17_160000]
    [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO  [24.05.28454.6]
    [opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 630 OpenCL 3.0 NEO  [24.05.28454.6]
    [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28454]
    [ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) UHD Graphics 630 1.3 [1.3.28454]
    
    Platforms: 5
    Platform [#1]:
        Version  : OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
        Name     : Intel(R) FPGA Emulation Platform for OpenCL(TM)
        Vendor   : Intel(R) Corporation
        Devices  : 1
            Device [#0]:
            Type       : acc
            Version    : OpenCL 1.2
            Name       : Intel(R) FPGA Emulation Device
            Vendor     : Intel(R) Corporation
            Driver     : 2024.17.1.0.17_160000
            Aspects    : accelerator fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_atomic_host_allocations usm_atomic_shared_allocations ext_oneapi_srgb ext_oneapi_non_uniform_groups
            info::device::sub_group_sizes: 4 8 16 32 64
    Platform [#2]:
        Version  : OpenCL 3.0 LINUX
        Name     : Intel(R) OpenCL
        Vendor   : Intel(R) Corporation
        Devices  : 1
            Device [#1]:
            Type       : cpu
            Version    : OpenCL 3.0 (Build 0)
            Name       : Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
            Vendor     : Intel(R) Corporation
            Driver     : 2024.17.1.0.17_160000
            Aspects    : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_non_uniform_groups
            info::device::sub_group_sizes: 4 8 16 32 64
    Platform [#3]:
        Version  : OpenCL 3.0
        Name     : Intel(R) OpenCL Graphics
        Vendor   : Intel(R) Corporation
        Devices  : 1
            Device [#2]:
            Type       : gpu
            Version    : OpenCL 3.0 NEO
            Name       : Intel(R) Arc(TM) A770 Graphics
            Vendor     : Intel(R) Corporation
            Driver     : 24.05.28454.6
            Aspects    : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_oneapi_srgb ext_intel_device_id ext_intel_legacy_image ext_intel_esimd ext_oneapi_non_uniform_groups
            info::device::sub_group_sizes: 8 16 32
    Platform [#4]:
        Version  : OpenCL 3.0
        Name     : Intel(R) OpenCL Graphics
        Vendor   : Intel(R) Corporation
        Devices  : 1
            Device [#3]:
            Type       : gpu
            Version    : OpenCL 3.0 NEO
            Name       : Intel(R) UHD Graphics 630
            Vendor     : Intel(R) Corporation
            Driver     : 24.05.28454.6
            Aspects    : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_oneapi_srgb ext_intel_device_id ext_intel_legacy_image ext_intel_esimd ext_oneapi_non_uniform_groups
            info::device::sub_group_sizes: 8 16 32
    Platform [#5]:
        Version  : 1.3
        Name     : Intel(R) Level-Zero
        Vendor   : Intel(R) Corporation
        Devices  : 2
            Device [#0]:
            Type       : gpu
            Version    : 1.3
            Name       : Intel(R) Arc(TM) A770 Graphics
            Vendor     : Intel(R) Corporation
            Driver     : 1.3.28454
            Aspects    : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_intel_esimd ext_oneapi_non_uniform_groups
            info::device::sub_group_sizes: 8 16 32
            Device [#1]:
            Type       : gpu
            Version    : 1.3
            Name       : Intel(R) UHD Graphics 630
            Vendor     : Intel(R) Corporation
            Driver     : 1.3.28454
            Aspects    : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_intel_esimd ext_oneapi_non_uniform_groups
            info::device::sub_group_sizes: 8 16 32
    default_selector()      : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28454]
    accelerator_selector()  : acc, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.17.1.0.17_160000]
    cpu_selector()          : cpu, Intel(R) OpenCL, Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz OpenCL 3.0 (Build 0) [2024.17.1.0.17_160000]
    gpu_selector()          : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28454]
    custom_selector(gpu)    : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.28454]
    custom_selector(cpu)    : cpu, Intel(R) OpenCL, Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz OpenCL 3.0 (Build 0) [2024.17.1.0.17_160000]
    custom_selector(acc)    : acc, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.17.1.0.17_160000]
    

Additional context

In the header file of sycl::ext::oneapi::experimental::printf, it says that it is implemented according to the printf of The OpenCL™ C Specification, where it says:

The conversion specifiers e,E,g,G,a,A convert a float or half argument that is a scalar type to a double only if the double data type is supported, e.g. for OpenCL C 3.0 or newer the __opencl_c_fp64 feature macro is present. If the double data type is not supported, the argument will be a float instead of a double and the half type will be converted to a float.

@DDEle DDEle added the bug Something isn't working label Mar 1, 2024
@AlexeySachkov
Copy link
Contributor

Hi @DDEle,

The conversion specifiers e,E,g,G,a,A convert a float or half argument that is a scalar type to a double only if the double data type is supported, e.g. for OpenCL C 3.0 or newer the __opencl_c_fp64 feature macro is present. If the double data type is not supported, the argument will be a float instead of a double and the half type will be converted to a float.

This is true for OpenCL C compiler, but not for SYCL, i.e. C++ compiler.

One of the functions we use under the hood is defined as variadic and passing all floating-points as double is part of an ABI for variadic functions. However, we have an internal macro to switch to non-variadic path and that should fix your problem:

#ifdef __SYCL_USE_NON_VARIADIC_SPIRV_OCL_PRINTF__
template <typename... Args>
extern __DPCPP_SYCL_EXTERNAL int
__spirv_ocl_printf(const __attribute__((opencl_constant)) char *Format,
Args... args);
template <typename... Args>
extern __DPCPP_SYCL_EXTERNAL int __spirv_ocl_printf(const char *Format,
Args... args);
#else
extern __DPCPP_SYCL_EXTERNAL int
__spirv_ocl_printf(const __attribute__((opencl_constant)) char *Format, ...);
extern __DPCPP_SYCL_EXTERNAL int __spirv_ocl_printf(const char *Format, ...);
#endif

I would suggest you to define __SYCL_USE_NON_VARIADIC_SPIRV_OCL_PRINTF__ macro before including sycl.hpp and it should help resolve the issue. I remember that we had plans to switch to this path by default, but don't remember exactly what was blocking us from that. Most likely some of our backend did not support this path. Unfortunately, I'm don't know what is the latest status and don't remember which backends were problematic, so at this point I'm just suggesting to try it out and see if it helps.

@DDEle
Copy link
Contributor Author

DDEle commented Mar 1, 2024

Thank you @AlexeySachkov for your timely detailed reply. I can confirm that your workaround of __SYCL_USE_NON_VARIADIC_SPIRV_OCL_PRINTF__ works fine for me.

@AlexeySachkov
Copy link
Contributor

We made updates to the macro to use non-variadic path by default in #13055. Therefore, closing this as fixed, no extra macro should be set anymore to make the example work.

Please let us know if you run into any issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants