Ubuntu 18.04 + AMD GPU - core dump #243

Closed
ferlix9o opened this issue May 20, 2018 · 19 comments

Comments

@ferlix9o

ferlix9o commented May 20, 2018

  • OS Platform and Distribution: Ubuntu "18.04 LTS (Bionic Beaver)"
  • TensorFlow installed from: source
  • TensorFlow version: 1.6.0rc0
  • Python version: Python 2.7.15rc1
  • Bazel version: 0.11.1
  • GCC/Compiler version: 7.3.0
  • CUDA/cuDNN version: none
  • GPU model and memory: AMD Vega Frontier Edition

I successfully built TensorFlow from this branch

using this command:
bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
This is the output of clinfo:

Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name gfx900
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2633.3)
Driver Version 2633.3 (PAL,HSAIL)
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Board Name (AMD) Radeon Vega Frontier Edition
Device Topology (AMD) PCI-E, 11:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes

And if I run
/usr/local/computecpp/bin/computecpp_info
I get:


ComputeCpp Info (CE 0.8.0)


Toolchain information:
GLIBC version: 2.27
GLIBCXX: 20160609
This version of libstdc++ is supported.


Device Info:
Discovered 2 devices matching:
platform :
device type :

Device 0:
Device is supported : UNTESTED - Untested OS
CL_DEVICE_NAME : Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : 1.2.0.10
CL_DEVICE_TYPE : CL_DEVICE_TYPE_CPU

Device 1:
Device is supported : UNTESTED - Untested OS
CL_DEVICE_NAME : gfx900
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2633.3 (PAL,HSAIL)
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU

Any time I run something for test purposes, I get the same error:

return np.fromstring(tensor.tensor_content, dtype=dtype).reshape(shape)
2018-05-20 11:16:42.755706: I ./tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-05-20 11:16:42.913265: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-05-20 11:16:42.913331: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: gfx900, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
terminate called after throwing an instance of 'cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)7, cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)6, cl::sycl::exception> >'

Aborted (core dumped)

I am not sure what I am doing wrong. Can someone help me with this?

Also, how do I get the source to compile with Python 3.6?

Thanks!

@lukeiwanski
Owner

Hi @ferlix9o,
Thanks for the report.

This thread might be useful for you: codeplaysoftware/computecpp-sdk#116 (comment)

Also, how do I get the source to compile with Python 3.6?

When you run ./configure, specify the python3 path when asked:
Please specify the location of python. [Default is /usr/bin/python]:

Hope that helps,
Luke
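
To find the path to enter at that prompt, one option is to ask the interpreter you want to build against for its own location. A small sketch (run it with the intended python3; the printed path depends on the machine, and nothing here is specific to TensorFlow):

```python
import sys

# Absolute path of the interpreter running this script. This is the kind of
# value that could be entered at the "location of python" prompt.
print(sys.executable)

# Major/minor version, to confirm it is the Python 3 build that is wanted.
print("%d.%d" % sys.version_info[:2])
```

Running it as `python3 script.py` rather than `python script.py` matters here, since the two may resolve to different interpreters on the same system.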

@ferlix9o
Author

Hi, thanks for the reply.

I read the other post, but nothing seems to work :-/

Yes, the python3 build works correctly now, cheers for that!

I will investigate a bit more and keep you updated.

@lukeiwanski
Owner

I see. Just to clarify: are the ComputeCpp SDK examples working for you?
Also, could you post the clinfo output as a gist?
@DuncanMcBain / @Rbiessy have you seen anything like that in the past?

@DuncanMcBain
Collaborator

Only when I try to use double on my (ancient!) hardware, but trying the samples should give us more information. TensorFlow disables exceptions so there's no way for us to catch them when disappointments like this happen...
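
One way to rule out the double-precision cause mentioned above is to check whether the device advertises the `cl_khr_fp64` extension in its clinfo output. A minimal sketch, assuming the extension list has been copied into a string; the helper name `supports_fp64` is hypothetical, and the sample string is an abbreviated excerpt of the gfx900 extension list posted later in this thread:

```python
def supports_fp64(extensions: str) -> bool:
    """Return True if a space-separated OpenCL extension list includes cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()


# Abbreviated excerpt of the device extensions reported by clinfo for gfx900.
vega_extensions = (
    "cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics "
    "cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_spir"
)

print(supports_fp64(vega_extensions))
print(supports_fp64("cl_khr_icd cl_khr_fp16"))
```

Splitting on whitespace rather than doing a plain substring search avoids false positives from extensions whose names merely contain the string being looked for.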

@ferlix9o
Author

ferlix9o commented May 24, 2018

Hi!

So, because I wanted to check whether I was doing everything correctly, I switched back to Ubuntu 16.04 (kernel 4.13.0-43-generic), and I get the same results... I guess I must be doing something wrong :-/

I am using the AMD Radeon 18.10 driver:

./amdgpu-pro-install --opencl=legacy,pal --headless
sudo apt-get install opencl-headers ocl-icd-opencl-dev

dpkg --get-selections |grep opencl

libopencl1-amdgpu-pro:amd64                     install
ocl-icd-libopencl1:amd64                        install
ocl-icd-opencl-dev:amd64                        install
opencl-amdgpu-pro-icd                           install
opencl-headers                                  install
opencl-orca-amdgpu-pro-icd:amd64                install

I am using the clinfo provided by AMD (cd /opt/amdgpu-pro/bin && ./clinfo); I pasted the output at the end of this post.

I can compile the ComputeCpp examples, but when I run them only a few work:

13% tests passed, 20 tests failed out of 23

The ones that are not working show something like this (./accessors):

./accessors
*** Error in `./accessors': free(): invalid pointer: 0x0000000001478e40 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f58549877e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f585499037a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f585499453c]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0x11ba057)[0x7f5842a26057]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0x11ba20a)[0x7f5842a2620a]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0x11be00c)[0x7f5842a2a00c]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0x113d023)[0x7f58429a9023]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(aclBinaryFini+0x86)[0x7f58429a71c6]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0xae1141)[0x7f584234d141]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0xae1259)[0x7f584234d259]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0x8fc1cf)[0x7f58421681cf]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0x8fc339)[0x7f5842168339]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(+0x8fbbab)[0x7f5842167bab]
/opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so(clReleaseProgram+0x24)[0x7f5842157ec4]
/usr/local/computecpp/lib/libComputeCpp.so(_ZN2cl4sycl6detail7programD1Ev+0x1c1)[0x7f58558f1191]
./accessors(_ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv+0x42)[0x413650]
/usr/local/computecpp/lib/libComputeCpp.so(_ZN2cl4sycl6detail7context25create_program_for_binaryERKSt10shared_ptrIS2_EPKhib+0x2af)[0x7f58558e3f4f]
/usr/local/computecpp/lib/libComputeCpp.so(_ZN2cl4sycl7program30create_program_for_kernel_implENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKhiPKPKcSt10shared_ptrINS0_6detail7contextEEb+0x9f)[0x7f585590a73f]
./accessors(_ZN2cl4sycl7program25create_program_for_kernelI8multiplyEES1_NS0_7contextE+0x301)[0x41462f]
./accessors[0x411cb5]
./accessors[0x411b3f]
./accessors[0x411834]
./accessors[0x411f1e]
./accessors[0x411bde]
./accessors(main+0xb9)[0x41194e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5854930830]
./accessors(_start+0x29)[0x411469]
======= Memory map: ========
00400000-0041b000 r-xp 00000000 08:01 2360652 /home/frankie/Downloads/computecpp-sdk/build/samples/accessors/accessors
0061a000-0061b000 r--p 0001a000 08:01 2360652 /home/frankie/Downloads/computecpp-sdk/build/samples/accessors/accessors
0061b000-0061c000 rw-p 0001b000 08:01 2360652 /home/frankie/Downloads/computecpp-sdk/build/samples/accessors/accessors
01461000-017bc000 rw-p 00000000 00:00 0 [heap]
7f5834000000-7f5834021000 rw-p 00000000 00:00 0
7f5834021000-7f5838000000 ---p 00000000 00:00 0
7f583c000000-7f583c4b2000 rw-p 00000000 00:00 0
7f583c4b2000-7f5840000000 ---p 00000000 00:00 0
7f584186c000-7f58457e8000 r-xp 00000000 08:01 2490633 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so
7f58457e8000-7f58459e8000 ---p 03f7c000 08:01 2490633 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so
7f58459e8000-7f5845fe4000 rw-p 03f7c000 08:01 2490633 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl64.so
7f5845fe4000-7f5846457000 rw-p 00000000 00:00 0
7f5846457000-7f584829a000 r-xp 00000000 08:01 2490637 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl12cl64.so
7f584829a000-7f584849a000 ---p 01e43000 08:01 2490637 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl12cl64.so
7f584849a000-7f5848799000 rw-p 01e43000 08:01 2490637 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl12cl64.so
7f5848799000-7f584884e000 rw-p 00000000 00:00 0
7f584884e000-7f5848862000 r-xp 00000000 08:01 2490470 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2.4.0
7f5848862000-7f5848a61000 ---p 00014000 08:01 2490470 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2.4.0
7f5848a61000-7f5848a62000 r--p 00013000 08:01 2490470 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2.4.0
7f5848a62000-7f5848a63000 rw-p 00014000 08:01 2490470 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2.4.0
7f584c9ec000-7f584ca00000 rw-s 10813b000 00:06 221 /dev/dri/renderD128
7f584ca00000-7f584ce00000 rw-p 00000000 00:00 0
7f584ce0d000-7f584ce1d000 rw-s 10812b000 00:06 221 /dev/dri/renderD128
7f584ce1d000-7f584ce5d000 rw-s 107ecb000 00:06 221 /dev/dri/renderD128
7f584ce5d000-7f584ce6d000 rw-s 107ebb000 00:06 221 /dev/dri/renderD128
7f584ce6d000-7f584ce81000 rw-s 107e67000 00:06 221 /dev/dri/renderD128
7f584ce81000-7f584ce91000 rw-s 107e57000 00:06 221 /dev/dri/renderD128
7f584ce91000-7f584cea5000 rw-s 107e03000 00:06 221 /dev/dri/renderD128
7f584cea5000-7f584ceb5000 rw-s 107df3000 00:06 221 /dev/dri/renderD128
7f584ceb5000-7f584cec9000 rw-s 107d9f000 00:06 221 /dev/dri/renderD128
7f584cec9000-7f584ced9000 rw-s 107d8f000 00:06 221 /dev/dri/renderD128
7f584ced9000-7f584cf19000 rw-s 1162c0000 00:06 221 /dev/dri/renderD128
7f584cf19000-7f584cf2d000 rw-s 1162ac000 00:06 221 /dev/dri/renderD128
7f584cf2d000-7f584cf3d000 rw-s 11629c000 00:06 221 /dev/dri/renderD128
7f584cf3d000-7f584cf51000 rw-s 116248000 00:06 221 /dev/dri/renderD128
7f584cf51000-7f584cf61000 rw-s 116238000 00:06 221 /dev/dri/renderD128
7f584cf61000-7f584cf75000 rw-s 1161e4000 00:06 221 /dev/dri/renderD128
7f584cf75000-7f584cf85000 rw-s 1161d4000 00:06 221 /dev/dri/renderD128
7f584cf85000-7f584cf99000 rw-s 116180000 00:06 221 /dev/dri/renderD128
7f584cf99000-7f584cfa9000 rw-s 116170000 00:06 221 /dev/dri/renderD128
7f584cfa9000-7f584cfbd000 rw-s 10fd14000 00:06 221 /dev/dri/renderD128
7f584cfbd000-7f584cfbe000 rw-p 00000000 00:00 0
7f584cfbe000-7f584cfcd000 ---p 00000000 00:00 0
7f584cfcd000-7f584cfce000 rw-p 00000000 00:00 0
7f584cfce000-7f584cfdd000 ---p 00000000 00:00 0
7f584cfdd000-7f584cff1000 rw-s 10fd00000 00:06 221 /dev/dri/renderD128
7f584cff1000-7f584cfff000 r-xp 00000000 08:01 2490476 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1.0.0
7f584cfff000-7f584d1fe000 ---p 0000e000 08:01 2490476 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1.0.0
7f584d1fe000-7f584d1ff000 r--p 0000d000 08:01 2490476 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1.0.0
7f584d1ff000-7f584d200000 rw-p 0000e000 08:01 2490476 /opt/amdgpu/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1.0.0
7f584d200000-7f584d400000 rw-p 00000000 00:00 0
7f584d40a000-7f584d40b000 rw-p 00000000 00:00 0
7f584d40b000-7f584d41a000 ---p 00000000 00:00 0
7f584d41a000-7f584d41b000 rw-p 00000000 00:00 0
7f584d41b000-7f584d42a000 ---p 00000000 00:00 0
7f584d42a000-7f584d43e000 rw-s 10fcec000 00:06 221 /dev/dri/renderD128
7f584d43e000-7f584d44e000 rw-s 10fcdc000 00:06 221 /dev/dri/renderD128
7f584d44e000-7f584d48e000 rw-s 10fc7c000 00:06 221 /dev/dri/renderD128
7f584d48e000-7f584d49e000 rw-s 10fc6c000 00:06 221 /dev/dri/renderD128
7f584d49e000-7f584d4b2000 rw-s 10fc18000 00:06 221 /dev/dri/renderD128
7f584d4b2000-7f584d4c2000 rw-s 10456e000 00:06 221 /dev/dri/renderD128
7f584d4c2000-7f584d4d6000 rw-s 10fbc4000 00:06 221 /dev/dri/renderD128
7f584d4d6000-7f584d4e6000 rw-s 106dc5000 00:06 221 /dev/dri/renderD128
7f584d4e6000-7f584d4fa000 rw-s 10fb70000 00:06 221 /dev/dri/renderD128
7f584d4fa000-7f584d50a000 rw-s 105bbc000 00:06 221 /dev/dri/renderD128
7f584d50a000-7f584d54a000 rw-s 10faf0000 00:06 221 /dev/dri/renderD128
7f584d54a000-7f584d55e000 rw-s 105f20000 00:06 221 /dev/dri/renderD128
7f584d55e000-7f584d56e000 rw-s 104b0e000 00:06 221 /dev/dri/renderD128
7f584d56e000-7f584d582000 rw-s 105f0c000 00:06 221 /dev/dri/renderD128
7f584d582000-7f584d592000 rw-s 105e4c000 00:06 221 /dev/dri/renderD128
7f584d592000-7f584d5a6000 rw-s 105d20000 00:06 221 /dev/dri/renderD128
7f584d5a6000-7f584d5b6000 rw-s 108340000 00:06 221 /dev/dri/renderD128
7f584d5b6000-7f584d5ca000 rw-s 105d0c000 00:06 221 /dev/dri/renderD128
7f584d5ca000-7f584d5cb000 ---p 00000000 00:00 0
7f584d5cb000-7f584ddcb000 rw-p 00000000 00:00 0
7f584ddcb000-7f584ddcc000 ---p 00000000 00:00 0
7f584ddcc000-7f584e5cc000 rw-p 00000000 00:00 0
7f584e5cc000-7f584e5cd000 ---p 00000000 00:00 0
7f584e5cd000-7f584edcd000 rw-p 00000000 00:00 0
7f584edcd000-7f584edce000 ---p 00000000 00:00 0
7f584edce000-7f584f5ce000 rw-p 00000000 00:00 0
7f584f5ce000-7f584f5cf000 ---p 00000000 00:00 0
7f584f5cf000-7f584fdcf000 rw-p 00000000 00:00 0
7f584fdcf000-7f584fdd0000 ---p 00000000 00:00 0
7f584fdd0000-7f58505d0000 rw-p 00000000 00:00 0
7f58505d0000-7f58505d1000 ---p 00000000 00:00 0
7f58505d1000-7f5850dd1000 rw-p 00000000 00:00 0
7f5850dd1000-7f5850dd2000 ---p 00000000 00:00 0
7f5850dd2000-7f58515d2000 rw-p 00000000 00:00 0
7f58515d2000-7f58515d3000 ---p 00000000 00:00 0
7f58515d3000-7f5851dd3000 rw-p 00000000 00:00 0
7f5851dd3000-7f5851dd4000 ---p 00000000 00:00 0
7f5851dd4000-7f58525d4000 rw-p 00000000 00:00 0
7f58525d4000-7f58525d5000 ---p 00000000 00:00 0
7f58525d5000-7f5852dd5000 rw-p 00000000 00:00 0
7f5852dd5000-7f5852dd6000 ---p 00000000 00:00 0
7f5852dd6000-7f58535d6000 rw-p 00000000 00:00 0
7f58535d6000-7f58535d7000 ---p 00000000 00:00 0
7f58535d7000-7f5853dd7000 rw-p 00000000 00:00 0
7f5853dd7000-7f5853dda000 r-xp 00000000 08:01 1578099 /lib/x86_64-linux-gnu/libdl-2.23.so
7f5853dda000-7f5853fd9000 ---p 00003000 08:01 1578099 /lib/x86_64-linux-gnu/libdl-2.23.so
7f5853fd9000-7f5853fda000 r--p 00002000 08:01 1578099 /lib/x86_64-linux-gnu/libdl-2.23.so
7f5853fda000-7f5853fdb000 rw-p 00003000 08:01 1578099 /lib/x86_64-linux-gnu/libdl-2.23.so
7f5853fdb000-7f5853fe2000 r-xp 00000000 08:01 1578229 /lib/x86_64-linux-gnu/librt-2.23.so
7f5853fe2000-7f58541e1000 ---p 00007000 08:01 1578229 /lib/x86_64-linux-gnu/librt-2.23.so
7f58541e1000-7f58541e2000 r--p 00006000 08:01 1578229 /lib/x86_64-linux-gnu/librt-2.23.so
7f58541e2000-7f58541e3000 rw-p 00007000 08:01 1578229 /lib/x86_64-linux-gnu/librt-2.23.so
7f58541e3000-7f58542eb000 r-xp 00000000 08:01 1578145 /lib/x86_64-linux-gnu/libm-2.23.so
7f58542eb000-7f58544ea000 ---p 00108000 08:01 1578145 /lib/x86_64-linux-gnu/libm-2.23.so
7f58544ea000-7f58544eb000 r--p 00107000 08:01 1578145 /lib/x86_64-linux-gnu/libm-2.23.so
7f58544eb000-7f58544ec000 rw-p 00108000 08:01 1578145 /lib/x86_64-linux-gnu/libm-2.23.so
7f58544ec000-7f5854504000 r-xp 00000000 08:01 1578221 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5854504000-7f5854703000 ---p 00018000 08:01 1578221 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5854703000-7f5854704000 r--p 00017000 08:01 1578221 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5854704000-7f5854705000 rw-p 00018000 08:01 1578221 /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5854705000-7f5854709000 rw-p 00000000 00:00 0
7f5854709000-7f5854710000 r-xp 00000000 08:01 2490629 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libOpenCL.so.1
7f5854710000-7f585490f000 ---p 00007000 08:01 2490629 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libOpenCL.so.1
7f585490f000-7f5854910000 rw-p 00006000 08:01 2490629 /opt/amdgpu-pro/lib/x86_64-linux-gnu/libOpenCL.so.1
7f5854910000-7f5854ad0000 r-xp 00000000 08:01 1578075 /lib/x86_64-linux-gnu/libc-2.23.so
7f5854ad0000-7f5854cd0000 ---p 001c0000 08:01 1578075 /lib/x86_64-linux-gnu/libc-2.23.so
7f5854cd0000-7f5854cd4000 r--p 001c0000 08:01 1578075 /lib/x86_64-linux-gnu/libc-2.23.so
7f5854cd4000-7f5854cd6000 rw-p 001c4000 08:01 1578075 /lib/x86_64-linux-gnu/libc-2.23.so
7f5854cd6000-7f5854cda000 rw-p 00000000 00:00 0
7f5854cda000-7f5854cf0000 r-xp 00000000 08:01 1578113 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5854cf0000-7f5854eef000 ---p 00016000 08:01 1578113 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5854eef000-7f5854ef0000 rw-p 00015000 08:01 1578113 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5854ef0000-7f5855062000 r-xp 00000000 08:01 3549284 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f5855062000-7f5855262000 ---p 00172000 08:01 3549284 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f5855262000-7f585526c000 r--p 00172000 08:01 3549284 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f585526c000-7f585526e000 rw-p 0017c000 08:01 3549284 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f585526e000-7f5855272000 rw-p 00000000 00:00 0
7f5855272000-7f58559d7000 r-xp 00000000 08:01 3554693 /usr/local/computecpp/lib/libComputeCpp.so
7f58559d7000-7f5855bd6000 ---p 00765000 08:01 3554693 /usr/local/computecpp/lib/libComputeCpp.so
7f5855bd6000-7f5855bda000 r--p 00764000 08:01 3554693 /usr/local/computecpp/lib/libComputeCpp.so
7f5855bda000-7f5855bdf000 rw-p 00768000 08:01 3554693 /usr/local/computecpp/lib/libComputeCpp.so
7f5855bdf000-7f5855c05000 r-xp 00000000 08:01 1578047 /lib/x86_64-linux-gnu/ld-2.23.so
7f5855c0a000-7f5855c1a000 rw-s 1065bc000 00:06 221 /dev/dri/renderD128
7f5855c1a000-7f5855c2e000 rw-s 105f75000 00:06 221 /dev/dri/renderD128
7f5855c2e000-7f5855c2f000 rw-p 00000000 00:00 0
7f5855c2f000-7f5855c3e000 ---p 00000000 00:00 0
7f5855c3e000-7f5855c3f000 rw-p 00000000 00:00 0
7f5855c3f000-7f5855c4e000 ---p 00000000 00:00 0
7f5855c4e000-7f5855c62000 rw-s 105f61000 00:06 221 /dev/dri/renderD128
7f5855c62000-7f5855c63000 rw-p 00000000 00:00 0
7f5855c63000-7f5855c72000 ---p 00000000 00:00 0
7f5855c72000-7f5855c92000 rw-s 105ebc000 00:06 221 /dev/dri/renderD128
7f5855c92000-7f5855c93000 rw-p 00000000 00:00 0
7f5855c93000-7f5855ca2000 ---p 00000000 00:00 0
7f5855ca2000-7f5855d22000 rw-s 116c70000 00:06 221 /dev/dri/renderD128
7f5855d22000-7f5855da2000 rw-s 116bf0000 00:06 221 /dev/dri/renderD128
7f5855da2000-7f5855da3000 ---p 00000000 00:00 0
7f5855da3000-7f5855dea000 rw-p 00000000 00:00 0
7f5855df2000-7f5855df4000 rw-p 00000000 00:00 0
7f5855df4000-7f5855e03000 ---p 00000000 00:00 0
7f5855e03000-7f5855e04000 rw-p 00000000 00:00 0
7f5855e04000-7f5855e05000 r--p 00025000 08:01 1578047 /lib/x86_64-linux-gnu/ld-2.23.so
7f5855e05000-7f5855e06000 rw-p 0002600 08:01 1578047 /lib/x86_64-linux-gnu/ld-2.23.so
7f5855e06000-7f5855e07000 rw-p 00000000 00:00 0
7ffca3319000-7ffca333a000 rw-p 00000000 00:00 0 [stack]
7ffca335b000-7ffca335e000 r--p 00000000 00:00 0 [vvar]
7ffca335e000-7ffca3360000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)

./clinfo

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (2580.4)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Radeon Vega Frontier Edition
Device Topology: PCI[ B#17, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1600Mhz
Address bits: 64
Max memory allocation: 4244635648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 16978542592
Constant buffer size: 4244635648
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7faa12fa1350
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2580.4 (PAL,HSAIL)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2580.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

@Rbiessy
Collaborator

Rbiessy commented May 24, 2018

I remember having issues with the recent amdgpu-pro driver. I am still using 17.50 so that could be the issue. It's a good idea to try the computecpp-sdk samples before compiling TensorFlow.

@ferlix9o
Author

ferlix9o commented May 24, 2018

Yes, I'm doing exactly that. I will try going back to amdgpu 17.50, thanks for the tip!

I will keep you posted!

@DuncanMcBain
Collaborator

Yeah, so there it's failing when ComputeCpp is trying to compile the kernel. There's not a lot we can do if it crashes, though :c When it doesn't crash, we might be able to retrieve the build logs.

@ferlix9o
Author

I just checked my hardware specs: I own an HP Z800 workstation.

It does not support PCI Express 3.0, only 2.0. Might that be the issue?
That could explain why OpenCL can't communicate properly with the GPU, and the core dump.

@AustinMooreT

I also get the same exception as @ferlix9o.
Here is my clinfo; if there is any other pertinent information I can provide please let me know.

Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (2633.3)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD

Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name gfx900
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2633.3)
Driver Version 2633.3 (PAL,HSAIL)
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Board Name (AMD) Radeon RX Vega
Device Topology (AMD) PCI-E, 25:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 56
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1590MHz
Graphics IP (AMD) 9.0
Device Partition (core)
Max number of sub-devices 56
Supported partition types (n/a)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 8573157376 (7.984GiB)
Global free memory (AMD) 8372224 (7.984GiB)
Global memory channels (AMD) 64
Global memory banks per channel (AMD) 4
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 4244635648 (3.953GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 4244635648 (3.953GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1527536416378951491ns (Mon May 28 15:40:16 2018)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 8
Max real-time compute queues (AMD) 0
Max real-time compute units (AMD) 0
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [AMD]
clCreateContext(NULL, ...) [default] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx900
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx900
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx900

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2

@Rbiessy
Collaborator

Rbiessy commented May 31, 2018

I don't think the PCI version would be an issue.
@AustinMooreT @ferlix9o have either of you had the chance to try the amdgpu-pro 17.50 driver? With it, clinfo reports a driver version of 2482.3 for me. I don't see what else could be wrong.
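
To compare an installed driver against the 2482.3 version mentioned above, the driver version can be pulled out of saved clinfo output. A minimal sketch, assuming the output has been captured as text; the helper name `driver_versions` is hypothetical, and the sample lines are copied from the two clinfo formats posted in this thread:

```python
import re

# Matches both "Driver Version 2633.3 (PAL,HSAIL)" and
# "Driver version: 2580.4 (PAL,HSAIL)".
DRIVER_RE = re.compile(
    r"^\s*Driver version:?\s+(\d+(?:\.\d+)*)",
    re.IGNORECASE | re.MULTILINE,
)


def driver_versions(clinfo_text: str) -> list:
    """Return every driver version string found in clinfo output."""
    return DRIVER_RE.findall(clinfo_text)


sample = """\
Device Version OpenCL 1.2 AMD-APP (2633.3)
Driver Version 2633.3 (PAL,HSAIL)
Driver version: 2580.4 (PAL,HSAIL)
"""

for version in driver_versions(sample):
    # 2482.3 is the version reported in this thread for the 17.50 driver.
    status = "matches 17.50" if version == "2482.3" else "differs from 17.50"
    print(version, status)
```

Both versions reported so far in this thread (2633.3 and 2580.4) differ from 2482.3, which is consistent with neither reporter having tried the 17.50 driver yet.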

@AustinMooreT

@Rbiessy I have not; I'll install it tonight and report back with the results.

@mirh

mirh commented Jun 2, 2018

It does not support PCI Express 3.0, only 2.0. Might that be the issue?

PCIe atomics (3.0) were required for ROCm, at least until very recently.

Also, please note that "opencl=legacy" is not going to work at all on Vega.
It's either pal or rocm.

2018-05-20 11:16:42.755706: I ./tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2

This, if anything, seems pretty darn WTF.

@ferlix9o
Author

ferlix9o commented Jul 3, 2018

I tested my GPU (Vega FE 64 liquid) on another system (Ryzen 7 CPU with PCIe 3.0, running Ubuntu 18.04).

Same situation: it doesn't work with ComputeCpp, and the examples keep returning errors... I am not 100% sure, but PCIe 2.0 vs. 3.0 probably doesn't make any difference.

Still looking for a solution! I will keep trying and post here. Thanks.

@DuncanMcBain
Collaborator

If you try running the tests in the SDK in verbose mode, you might be able to get some good debug output from them. (You can do this with ctest -V, I believe.) I'm kind of expecting to see output indicating that clBuildProgram failed, though, and as mentioned, in that case there's not a lot we can do. It might be interesting to confirm it, however.
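
For reference, CTest's verbose flag is the capital `-V` (or `--verbose`), `--output-on-failure` prints the output of failing tests, and `-R` selects tests by regular expression. A minimal sketch of a wrapper that builds such a command for a single failing sample; the function names are hypothetical, and the build directory is an assumption to be supplied by the caller:

```python
import subprocess


def ctest_command(test_regex=None, verbose=True):
    """Build a ctest argument list; -V is verbose, -R selects tests by regex."""
    cmd = ["ctest", "--output-on-failure"]
    if verbose:
        cmd.append("-V")
    if test_regex:
        cmd += ["-R", test_regex]
    return cmd


def run_ctest(build_dir, test_regex=None):
    """Run ctest inside build_dir (assumed to be the SDK build directory)."""
    return subprocess.run(ctest_command(test_regex), cwd=build_dir, check=False)


# Show the command that would be run for the accessors sample.
print(" ".join(ctest_command("accessors")))
```

Restricting the run to one sample keeps the verbose output short enough to paste into the issue.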

@ferlix9o
Author

ferlix9o commented Jul 7, 2018

 ctest -v

      Start  1: async-handler
 1/24 Test  #1: async-handler ....................***Exception: Other  0.20 sec
      Start  2: simple-vector-add
 2/24 Test  #2: simple-vector-add ................***Exception: Other  0.18 sec
      Start  3: reinterpret
 3/24 Test  #3: reinterpret ......................***Exception: Other  0.19 sec
      Start  4: parallel-for
 4/24 Test  #4: parallel-for .....................***Exception: Other  0.19 sec
      Start  5: custom-device-selector
 5/24 Test  #5: custom-device-selector ...........***Exception: Other  0.18 sec
      Start  6: scan
 6/24 Test  #6: scan .............................***Exception: Other  0.19 sec
      Start  7: reduction
 7/24 Test  #7: reduction ........................***Exception: Other  0.18 sec
      Start  8: hello-world
 8/24 Test  #8: hello-world ......................***Exception: Other  0.20 sec
      Start  9: using-function-objects
 9/24 Test  #9: using-function-objects ...........***Exception: Other  0.20 sec
      Start 10: opencl-c-interop
10/24 Test #10: opencl-c-interop .................***Exception: Other  0.22 sec
      Start 11: accessors
11/24 Test #11: accessors ........................***Exception: Other  0.20 sec
      Start 12: example-sycl-application
12/24 Test #12: example-sycl-application .........***Exception: Other  0.19 sec
      Start 13: matrix-multiply_omp
13/24 Test #13: matrix-multiply_omp ..............   Passed    0.01 sec
      Start 14: matrix-multiply_sycl
14/24 Test #14: matrix-multiply_sycl .............***Exception: Other  0.19 sec
      Start 15: gaussian-blur
15/24 Test #15: gaussian-blur ....................***Exception: Other  0.20 sec
      Start 16: images
16/24 Test #16: images ...........................***Exception: Other  0.20 sec
      Start 17: simple-example-of-vectors
17/24 Test #17: simple-example-of-vectors ........***Exception: Other  0.18 sec
      Start 18: smart-pointer
18/24 Test #18: smart-pointer ....................***Exception: Other  0.18 sec
      Start 19: simple-local-barrier
19/24 Test #19: simple-local-barrier .............***Exception: Other  0.18 sec
      Start 20: template-function-object
20/24 Test #20: template-function-object .........***Exception: Other  0.18 sec
      Start 21: tiled-convolution
21/24 Test #21: tiled-convolution ................***Exception: Other  0.19 sec
      Start 22: vptr
22/24 Test #22: vptr .............................***Exception: Other  0.17 sec
      Start 23: ivka
23/24 Test #23: ivka .............................   Passed    0.00 sec
      Start 24: simple-private-memory
24/24 Test #24: simple-private-memory ............***Exception: Other  0.18 sec

8% tests passed, 22 tests failed out of 24

Total Test time (real) =   4.19 sec

The following tests FAILED:
	  1 - async-handler (OTHER_FAULT)
	  2 - simple-vector-add (OTHER_FAULT)
	  3 - reinterpret (OTHER_FAULT)
	  4 - parallel-for (OTHER_FAULT)
	  5 - custom-device-selector (OTHER_FAULT)
	  6 - scan (OTHER_FAULT)
	  7 - reduction (OTHER_FAULT)
	  8 - hello-world (OTHER_FAULT)
	  9 - using-function-objects (OTHER_FAULT)
	 10 - opencl-c-interop (OTHER_FAULT)
	 11 - accessors (OTHER_FAULT)
	 12 - example-sycl-application (OTHER_FAULT)
	 14 - matrix-multiply_sycl (OTHER_FAULT)
	 15 - gaussian-blur (OTHER_FAULT)
	 16 - images (OTHER_FAULT)
	 17 - simple-example-of-vectors (OTHER_FAULT)
	 18 - smart-pointer (OTHER_FAULT)
	 19 - simple-local-barrier (OTHER_FAULT)
	 20 - template-function-object (OTHER_FAULT)
	 21 - tiled-convolution (OTHER_FAULT)
	 22 - vptr (OTHER_FAULT)
	 24 - simple-private-memory (OTHER_FAULT)
Errors while running CTest

They all fail with this error:

./simple-vector-add 
terminate called after throwing an instance of 'cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)7, cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)6, cl::sycl::exception> >'
Aborted (core dumped)

I tried again with ROCm, but it apparently lacks SPIR support.
I am currently using the driver from AMD, installed with this flag:

./amdgpu-pro-install --opencl=pal

and this is the output of

/usr/local/computecpp/bin/computecpp_info

********************************************************************************
ComputeCpp Info (CE ..)
********************************************************************************
Toolchain information:
GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 1 devices matching:
  platform    : <any>
  device type : <any>
--------------------------------------------------------------------------------
Device 0:
  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : gfx900
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2639.3 (PAL,HSAIL)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

********************************************************************************

Vega Frontier is a great device, but it is damn hard to make it work :-/

@mirh

mirh commented Jul 7, 2018

Try to run one of the faulty tests in gdb, then report the backtrace?

EDIT: wait, isn't this the same as #167 (comment)?

@DuncanMcBain
Collaborator

That particular exception means compilation failure for the device, and indeed, that's what I expected to happen. There's not a lot more to suggest - the device and driver combination won't work :c

@ferlix9o
Author

Good to see that with ROCm 1.8 everything now works properly. The whole installation process is way easier ^_^
