Ubuntu 18.04 + AMD GPU - core dump #243
Comments
Hi @ferlix9o, This thread might be useful for you: codeplaysoftware/computecpp-sdk#116 (comment)
When you run Hope that helps,
Hi, thanks for the reply. I read the other post, but nothing seems to work :-/ And yes, python3 builds correctly now, cheers for that! I will investigate a bit more and keep you updated.
I see. So, just to clarify: are the ComputeCpp SDK examples working for you?
Only when I try to use double on my (ancient!) hardware, but trying the samples should give us more information. TensorFlow disables exceptions so there's no way for us to catch them when disappointments like this happen...
Hi! To check whether I was doing everything correctly, I switched back to Ubuntu 16.04 (kernel 4.13.0-43-generic), and I get the same results... I guess I must be doing something wrong :-/ I am using
dpkg --get-selections |grep opencl
I am using the clinfo provided by AMD (cd /opt/amdgpu-pro/bin && ./clinfo); I pasted the output at the end of this post. I can compile the ComputeCpp examples, but when I run them only a few work.
The ones that are not working show something like this (./accessor):
./clinfo
I remember having issues with the recent amdgpu-pro driver. I am still using 17.50 so that could be the issue. It's a good idea to try the computecpp-sdk samples before compiling TensorFlow.
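A minimal sketch of a pre-flight check along those lines, assuming the two tool locations quoted elsewhere in this thread (`/opt/amdgpu-pro/bin/clinfo` and `/usr/local/computecpp/bin/computecpp_info`). The function names `check_tool` and `preflight` are hypothetical helpers, not part of ComputeCpp or the AMD driver, and the script only confirms that the executables are present; it does not validate the driver itself.

```shell
#!/bin/sh
# Hypothetical pre-flight check before building TensorFlow with SYCL.
# Default paths are the ones mentioned in this thread; override them
# through the environment if your installation differs.
CLINFO="${CLINFO:-/opt/amdgpu-pro/bin/clinfo}"
COMPUTECPP_INFO="${COMPUTECPP_INFO:-/usr/local/computecpp/bin/computecpp_info}"

# check_tool NAME PATH: report whether PATH is an executable file.
check_tool() {
    if [ -x "$2" ]; then
        echo "ok: $1 ($2)"
    else
        echo "missing: $1 ($2)"
        return 1
    fi
}

# preflight: check both tools and return non-zero if either is missing.
preflight() {
    status=0
    check_tool clinfo "$CLINFO" || status=1
    check_tool computecpp_info "$COMPUTECPP_INFO" || status=1
    return "$status"
}
```

Calling `preflight` after sourcing the script prints one line per tool. If both are present, running them by hand and then the computecpp-sdk samples is the next step before attempting the TensorFlow build.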
Yes, I'm doing exactly that. I will try going back to amdgpu 17.50, thanks for the tip! I will keep you posted!
Yeah, so there it's failing when ComputeCpp is trying to compile the kernel. There's not a lot we can do if it crashes though :c When it doesn't crash we might be able to retrieve the build logs.
I just checked my hardware specs: I own an HP Z800 workstation. It does not support PCI Express 3.0, only 2.0. Might that be the issue?
I also get the same exception as @ferlix9o.
I don't think the PCI version would be an issue.
@Rbiessy I have not; I'll install it tonight and report back with the results.
PCI atomics (3.0) were required for ROCm, at least until very recently. Also please note that "opencl=legacy" is not going to work at all on Vega.
This, if anything, seems pretty darn WTF
I tested my GPU (Vega FE 64 liquid) on another system (Ryzen 7 CPU with PCI 3.0, Ubuntu 18.04). Same situation: it doesn't work with ComputeCpp, and the examples keep returning errors... I am not 100% sure, but PCI 2.0 vs 3.0 probably won't make any difference. Still looking for a solution! I will keep trying and post here. Thanks.
If you try running the tests in the SDK in verbose mode, you might be able to get some good debug output from them. (You can do this with
They fail with that error:
I tried again with ROCm, but apparently it is missing SPIR support
and the output for
Vega Frontier is a great device, but it is damn hard to make it work :-/
EDIT: wait, isn't this the same as #167 (comment)?
That particular exception means compilation failure for the device, and indeed, that's what I expected to happen. There's not a lot more to suggest - the device and driver combination won't work :c
Good to see that with ROCm 1.8 everything is now working properly. The whole installation process is way easier ^_^
I successfully built TensorFlow from this branch
using this command
bazel build -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package
This is the output of clinfo:
and if I run
/usr/local/computecpp/bin/computecpp_info
I get
Any time I run something for test purposes I get the same error:
I am not sure what I am doing wrong. Can someone help me with that?
Also, how do I get the source to compile with Python 3.6?
Thanks!
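On the Python 3.6 question, one possible approach is sketched below. It assumes that the `./configure` script in the TensorFlow source tree reads the interpreter location from the `PYTHON_BIN_PATH` environment variable, and that the branch used here behaves the same way, which is not confirmed in this thread. The helpers `resolve_python` and `configure_cmd` are hypothetical names, and the sketch only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers: build the configure invocation for a chosen
# interpreter. Nothing is executed against the TensorFlow tree here.

# resolve_python NAME: print the absolute path of interpreter NAME, or fail.
resolve_python() {
    command -v "$1" || { echo "interpreter not found: $1" >&2; return 1; }
}

# configure_cmd NAME: print (do not run) the configure command line that
# would select interpreter NAME through PYTHON_BIN_PATH (an assumption).
configure_cmd() {
    py=$(resolve_python "$1") || return 1
    echo "PYTHON_BIN_PATH=$py ./configure"
}
```

For example, `configure_cmd python3.6` prints the line to run from the source root before repeating the `bazel build` command above, provided `python3.6` is on the `PATH`.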