Using the packages from [community] with an RX 580 results in a segfault trying to do nearly anything using pytorch. #961
Hello guys. I am having some trouble running ROCm on Arch on gfx803 and I'm not sure how to fix it. I've tried all sorts of environment variables, but nothing has worked. rocminfo and rocm-smi work as expected, and the test C++ code given in another discussion here works fine, but using PyTorch causes ROCm to crash. I'm also using the python-pytorch-rocm package from community, which I thought was supposed to work with gfx803. Here is the output I get when running python in lldb:
Replies: 2 comments 2 replies
Hi. I also have this problem. I can compile and run the hipcc test.cpp and it reports
As rederick29 reports, the segfault occurs during an operation in libamd64.so, part of the
I have tried doing a clean operating system install and compiling the …
I have managed to find a fix for this issue. Thank you so much @mpeschel10!
As recommended by mpeschel10, I ran `export HSA_OVERRIDE_GFX_VERSION=8.0.3`, which caused me to crash again in the same way, but lldb showed that the underlying issue was different this time. After further troubleshooting, I happened to find out that after exporting that environment variable, `clinfo` also crashed in the same way, which did not happen before. Thanks to `AMD_LOG_LEVEL=4`, I was able to find out that my second GPU, a `gfx90c`, was attempting to load the `gfx803` code due to the `HSA_OVERRIDE_GFX_VERSION=8.0.3` variable (meaning that my error was now caused by the other device, which I never intended to use).…