python tensorflow in nvidia-enabled tumbleweed and fedora distroboxes unable to talk to GPU #230
Comments
Can you make sure that nvidia-smi can run inside the distrobox? Next, make sure that the nvidia containers toolkit file is on disk. We have a systemd oneshot that makes sure it is written that starts with ublue-. |
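The two checks suggested above could be scripted roughly as follows. This is only a sketch: the comment does not say which toolkit file it means, so the CDI spec paths below are an assumption (common defaults for the NVIDIA Container Toolkit), and the function name is hypothetical. Run it inside the distrobox for the nvidia-smi check and on the host for the file check.

```python
import os
import shutil
import subprocess

# Assumption: common default locations for the NVIDIA Container Toolkit
# CDI spec. The image discussed in this thread may write it elsewhere.
CDI_SPEC_CANDIDATES = ("/etc/cdi/nvidia.yaml", "/var/run/cdi/nvidia.yaml")


def check_gpu_basics():
    """Report whether nvidia-smi runs and which assumed CDI spec files exist."""
    report = {"nvidia_smi_found": False, "nvidia_smi_ok": False, "cdi_specs": []}

    smi = shutil.which("nvidia-smi")
    if smi:
        report["nvidia_smi_found"] = True
        try:
            # "nvidia-smi -L" lists the GPUs the driver can see.
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass  # leave nvidia_smi_ok as False

    report["cdi_specs"] = [p for p in CDI_SPEC_CANDIDATES if os.path.exists(p)]
    return report


if __name__ == "__main__":
    print(check_gpu_basics())
```

A passing nvidia-smi check only shows that the driver is reachable; as the rest of this thread shows, tensorflow can still fail to find its CUDA libraries.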
It can, in both. I double checked that.
The systemd oneshot is there --- I ran |
No, rerunning the nvidia toolkit systemd service didn't change anything. Nvidia-smi runs, but tensorflow says it's missing libraries. |
@m2Giles using the official tensorflow container doesn't work either
|
I've deleted and recreated the distrobox several times now and it still doesn't work. |
It looks like the problem might be because the libcudnn and libcudart libraries aren't installed? I don't remember ever needing them in order for tensorflow to work though. NVCC is also missing. |
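One way to check whether libcudart, libcudnn, and nvcc are present is sketched below. The function names are hypothetical. It assumes that pip-installed NVIDIA wheels place their shared libraries under a `nvidia/*/lib` directory in site-packages, which the system linker search does not cover, so both locations are checked separately.

```python
import ctypes.util
import glob
import os
import shutil
import sysconfig


def find_system_cuda_libs(names=("cudart", "cudnn")):
    """Look up each library on the system linker path (None if not found)."""
    return {name: ctypes.util.find_library(name) for name in names}


def find_pip_cuda_libs():
    """List shared libraries under site-packages/nvidia/*/lib.

    Assumption: this is where the NVIDIA pip wheels install their libraries.
    """
    paths = sysconfig.get_paths()
    roots = {paths["purelib"], paths["platlib"]}
    found = []
    for root in roots:
        found.extend(glob.glob(os.path.join(root, "nvidia", "*", "lib", "*.so*")))
    return sorted(found)


if __name__ == "__main__":
    print("system libs:", find_system_cuda_libs())
    print("pip libs:", find_pip_cuda_libs())
    print("nvcc:", shutil.which("nvcc"))
```

Run it with the same interpreter (venv or system) that tensorflow is installed into, since the site-packages path depends on it.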
Still no luck whatsoever with this. Going to temporarily rebase from my derivative image to silverblue-nvidia to see if that's the cause. |
Update! When I tested the official tensorflow docker image, I was actually using the wrong image! With
So now I just need to investigate why installing tensorflow on tumbleweed, ubuntu, or fedora (39 or rawhide) manually doesn't work, but the official docker image does! And in the meantime I can just run things in the official docker image, and that's... acceptable. |
I recreated my fedora container again and am also getting a slightly different and more enlightening error from tf this time:
although:
|
It looks like the problem might be that the latest version of tensorflow (2.16.1) doesn't actually support the latest CUDA version (12.4). The people here and here seem to be having a similar problem, and the table here indicates CUDA 12.4 isn't officially supported. The official container and my old tumbleweed container had CUDA 12.3, while the new ones have 12.4. |
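The suspected mismatch can be checked with a small comparison like the one below. It is a sketch, not a definitive test: the helper names are hypothetical, the installed version has to be supplied by hand (for example from the container's CUDA packages), and `tf.sysconfig.get_build_info()` only reports a `cuda_version` key on GPU-enabled builds.

```python
def major_minor(version):
    """Reduce a version string such as '12.3.1' to the tuple (12, 3)."""
    parts = str(version).split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def cuda_matches(built, installed):
    """True if both CUDA versions agree on major and minor number."""
    return major_minor(built) == major_minor(installed)


def tensorflow_build_cuda():
    """Return the CUDA version tensorflow was built against, or None."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.sysconfig.get_build_info().get("cuda_version")


if __name__ == "__main__":
    built = tensorflow_build_cuda()
    installed = "12.4"  # assumption: the version reported in this thread
    if built is None:
        print("tensorflow is not importable or is not a CUDA build")
    else:
        print(f"built for CUDA {built}, installed {installed}:",
              "match" if cuda_matches(built, installed) else "MISMATCH")
```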
@alexispurslane from the table you displayed it is clear that TensorFlow version 2.16.1 is compatible with CUDA 12.3 (and not compatible with version 12.4). However, it turns out that when you
Thus, it seems practically impossible for someone with a CUDA-enabled GPU to run deep learning experiments with TensorFlow 2.16.1 on that GPU locally without manually performing some extra steps that are not (as of today) included in the official TensorFlow installation instructions for Linux users with GPUs, at least as a temporary fix! That's why I submitted a pull request in good faith and for the sake of all users, as TensorFlow is "An Open Source Machine Learning Framework for Everyone". |
Symptoms
Whether I create an ephemeral fedora rawhide or 39 distrobox with --nvidia, or use the tumbleweed distrobox I created from a distrobox-assemble with nvidia=true, and whether I create a python venv and then pip install tensorflow[and-cuda] or just do pip install --break-system-packages tensorflow[and-cuda] system-wide, when installing those packages afresh, I get this output when trying to use tensorflow with my gpu:
Steps to reproduce
import tensorflow as tf; tf.config.list_logical_devices()
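A slightly more targeted variant of the reproduction step asks only for physical GPU devices, which makes the failure easier to read: an empty list means tensorflow imported but found no usable GPU. The wrapper function is a hypothetical helper and the guarded import is only there so the script fails gracefully when tensorflow is absent.

```python
def visible_gpus():
    """Return the names of GPUs tensorflow can see, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("tensorflow is not installed in this environment")
    elif not gpus:
        print("tensorflow imported, but no GPU is visible")
    else:
        print("visible GPUs:", gpus)
```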