Unable to find GPU #23

Open
frmnboi opened this issue Sep 11, 2023 · 2 comments

frmnboi commented Sep 11, 2023

I was able to build and run the application offline on my own hardware. However, I was met with:

Torch reports CUDA not available

and I cannot train anything with FP16 precision; instead I have to train on the CPU exclusively.
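For reference, this is the kind of minimal check I mean (a sketch, run from the container's Python environment):

```python
import torch

# Quick look at what the container's PyTorch build reports.
print("Torch version:", torch.__version__)         # e.g. 2.0.1+cu118
print("Built for CUDA:", torch.version.cuda)       # CUDA runtime the wheel targets
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())
```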

About my system:

I installed and configured the NVIDIA Container Toolkit following the official instructions, then restarted the Docker daemon:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
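As a sanity check that the toolkit itself works, I can run nvidia-smi inside a throwaway CUDA container (a sketch; the image tag is an assumption and should be no newer than what the host driver supports):

```python
import subprocess

# If this fails, the NVIDIA Container Toolkit / --gpus wiring is the problem,
# independent of the kohya image. The CUDA image tag below is an assumption;
# pick one matching the host driver's supported CUDA version.
cmd = [
    "docker", "run", "--rm", "--gpus", "all",
    "nvidia/cuda:11.4.3-base-ubuntu20.04",
    "nvidia-smi",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout or result.stderr)
```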

Output for driver and CUDA from nvidia-smi:

NVIDIA-SMI 470.161.03    Driver Version: 470.161.03    CUDA Version: 11.4

System information:
Linux Mint 20 x86_64
5.15.0-83-generic

GPU information:
RTX 3070
registered as /dev/nvidia0

Any ideas what might be causing this?

Edit:
After the CPU training finished, I got another error that may shed light on what was causing this:

kohya-docker-kohya-1 | CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
kohya-docker-kohya-1 | CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA exception! Error code: initialization error
kohya-docker-kohya-1 | CUDA SETUP: Highest compute capability among GPUs detected: None
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA SETUP: CUDA version lower than 11 are currenlty not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
kohya-docker-kohya-1 | CUDA SETUP: Detected CUDA version 00
kohya-docker-kohya-1 | CUDA SETUP: TODO: compile library for specific version: libbitsandbytes_cuda00_nocublaslt.so
kohya-docker-kohya-1 | CUDA SETUP: Defaulting to libbitsandbytes.so...
kohya-docker-kohya-1 | CUDA SETUP: CUDA detection failed. Either CUDA driver not installed, CUDA not installed, or you have multiple conflicting CUDA libraries!
kohya-docker-kohya-1 | CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

My current CUDA version:
Cuda compilation tools, release 10.1, V10.1.243

I'm guessing my CUDA version is too old and not configured properly, since it isn't found on the path. I'll try reinstalling version 11 or higher and re-running the Docker image.

@frmnboi
Copy link
Author

frmnboi commented Sep 14, 2023

Finished the update. My host machine is now running:

CUDA 12.0 and NVIDIA driver 535 (proprietary)

Now the server does not start and gives me the following messages:

kohya-docker-kohya-1 | /venv/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
kohya-docker-kohya-1 | return torch._C._cuda_getDeviceCount() > 0
kohya-docker-kohya-1 | LOG INIT
kohya-docker-kohya-1 | INFO: nVidia toolkit detected
kohya-docker-kohya-1 | INFO: Torch 2.0.1+cu118
kohya-docker-kohya-1 | WARNING: Torch reports CUDA not available

martinobettucci (Contributor) commented:

Looks like you have different drivers between the host and the Docker container.

You have to use the professional/developer drivers on your host machine instead of the general-purpose drivers shipped with your OS.
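A quick way to confirm the mismatch from inside the container (a sketch, assuming the kohya venv's torch):

```python
import subprocess
import torch

# Compare the CUDA runtime the torch wheel was built against with the
# driver-side CUDA version reported by nvidia-smi (the container sees the
# host's driver). Error 804 usually means the driver is older than the
# runtime and the GPU does not support forward compatibility.
print("Torch built for CUDA:", torch.version.cuda)   # e.g. 11.8 for 2.0.1+cu118
print("CUDA available:", torch.cuda.is_available())

try:
    smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
    for line in smi.stdout.splitlines():
        if "CUDA Version" in line:
            print(line.strip())   # driver version / max supported CUDA version
except (FileNotFoundError, subprocess.CalledProcessError) as exc:
    print("nvidia-smi not usable inside the container:", exc)
```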
