Unable to find GPU #23

Open
frmnboi opened this issue Sep 11, 2023 · 2 comments

frmnboi commented Sep 11, 2023

I was able to build and run the application offline on my own hardware. However, I was met with:

Torch reports CUDA not available

and I cannot train anything with FP16 precision; instead I have to train on the CPU exclusively.
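For reference, this is the kind of minimal check I mean (a sketch, run from the container's Python environment):

```python
import torch

# Quick look at what the container's PyTorch build reports.
print("Torch version:", torch.__version__)         # e.g. 2.0.1+cu118
print("Built for CUDA:", torch.version.cuda)       # CUDA runtime the wheel targets
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())
```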

About my system:

I installed and configured the NVIDIA Container Toolkit following the official instructions, then restarted the Docker daemon:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
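As a sanity check that the toolkit itself works, I can run nvidia-smi inside a throwaway CUDA container (a sketch; the image tag is an assumption and should be no newer than what the host driver supports):

```python
import subprocess

# If this fails, the NVIDIA Container Toolkit / --gpus wiring is the problem,
# independent of the kohya image. The CUDA image tag below is an assumption;
# pick one matching the host driver's supported CUDA version.
cmd = [
    "docker", "run", "--rm", "--gpus", "all",
    "nvidia/cuda:11.4.3-base-ubuntu20.04",
    "nvidia-smi",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout or result.stderr)
```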

Output for driver and CUDA from nvidia-smi:

NVIDIA-SMI 470.161.03    Driver Version: 470.161.03    CUDA Version: 11.4

System information:
Linux Mint 20 x86_64
5.15.0-83-generic

GPU information:
RTX 3070
registered as /dev/nvidia0

Any ideas what might be causing this?

Edit:
After the CPU training finished, I got another error that may shed light on what was causing this:

kohya-docker-kohya-1 | CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
kohya-docker-kohya-1 | CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA exception! Error code: initialization error
kohya-docker-kohya-1 | CUDA SETUP: Highest compute capability among GPUs detected: None
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA SETUP: CUDA version lower than 11 are currenlty not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
kohya-docker-kohya-1 | CUDA SETUP: Detected CUDA version 00
kohya-docker-kohya-1 | CUDA SETUP: TODO: compile library for specific version: libbitsandbytes_cuda00_nocublaslt.so
kohya-docker-kohya-1 | CUDA SETUP: Defaulting to libbitsandbytes.so...
kohya-docker-kohya-1 | CUDA SETUP: CUDA detection failed. Either CUDA driver not installed, CUDA not installed, or you have multiple conflicting CUDA libraries!
kohya-docker-kohya-1 | CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.

My current CUDA version:
Cuda compilation tools, release 10.1, V10.1.243

I'm guessing my CUDA version is too old and not configured properly, since it isn't found on the path. I'll try reinstalling version 11 or higher and re-running the Docker image.

@frmnboi
Copy link
Author

frmnboi commented Sep 14, 2023

Finished the update. My host machine is now running:

CUDA 12.0 and NVIDIA driver 535 (proprietary)

Now the server does not start and gives me the following messages:

kohya-docker-kohya-1 | /venv/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
kohya-docker-kohya-1 | return torch._C._cuda_getDeviceCount() > 0
kohya-docker-kohya-1 | LOG INIT
kohya-docker-kohya-1 | INFO: nVidia toolkit detected
kohya-docker-kohya-1 | INFO: Torch 2.0.1+cu118
kohya-docker-kohya-1 | WARNING: Torch reports CUDA not available

martinobettucci (Contributor) commented:

Looks like you have different drivers between the host and the Docker container.

You have to use the professional/developer drivers on your host machine instead of the general-purpose drivers shipped with your OS.
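A quick way to confirm the mismatch from inside the container (a sketch, assuming the kohya venv's torch):

```python
import subprocess
import torch

# Compare the CUDA runtime the torch wheel was built against with the
# driver-side CUDA version reported by nvidia-smi (the container sees the
# host's driver). Error 804 usually means the driver is older than the
# runtime and the GPU does not support forward compatibility.
print("Torch built for CUDA:", torch.version.cuda)   # e.g. 11.8 for 2.0.1+cu118
print("CUDA available:", torch.cuda.is_available())

try:
    smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
    for line in smi.stdout.splitlines():
        if "CUDA Version" in line:
            print(line.strip())   # driver version / max supported CUDA version
except (FileNotFoundError, subprocess.CalledProcessError) as exc:
    print("nvidia-smi not usable inside the container:", exc)
```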
