I was able to build and run the application offline on my own hardware. However, I was met with:
Torch reports CUDA not available
and I cannot train anything with FP16 precision; instead I have to train on the CPU exclusively.
About my system:
I installed the NVIDIA Container Toolkit following these instructions, configured it, and restarted the Docker daemon:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Output for driver and CUDA from nvidia-smi:
**NVIDIA-SMI 470.161.03  Driver Version: 470.161.03  CUDA Version: 11.4**
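For reference, the container's Torch build targets CUDA 11.8 (`2.0.1+cu118`, per the startup log further down), while driver 470 tops out at CUDA 11.4, so the runtime shipped in the container is newer than what the host driver advertises. A minimal sketch of that version comparison (the helper names are mine; the version strings come from the logs in this issue):

```python
def parse_version(v: str) -> tuple[int, int]:
    """Split a 'major.minor' CUDA version string into comparable ints."""
    major, minor = v.split(".")[:2]
    return int(major), int(minor)

def driver_supports_runtime(driver_max: str, runtime: str) -> bool:
    """True if the driver's maximum supported CUDA version covers the runtime."""
    return parse_version(driver_max) >= parse_version(runtime)

# Values from this issue: the host driver reports CUDA 11.4,
# while the container ships Torch 2.0.1+cu118 (CUDA 11.8 runtime).
print(driver_supports_runtime("11.4", "11.8"))  # → False: driver too old
print(driver_supports_runtime("12.0", "11.8"))  # → True after the driver upgrade
```

(Strictly speaking, NVIDIA's forward-compatibility packages can bridge such a gap, but only on supported data-center GPUs, which matches the "forward compatibility was attempted on non supported HW" errors below; on a GeForce card like the RTX 3070, upgrading the host driver is the practical fix.)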
System information:
Linux Mint 20 x86_64
5.15.0-83-generic
GPU information:
RTX 3070
registered as /dev/nvidia0
Any ideas what might be causing this?
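One quick way to narrow this down is to check whether the GPU is visible to a container at all, independently of PyTorch. A sketch, assuming the NVIDIA Container Toolkit is configured (the `cuda_version_from_smi` helper is hypothetical, just to pull the version field out of `nvidia-smi` output for comparison):

```shell
#!/bin/sh
# Hypothetical helper: extract the "CUDA Version" field from
# nvidia-smi output on stdin, for comparing host vs. container values.
cuda_version_from_smi() {
  grep -o 'CUDA Version: [0-9.]*' | head -n1 | awk '{print $3}'
}

# On the host, this reports the maximum CUDA runtime the driver supports:
#   nvidia-smi | cuda_version_from_smi      # e.g. 11.4 with driver 470

# If the following fails while nvidia-smi works on the host, the
# Container Toolkit / Docker daemon configuration is at fault, not Torch:
#   docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```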
Edit:
After the CPU training finished, I got another error that may shed light on the cause:
kohya-docker-kohya-1 | CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
kohya-docker-kohya-1 | CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA exception! Error code: initialization error
kohya-docker-kohya-1 | CUDA SETUP: Highest compute capability among GPUs detected: None
kohya-docker-kohya-1 | CUDA exception! Error code: forward compatibility was attempted on non supported HW
kohya-docker-kohya-1 | CUDA SETUP: CUDA version lower than 11 are currenlty not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
kohya-docker-kohya-1 | CUDA SETUP: Detected CUDA version 00
kohya-docker-kohya-1 | CUDA SETUP: TODO: compile library for specific version: libbitsandbytes_cuda00_nocublaslt.so
kohya-docker-kohya-1 | CUDA SETUP: Defaulting to libbitsandbytes.so...
kohya-docker-kohya-1 | CUDA SETUP: CUDA detection failed. Either CUDA driver not installed, CUDA not installed, or you have multiple conflicting CUDA libraries!
kohya-docker-kohya-1 | CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
My current CUDA toolkit version, per `nvcc --version`:
Cuda compilation tools, release 10.1, V10.1.243
(Note that `nvcc` reports the toolkit compiler version, which is separate from the driver's supported CUDA version shown by `nvidia-smi`.) I'm guessing my CUDA version is too old and misconfigured, since `libcudart.so` isn't found on any search path. I'll try reinstalling version 11 or higher and re-running the Docker image.
Finished the update. My host machine is now running:
CUDA 12.0 and NVIDIA driver 535 (proprietary)
Now the server does not start and gives me the following messages:
kohya-docker-kohya-1 | /venv/lib/python3.10/site-packages/torch/cuda/init.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
kohya-docker-kohya-1 | return torch._C._cuda_getDeviceCount() > 0
kohya-docker-kohya-1 | LOG INIT
kohya-docker-kohya-1 | INFO: nVidia toolkit detected
kohya-docker-kohya-1 | INFO: Torch 2.0.1+cu118
kohya-docker-kohya-1 | WARNING: Torch reports CUDA not available