Error in finetuning #30

Ravindu-Yasas-Nagasinghe · 2023-10-17T11:13:02Z

When I run the fine-tuning command for LF-VILA inside the Docker image, the following error is raised:
root@8dccc81930c3:/LF-VILA# deepspeed src/tasks/run_video_classification.py --distributed --blob_mount_dir /blob_mount --config $CONFIG_PATH --deepspeed
[2023-10-17 11:11:02,765] [WARNING] [runner.py:132:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Traceback (most recent call last):
  File "/usr/local/bin/deepspeed", line 6, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/launcher/runner.py", line 308, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available

Please note that my machine has GPUs available, and CUDA and PyTorch are correctly installed.

Ravindu-Yasas-Nagasinghe · 2023-10-17T11:14:31Z

The nvidia-smi command also does not work inside the Docker image.
It does work outside the Docker image you have provided.

nvcc -V and torch both work inside the Docker image.
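
To narrow this down, a small check like the one below could be run both inside and outside the container. It is only a diagnostic sketch, not part of LF-VILA or DeepSpeed: the function name gpu_visibility_report and the report keys are made up for illustration. It relies on the assumption that the DeepSpeed launcher raises this error when torch.cuda.device_count() returns 0, so a successful import of torch alone would not be enough.

```python
import shutil
import subprocess


def gpu_visibility_report():
    """Collect basic facts about GPU visibility in the current environment.

    Hypothetical helper for illustration; the keys below are not a standard format.
    """
    report = {
        "nvidia_smi_path": shutil.which("nvidia-smi"),  # None if not on PATH
        "nvidia_smi_ok": False,
        "torch_importable": False,
        "cuda_device_count": 0,
    }

    # `nvidia-smi -L` lists the GPUs the driver exposes to this environment.
    if report["nvidia_smi_path"] is not None:
        try:
            result = subprocess.run(
                [report["nvidia_smi_path"], "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.SubprocessError):
            pass  # leave nvidia_smi_ok as False

    # torch may import fine and still see zero CUDA devices.
    try:
        import torch
    except ImportError:
        return report

    report["torch_importable"] = True
    report["cuda_device_count"] = int(torch.cuda.device_count())
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If cuda_device_count is 0 inside the container but positive on the host, the GPUs are probably not being passed through to the container. One common cause is starting the container without GPU access (for example, without `docker run --gpus all` and the NVIDIA Container Toolkit on the host), though I have not confirmed that this is what happens here.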
