Error in finetuning #30

Open
Ravindu-Yasas-Nagasinghe opened this issue Oct 17, 2023 · 1 comment

Comments


Ravindu-Yasas-Nagasinghe commented Oct 17, 2023

When I run the fine-tuning command for LF-VILA inside the Docker image, I get the following error:
root@8dccc81930c3:/LF-VILA# deepspeed src/tasks/run_video_classification.py --distributed --blob_mount_dir /blob_mount --config $CONFIG_PATH --deepspeed
[2023-10-17 11:11:02,765] [WARNING] [runner.py:132:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Traceback (most recent call last):
  File "/usr/local/bin/deepspeed", line 6, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/launcher/runner.py", line 308, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available

Note that my machine has GPUs available, and CUDA and torch are correctly installed.


Ravindu-Yasas-Nagasinghe commented Oct 17, 2023

The nvidia-smi command also does not work inside the Docker image, although it works on the host, outside the Docker image you provided.

nvcc -V and torch both work inside the Docker image.
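
To narrow this down, here is a minimal diagnostic sketch that can be run inside the container. It only uses the Python standard library and reports whether the container can see the NVIDIA driver at all. The function name `gpu_visibility_report` is hypothetical, and the checks rest on assumptions about a typical NVIDIA container setup: driver device nodes appearing under /dev/nvidia*, and the NVIDIA_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES environment variables. It is not part of LF-VILA or DeepSpeed.

```python
import glob
import os
import shutil


def gpu_visibility_report():
    """Collect basic hints about whether GPUs are exposed to this environment.

    Assumption: on a typical NVIDIA setup the driver exposes device nodes
    such as /dev/nvidia0 and /dev/nvidiactl, and a container runtime with
    GPU support sets NVIDIA_VISIBLE_DEVICES.
    """
    return {
        # Absolute path of nvidia-smi if it is on PATH, otherwise None.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        # Driver device nodes visible in this environment (empty list if none).
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        # Environment variables that commonly restrict GPU visibility.
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If `device_nodes` is empty inside the container but not on the host, one possible explanation is that the container was started without GPU access (for example, without Docker's `--gpus all` option). That would be consistent with nvcc working, since it only needs the CUDA toolkit, while nvidia-smi and the deepspeed launcher find no GPUs. This is a guess from the symptoms, not a confirmed cause.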
