
How to run on multiple machines? #42

Open
AnnemSony opened this issue Jul 6, 2023 · 5 comments

Comments

@AnnemSony

No description provided.

@tianrun-chen
Owner

Do you mean multiple GPUs?

@AnnemSony
Author

I have GPUs on multiple machines (i.e., a multi-node cluster). How can I run the training command?
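For a true multi-node run, torch.distributed.launch (and its successor, torchrun) accept --nnodes, --node_rank, --master_addr, and --master_port. Below is a sketch for two machines with 4 GPUs each; 10.0.0.1 and port 29500 are placeholder values you must replace with the reachable address of node 0, and train.py / configs/demo.yaml are the script and config from this repo:

```shell
# On node 0 (the rendezvous host, whose IP the other nodes can reach):
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
    --nnodes 2 --nproc_per_node 4 --node_rank 0 \
    --master_addr 10.0.0.1 --master_port 29500 \
    train.py --config configs/demo.yaml

# On node 1, change only --node_rank; all other flags stay identical:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
    --nnodes 2 --nproc_per_node 4 --node_rank 1 \
    --master_addr 10.0.0.1 --master_port 29500 \
    train.py --config configs/demo.yaml
```

Note that all launcher flags come before train.py; anything after the script name is passed to the script itself, not to the launcher.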

@chusheng0505

chusheng0505 commented Jul 11, 2023

Hi, I have 4 GPUs and am trying to tune the SAM-Adapter model. I used the command provided in the repo:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch train.py --nnodes 1 --nproc_per_node 4 --config configs/demo.yaml

Training succeeded, but I found that only one GPU is actually used. How can I solve this? (I have checked the PyTorch documentation but have no idea how to debug it.)
@tianrun-chen

@Bill-Ren

I also encountered this problem: only GPU 0 was used during distributed training. At the same time, I could not find the two arguments --nnodes 1 --nproc_per_node 4 anywhere in train.py's argument parsing. Why is that?

@Bill-Ren

(quoting chusheng0505's comment above)

I found the solution. The launcher arguments must come before the script name; otherwise they are forwarded to train.py instead of being parsed by the launcher. Run it like this:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes 1 --nproc_per_node 4 train.py --config configs/demo.yaml --tag exp1

You can check the usage of torch.distributed.launch for details.
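The argument-ordering pitfall above can be illustrated without PyTorch. The launcher treats the training script as a positional argument and forwards everything after it untouched, so flags placed after train.py never reach the launcher and it falls back to its single-process default. The sketch below (parse_like_launcher is a simplified stand-in, not the real launcher code) mimics that split with argparse:

```python
import argparse

def parse_like_launcher(argv):
    """Simplified model of how torch.distributed.launch splits argv:
    launcher flags must precede the training script; everything from the
    script name onward is forwarded to the script untouched."""
    p = argparse.ArgumentParser()
    p.add_argument("--nnodes", type=int, default=1)
    p.add_argument("--nproc_per_node", type=int, default=1)
    p.add_argument("training_script")
    # REMAINDER gathers all remaining tokens, including option-like ones.
    p.add_argument("training_script_args", nargs=argparse.REMAINDER)
    return p.parse_args(argv)

# Broken ordering from the issue: the flags land after train.py, so the
# launcher keeps its defaults and spawns a single process (one GPU used).
bad = parse_like_launcher(
    ["train.py", "--nnodes", "1", "--nproc_per_node", "4",
     "--config", "configs/demo.yaml"])
# bad.nproc_per_node is 1 (the default); the flags went to train.py instead.

# Fixed ordering: launcher flags first, then the script and its own flags.
good = parse_like_launcher(
    ["--nnodes", "1", "--nproc_per_node", "4",
     "train.py", "--config", "configs/demo.yaml"])
# good.nproc_per_node is 4; train.py only receives --config configs/demo.yaml.
```

This is why the corrected command works: torch.distributed.launch sees --nproc_per_node 4 and spawns four workers, one per visible GPU.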
