Question about the calculation method of the loss when there are multiple GPUs #101
Comments
My idea is the same as yours. After debugging, I found that during the training epoch every GPU computes the same global loss from the same sim_matrix, instead of each GPU computing a local loss that is then gathered and averaged. There is clearly redundant computation here. I also noticed that in the function "train_epoch" there is a useless `loss.mean()` call after `model.forward()` that appears to do nothing. We only need to compute the local loss as in openai/CLIP#132 and call `loss.backward()`; gradient synchronization is then handled automatically by DDP.
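For reference, the "local loss" pattern discussed in openai/CLIP#132 can be sketched roughly as below. This is a hypothetical illustration, not the actual CLIP4Clip code: the function name `local_clip_loss` and its arguments are made up for this example. Each rank computes similarities only between its local rows and the globally gathered columns, and gradients flow only through the local features; DDP averages the gradients across ranks during `backward()`.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def local_clip_loss(video_feats, text_feats, logit_scale):
    """Symmetric contrastive loss using only this rank's rows.

    video_feats, text_feats: (local_batch, dim), already L2-normalized.
    logit_scale: temperature multiplier (e.g. exp of a learned parameter).
    """
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    if world_size > 1:
        # all_gather returns detached tensors; put this rank's grad-tracking
        # local tensors back into their slots so autograd reaches them.
        gathered_v = [torch.zeros_like(video_feats) for _ in range(world_size)]
        gathered_t = [torch.zeros_like(text_feats) for _ in range(world_size)]
        dist.all_gather(gathered_v, video_feats)
        dist.all_gather(gathered_t, text_feats)
        rank = dist.get_rank()
        gathered_v[rank] = video_feats
        gathered_t[rank] = text_feats
        all_v = torch.cat(gathered_v)
        all_t = torch.cat(gathered_t)
        offset = rank * video_feats.size(0)
    else:
        all_v, all_t, offset = video_feats, text_feats, 0

    # Local rows vs. global columns: a (local_batch, global_batch) matrix
    # per GPU, instead of the full (global_batch, global_batch) matrix.
    logits_v = logit_scale * video_feats @ all_t.t()
    logits_t = logit_scale * text_feats @ all_v.t()
    labels = torch.arange(video_feats.size(0), device=video_feats.device) + offset
    return (F.cross_entropy(logits_v, labels) + F.cross_entropy(logits_t, labels)) / 2
```

On a single process the `world_size > 1` branch is skipped and this reduces to the ordinary in-batch contrastive loss, so the same function works with and without DDP.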
CLIP4Clip/modules/modeling.py
Line 400 in 508ffa3
The current code seems to compute the loss on the full global similarity matrix on every GPU. Computing the loss only between local and global features, as described in openai/CLIP#132, would be more computationally and memory efficient.
Sorry to bother you if I have misunderstood the code.