RuntimeError: CUDA out of memory #9
Comments
Hi, the SeTR/ViT-MLA version is quite memory-hungry: on my side it takes nearly 20 GB of GPU memory even with a batch size of 1. I suggest you either (i) train the model on GPUs with more memory (Tesla V100 32 GB, etc.); (ii) turn off val set evaluation during training; or (iii) replace the ViT-base backbone with a lighter one (ViT-small, ViT-tiny, etc.). You can find examples of how to add more backbones in vision_transformer
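For a rough sense of how much suggestion (iii) could help, here is a back-of-the-envelope sketch. It assumes the standard ViT/DeiT widths (tiny 192, small 384, base 768, all with 12 blocks) and counts only the large weight matrices in each transformer block (about 12·d² per block), ignoring biases, layer norms, the patch embedding and the decoder head. The function name is made up for illustration, and parameter count is only a proxy: activation memory also depends on image size and sequence length.

```python
# Rough encoder size for standard ViT variants (assumed widths, depth 12).
# Per block: 4*d^2 for attention (Q, K, V, output projection)
#          + 8*d^2 for the MLP (d -> 4d -> d), biases and norms ignored.

def approx_encoder_params(embed_dim: int, depth: int = 12) -> int:
    """Approximate weight count of the transformer blocks only."""
    per_block = 4 * embed_dim ** 2 + 8 * embed_dim ** 2
    return depth * per_block

# Hypothetical lookup table; the widths are the usual DeiT/ViT settings.
VARIANTS = {"vit_tiny": 192, "vit_small": 384, "vit_base": 768}

if __name__ == "__main__":
    for name, dim in VARIANTS.items():
        millions = approx_encoder_params(dim) / 1e6
        print(f"{name}: ~{millions:.1f}M block parameters")
```

Under these assumptions, each step down (base to small, small to tiny) cuts the block weights by a factor of four, since the count scales with the square of the width.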
Thank you @XiongweiWu. On your suggestions: (i) I understand that the model needs a stronger GPU, so no remark about this. (ii) How can I turn off the val set? Removing the (iii)
Another question, does the
You can set the evaluation interval in the config file to a value larger than the total number of training iterations (80k, etc.), so that evaluation never runs during training. You can then evaluate the checkpoints separately on other GPUs, even before training has finished. For your second question, I may need some time to check, since I have not tried it before. I will update here when I finish.
Thanks @XiongweiWu, I tried but I couldn't find how to turn off the val set. Is it here? runner = dict(type='IterBasedRunner', max_iters=80000)
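A minimal sketch of the suggestion above, assuming this repo follows the MMSegmentation 0.x config layout, where the evaluation schedule is a separate `evaluation` entry next to `runner` (worth checking against the actual config file in use; the `metric` value here is an assumption). The `runner` line only sets the training length. Validation is effectively skipped when the evaluation interval is larger than `max_iters`:

```python
# Config fragment (assumed MMSegmentation 0.x style schedule).
# Training length: unchanged from the line quoted above.
runner = dict(type='IterBasedRunner', max_iters=80000)

# Evaluation is triggered every `interval` iterations. An interval larger
# than max_iters is never reached, so no val pass runs during training.
evaluation = dict(interval=runner['max_iters'] + 1, metric='mIoU')
```

In the MMSegmentation versions I am aware of, `tools/train.py` also accepts a `--no-validate` flag that has the same effect. Whether this repo's copy of the script has it is an assumption to verify.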
Thanks for your error report and we appreciate it a lot.
Checklist
Describe the bug
I'm trying to fine-tune a food segmentation model, found here, on a new dataset.
When trying to train the model, I got this error. The batch_size is set to 1. Thank you in advance for any insights you can give.
Reproduction
Command
Configuration file
I used this Japanese dataset for food segmentation: https://mm.cs.uec.ac.jp/uecfoodpix/
I'm trying to fine-tune the model on the Japanese data.
Environment
Error traceback