trainer.fit() Object Memory Error in Local Machine #253
Comments
These changes helped me:
But I am surprised that I had to make these changes, given my computer's specifications.
Yes, I changed the batch size and got it to run on Ubuntu. On Windows, however, I couldn't run it even with these changes; the path manipulations are causing some problems.
Same issue here, on both laptops (a powerful Dell XPS 15 and a weaker HP from work). After setting num_workers=1 and batch_size=32 it ran just fine, and it took about 30 minutes to train the model.

If I work on a single machine, does num_workers affect this OOM problem, or is batch_size the only variable I need to adjust? Does changing to 2 workers allow me to use a larger batch size? Or can my computer (each core) simply not handle more than batch_size=32? I would love an explanation of how num_workers, resources_per_worker and batch_size relate to my RAM.

In addition: using the GPU on my Dell solves this issue and trains the model in a minute, but I am interested in using only the CPU. While running the whole setup, my computer's RAM usage is about 70%, which leaves about 4-5 GB of RAM for training. Is that simply not enough, so that no configuration (without a GPU) will let me train easily?
@capmichal @totovivi I had the same reactions and questions. After a bit of research and chatting with GPT, here's what I gathered:
So I guess the main memory issue is how much data is processed per iteration (batch_size * num_workers). Using fewer workers should be better, however, since each worker has its own overhead and must load its own copy of the model.
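To make that relationship concrete, here is a back-of-the-envelope sketch in plain Python. It assumes, as the comment above guesses, that batch_size applies per worker and that every worker holds its own copy of the model. The function names and all the numbers (model size, memory per sample, per-worker overhead) are hypothetical placeholders, not measurements from this project, so treat the result as a rough estimate only.

```python
def estimate_ram_gb(num_workers, batch_size, model_gb, per_sample_mb, overhead_gb):
    """Rough RAM estimate: each worker pays its own overhead, holds its own
    model copy, and materializes one batch of samples at a time."""
    batch_gb = batch_size * per_sample_mb / 1024  # MB -> GB
    per_worker_gb = overhead_gb + model_gb + batch_gb
    return num_workers * per_worker_gb


def max_batch_size(available_gb, num_workers, model_gb, per_sample_mb, overhead_gb):
    """Largest per-worker batch size that fits in available_gb under the same
    assumptions; returns 0 if the model copies alone do not fit."""
    budget_per_worker_gb = available_gb / num_workers - overhead_gb - model_gb
    return max(0, int(budget_per_worker_gb * 1024 / per_sample_mb))


if __name__ == "__main__":
    # Hypothetical figures: 0.5 GB model, 16 MB per sample, 0.5 GB worker overhead.
    for workers in (1, 2):
        used = estimate_ram_gb(workers, 32, model_gb=0.5, per_sample_mb=16.0, overhead_gb=0.5)
        fits = max_batch_size(4.0, workers, model_gb=0.5, per_sample_mb=16.0, overhead_gb=0.5)
        print(f"workers={workers}: ~{used} GB at batch_size=32, max batch_size in 4 GB: {fits}")
```

Under these assumptions, adding a second worker on the same machine does not make room for a larger batch. It doubles the fixed cost (overhead plus model copy) and shrinks the budget left for each worker's batch.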
I run out of memory in trainer.fit(). I have 8 GB of RAM and an 8th-gen i7 CPU with 12 cores. I also see an error in trainer.fit(). Is there any way to reduce the load on RAM, or will I need more RAM to be able to run this?