When running a hyperparameter scan, memory usage grows proportionally to the number of trials.
This smells like a memory leak, since from one trial to the next virtually all information can be safely dropped. In principle there is a call to `clear_backend` between hyperopt trials, but clearly not all memory is freed. This should be fixed.
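For context, a minimal sketch of the kind of per-trial cleanup I have in mind, assuming a TensorFlow/Keras backend (this is not the actual `clear_backend` implementation, just an illustration; `cleanup_after_trial` and `model` are hypothetical names):

```python
import gc

import tensorflow as tf


def cleanup_after_trial(model):
    """Hypothetical per-trial cleanup, not the project's real clear_backend.

    clear_session() releases the Keras graph and layer state, but any
    remaining Python reference to the model, its optimizer or its gradients
    (for instance one stored in the hyperopt trial history) keeps that
    memory alive, so callers must drop their own references as well.
    """
    del model  # drop this function's reference; callers must drop theirs
    tf.keras.backend.clear_session()
    gc.collect()
```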
@Cmurilochem @goord I understand you don't have more time in the project to debug this issue, so I'll do it at some point, but have you by any chance already looked into this (and maybe know where the memory leak is coming from)? It would save me some time.
edit: some more info: I can get away with a parallel run of ~60-100 replicas within ~20 GB of RAM. A hyperparameter scan should therefore take, in the worst case, nfolds × ~20 GB. Even that should be reduced, since there is no reason to keep the NN weights and gradients around after a fold has finished.
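To check whether the per-fold memory is actually released, something like the sketch below could log the resident memory after each trial (it assumes `psutil` is available; `run_scan`, `run_trial` and `trials` are placeholders for the real hyperopt driver, not existing code):

```python
import os

import psutil


def rss_mib():
    """Resident set size of the current process, in MiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 2**20


def run_scan(trials, run_trial):
    """Run each trial and print how much the resident memory grew afterwards."""
    previous = rss_mib()
    for i, trial in enumerate(trials):
        run_trial(trial)
        current = rss_mib()
        print(f"trial {i}: {current - previous:+.1f} MiB (total {current:.1f} MiB)")
        previous = current
```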
I remember we discussed memory leaks in connection with TensorFlow versions a while ago, but I guess this is a different issue and, honestly, I have never looked into it.