When running a hyperparameter scan, memory usage grows proportionally to the number of trials.
This smells like a memory leak, since from one trial to the next virtually all information can be safely dropped. In principle there is a call to `clear_backend` between hyperopt trials, but clearly not all memory is freed. This should be fixed.
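For context, a minimal sketch of the kind of per-trial cleanup I have in mind, assuming a TensorFlow/Keras backend (this is not the actual `clear_backend` implementation, just an illustration; `cleanup_after_trial` and `model` are hypothetical names):

```python
import gc

import tensorflow as tf


def cleanup_after_trial(model):
    """Hypothetical per-trial cleanup, not the project's real clear_backend.

    clear_session() releases the Keras graph and layer state, but any
    remaining Python reference to the model, its optimizer or its gradients
    (for instance one stored in the hyperopt trial history) keeps that
    memory alive, so callers must drop their own references as well.
    """
    del model  # drop this function's reference; callers must drop theirs
    tf.keras.backend.clear_session()
    gc.collect()
```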
@Cmurilochem @goord I understand you don't have more time in the project to debug this issue, so I'll do it at some point, but have you by any chance already looked into this (and maybe know where the memory leak is coming from)? It would save me some time.
edit: some more info: I can get away with a parallel run of ~60-100 replicas within ~20 GB of RAM. A hyperparameter scan should therefore take, in the worst case, nfolds × ~20 GB. Even that should be reduced, since there is no reason to keep the NN weights and gradients around after a fold has finished.
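To check whether the per-fold memory is actually released, something like the sketch below could log the resident memory after each trial (it assumes `psutil` is available; `run_scan`, `run_trial` and `trials` are placeholders for the real hyperopt driver, not existing code):

```python
import os

import psutil


def rss_mib():
    """Resident set size of the current process, in MiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 2**20


def run_scan(trials, run_trial):
    """Run each trial and print how much the resident memory grew afterwards."""
    previous = rss_mib()
    for i, trial in enumerate(trials):
        run_trial(trial)
        current = rss_mib()
        print(f"trial {i}: {current - previous:+.1f} MiB (total {current:.1f} MiB)")
        previous = current
```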
I remember we discussed memory leaks in connection with TensorFlow versions a while ago, but I guess this is a different issue and, honestly, I have never looked into it.