I have a bi-level optimization where model A produces an output vector `z` that is used as a regularizer in the `loss_fn` of model B, which is optimized on some image data. Assuming my implementation is correct, I noticed that when model B is optimized via Adam, the gradients become `nan`, but any other optimizer works fine. After some hours of digging I noticed that the `eps_root` hyper-parameter defaults to `0.0`. Changing this value fixes the issue. My concern is: why is this so? Is this an issue with my implementation, or is this expected in some cases?
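For context, a likely explanation (a minimal sketch, not the actual bi-level setup described above): Adam scales updates by `1 / (sqrt(v + eps_root) + eps)`, and the derivative of `sqrt(v)` at `v = 0` is infinite. When the outer problem differentiates *through* the inner update step, this infinite intermediate can surface as `nan`, which `eps_root > 0` avoids:

```python
import jax
import jax.numpy as jnp

# Hypothetical single Adam-like step, not the user's actual code:
# the update divides by sqrt(v + eps_root) + eps. Differentiating
# through this step w.r.t. the gradient g (as an outer bi-level loss
# would) hits d/dv sqrt(v) = inf at v = 0 when eps_root is 0.0.
def adam_like_update(g, eps_root, eps=1e-8):
    v = g ** 2  # second-moment estimate right after initialization
    return g / (jnp.sqrt(v + eps_root) + eps)

grad_bad = jax.grad(adam_like_update)(0.0, 0.0)   # eps_root = 0.0
grad_ok = jax.grad(adam_like_update)(0.0, 1e-8)   # eps_root > 0

print(jnp.isnan(grad_bad))    # True: inf * 0 -> nan inside the chain rule
print(jnp.isfinite(grad_ok))  # True: eps_root keeps sqrt away from 0
```

This matches the docstring of `optax.scale_by_adam`, which recommends a non-zero `eps_root` when differentiating through the optimizer (e.g. meta-learning), so the behavior you see is expected rather than a bug in your implementation.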