I have a bi-level optimization where model A produces an output vector `z` that is used as a regularizer in the `loss_fn` of model B, which is optimized on some image data. Assuming my implementation is correct, I noticed that when model B is optimized via Adam, the gradients become `nan`, but any other optimizer works fine. After some hours of digging I noticed that the `eps_root` hyper-parameter defaults to `0.0`. Changing this value fixes the issue. My concern is: why is this so? Is this an issue with my implementation, or is this expected in some cases?
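For context, a likely explanation (a minimal sketch, not the actual bi-level setup described above): Adam scales updates by `1 / (sqrt(v + eps_root) + eps)`, and the derivative of `sqrt(v)` at `v = 0` is infinite. When the outer problem differentiates *through* the inner update step, this infinite intermediate can surface as `nan`, which `eps_root > 0` avoids:

```python
import jax
import jax.numpy as jnp

# Hypothetical single Adam-like step, not the user's actual code:
# the update divides by sqrt(v + eps_root) + eps. Differentiating
# through this step w.r.t. the gradient g (as an outer bi-level loss
# would) hits d/dv sqrt(v) = inf at v = 0 when eps_root is 0.0.
def adam_like_update(g, eps_root, eps=1e-8):
    v = g ** 2  # second-moment estimate right after initialization
    return g / (jnp.sqrt(v + eps_root) + eps)

grad_bad = jax.grad(adam_like_update)(0.0, 0.0)   # eps_root = 0.0
grad_ok = jax.grad(adam_like_update)(0.0, 1e-8)   # eps_root > 0

print(jnp.isnan(grad_bad))    # True: inf * 0 -> nan inside the chain rule
print(jnp.isfinite(grad_ok))  # True: eps_root keeps sqrt away from 0
```

This matches the docstring of `optax.scale_by_adam`, which recommends a non-zero `eps_root` when differentiating through the optimizer (e.g. meta-learning), so the behavior you see is expected rather than a bug in your implementation.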