When training RecurrentPPO over multiple epochs, we do not update the stored LSTM states even though the LSTM weights get updated. Is there a reason for this, or is it just to save compute and does it not affect the optimization process much?
Yes. The stored states are mostly used to get a better initialization of the LSTM hidden state when the collected sequences are replayed during the update.
(Also, the updated LSTM should not be too far in parameter space from the old LSTM used to collect the data, so the stored states remain approximately valid.)
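To make the reasoning concrete, here is a minimal toy sketch (not SB3 code; a plain tanh cell stands in for the LSTM). It shows the pattern described above: only the initial hidden state from rollout collection is stored, per-step states are recomputed on each replay with the current weights, and because a PPO-style update keeps the new weights close to the old ones, the recomputed states stay close to the originals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy recurrent "cell": h' = tanh(W @ [x; h]).  A stand-in for the LSTM.
def step(W, x, h):
    return np.tanh(W @ np.concatenate([x, h]))

obs_dim, hid_dim, T = 3, 4, 5
W = rng.normal(size=(hid_dim, obs_dim + hid_dim))  # weights at collection time

# --- Rollout collection: store observations and ONLY the initial hidden state.
h0 = np.zeros(hid_dim)
obs = rng.normal(size=(T, obs_dim))
h = h0.copy()
for x in obs:
    h = step(W, x, h)

# --- Training epochs: weights change, but each replay restarts from the SAME
# stored h0; per-step hidden states are recomputed with the current weights
# rather than the stored states themselves being updated.
def replay(W, obs, h0):
    h = h0.copy()
    states = []
    for x in obs:
        h = step(W, x, h)
        states.append(h)
    return np.stack(states)

W_new = W + 0.01 * rng.normal(size=W.shape)  # small PPO-style weight update
states_old = replay(W, obs, h0)
states_new = replay(W_new, obs, h0)

# Because PPO's clipping keeps the updated policy close to the one that
# collected the data, the recomputed states drift only slightly.
print(np.max(np.abs(states_new - states_old)))
```

This is why refreshing the stored states every epoch buys little: the recomputation from `h0` with the current weights already happens in the forward pass, and the small policy update keeps any remaining staleness negligible.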
Relevant code: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/ppo_recurrent/ppo_recurrent.py#L345-L349