Hi kengz, I find that the training loss (= value loss + policy loss) of the PPO algorithm on the game Pong converges poorly (see Fig. 1), while the corresponding mean_returns shows a good upward trend and converges (see Fig. 2). A sketch of the loss I mean follows the figures below.
Why is that, and how can I improve the convergence of the training loss? I have tried many improvement tricks with PPO, but none of them worked.
Fig. 1: training loss (value loss + policy loss)
Fig. 2: mean_returns
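For reference, by "training loss" I mean the standard clipped-surrogate PPO objective plus the value loss. Below is a minimal PyTorch-style sketch of that combined loss; the function name `ppo_loss` and the coefficients `clip_eps` and `val_coef` are illustrative, not necessarily the exact code or defaults used in SLM Lab:

```python
import torch
import torch.nn.functional as F

def ppo_loss(log_probs, old_log_probs, advantages, values, returns,
             clip_eps=0.2, val_coef=0.5):
    """Combined PPO loss = clipped policy surrogate + weighted value loss.

    All tensors are 1-D over a batch of transitions; `advantages` and
    `returns` come from the rollout (e.g. via GAE) and are treated as
    constants with respect to the gradient.
    """
    ratio = torch.exp(log_probs - old_log_probs)        # pi_new / pi_old
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()        # clipped surrogate
    value_loss = F.mse_loss(values, returns)              # critic regression
    return policy_loss + val_coef * value_loss
```

Note that `advantages` and `returns` are recomputed from fresh rollouts after every policy update, so this loss is evaluated against a moving target rather than a fixed dataset.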