Hi kengz, I find that the training loss (= value loss + policy loss) of the PPO algorithm on the game Pong converges poorly (see Fig. 1), while the corresponding mean_returns shows a good upward trend and converges (see Fig. 2). A sketch of the loss I mean follows the figures below.
Why is that, and how can I improve the convergence of the training loss? I have tried many improvement tricks with PPO, but none of them worked.
Fig. 1: training loss (value loss + policy loss)
Fig. 2: mean_returns
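For reference, by "training loss" I mean the standard clipped-surrogate PPO objective plus the value loss. Below is a minimal PyTorch-style sketch of that combined loss; the function name `ppo_loss` and the coefficients `clip_eps` and `val_coef` are illustrative, not necessarily the exact code or defaults used in SLM Lab:

```python
import torch
import torch.nn.functional as F

def ppo_loss(log_probs, old_log_probs, advantages, values, returns,
             clip_eps=0.2, val_coef=0.5):
    """Combined PPO loss = clipped policy surrogate + weighted value loss.

    All tensors are 1-D over a batch of transitions; `advantages` and
    `returns` come from the rollout (e.g. via GAE) and are treated as
    constants with respect to the gradient.
    """
    ratio = torch.exp(log_probs - old_log_probs)        # pi_new / pi_old
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()        # clipped surrogate
    value_loss = F.mse_loss(values, returns)              # critic regression
    return policy_loss + val_coef * value_loss
```

Note that `advantages` and `returns` are recomputed from fresh rollouts after every policy update, so this loss is evaluated against a moving target rather than a fixed dataset.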