-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PPO Implementation Ignores Time Limits #20
Comments
I believe there's ongoing discussion on this for CleanRL, though I've not caught up with the latest. My understanding is that properly handling this does not usually result in significant performance differences. |
That being said, if you would be interested in doing a PR for this with another file (say, |
Thanks, that clears things up. Wasn't sure if it was perhaps handled elsewhere. Concerning the ablation: If I end up requiring truncation, I'll see if I can cook up a PR. |
That's a good point! I think this could be worth doing in a separate file so people can see the differences. There is a significant downside of doubling the observation size. |
Hi,
The current PPO implementation does not seem to account for time limits. While the
EpisodeWrapper
from brax is used, which tracks a truncation flag (source) in the info dictionary for correct termination handling, it appears this aspect is overlooked in the implementation.Related Information:
Could this be an oversight, or am I missing a part of the implementation that addresses this?
The text was updated successfully, but these errors were encountered: