About PPO #24
Comments
I have changed the activation function from ReLU to tanh, but there is no improvement.
I ran into this problem too. I asked the ElegantRL author, and he said that applying tanh first and then sampling the action through torch.distributions distorts the entropy, so it cannot converge. But I don't like ElegantRL's PPO implementation, so I'm still looking for other people's code.
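For reference, a common way to avoid that distortion is to sample from the unsquashed Gaussian and apply the tanh change-of-variables correction to the log-probability (SAC-style squashing), rather than squashing the mean and sampling afterward. A minimal sketch, assuming a policy head that outputs `mean` and `log_std` (the function and argument names here are hypothetical, not from this repo):

```python
import torch
from torch.distributions import Normal

def sample_squashed_action(mean, log_std):
    # Sample from the unsquashed Gaussian, then squash with tanh.
    dist = Normal(mean, log_std.exp())
    pre_tanh = dist.rsample()          # reparameterized sample
    action = torch.tanh(pre_tanh)
    # Change of variables: log pi(a) = log N(u) - sum_i log(1 - tanh(u_i)^2)
    log_prob = dist.log_prob(pre_tanh) - torch.log(1.0 - action.pow(2) + 1e-6)
    return action, log_prob.sum(dim=-1)
```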
Have you found working code yet? Could you paste a link? Much appreciated!
You can change clip_param from 0.2 to 0.1, which tightens the implicit trust region. This works!
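For context, `clip_param` is the epsilon in PPO's clipped surrogate objective; shrinking it limits how far the ratio between the new and old policies can move per update. A minimal sketch of that objective (names are hypothetical, not tied to this repo's code):

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_param=0.1):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(log_prob_new - log_prob_old)
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantage
    # Pessimistic bound: take the minimum of the clipped and unclipped terms.
    return -torch.min(surr1, surr2).mean()
```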
I don't think this code can solve the problem (Pendulum). Another question: why is the reward computed as `running_reward * 0.9 + score * 0.1`?
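On the second question: that line looks like an exponential moving average of episode scores with smoothing factor 0.1, typically used only for logging or a "solved" check, not for training. A minimal illustration (the loop and variable names are hypothetical):

```python
running_reward = 0.0
for score in episode_scores:  # hypothetical sequence of per-episode returns
    # EMA: each new episode contributes 10%, past history decays by 0.9.
    running_reward = running_reward * 0.9 + score * 0.1
```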