- basic policy gradient algorithm; for a good summary see [1]
- A2C architecture (adding the critic + return/simple advantage estimation), see [2] for details
- minibatch gradient descent with automatic differentiation (i.e. `tf.GradientTape`); shuffle and permute-only sampling for MGD (a sketch of the resulting training step follows this list)
- PPO policy (applying the PPO clip loss), see Schulman's paper [3]
- generalized advantage estimation, see the GAE paper [4]
- general improvements in most implementations, see "Implementation Matters" [5]
- #1, #9: value function clipping and global (policy) gradient clipping
- #2, #5, #6, #7: reward/observation scaling and clipping, following the StableBaselines3 [6] `VecNormalize` wrapper
- #3, #4, #8: orthogonal layer initialization, Adam learning-rate annealing, tanh activations
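A compact sketch of how these pieces fit together. Everything here is illustrative, not the repo's actual API: the `gae` and `train_step` names, the assumed model interface returning `(log_probs, values)`, and the hyperparameter defaults are assumptions.

```python
import numpy as np
import tensorflow as tf

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation [4] over one rollout.

    `values` carries one extra bootstrap entry, i.e. len(values) == len(rewards) + 1.
    """
    advantages = np.zeros_like(rewards)
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        non_terminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * non_terminal - values[t]
        last_adv = delta + gamma * lam * non_terminal * last_adv
        advantages[t] = last_adv
    returns = advantages + values[:-1]
    return advantages, returns

def train_step(model, optimizer, obs, actions, old_log_probs, old_values,
               advantages, returns, clip_ratio=0.2, max_grad_norm=0.5):
    with tf.GradientTape() as tape:
        # assumed model interface: log-probs of the given actions, plus values
        log_probs, values = model(obs, actions)

        # PPO clip loss [3]
        ratio = tf.exp(log_probs - old_log_probs)
        clipped = tf.clip_by_value(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio)
        policy_loss = -tf.reduce_mean(
            tf.minimum(ratio * advantages, clipped * advantages))

        # value function clipping (#1): penalize the worse of the two errors
        values_clipped = old_values + tf.clip_by_value(
            values - old_values, -clip_ratio, clip_ratio)
        value_loss = 0.5 * tf.reduce_mean(tf.maximum(
            tf.square(values - returns), tf.square(values_clipped - returns)))

        loss = policy_loss + 0.5 * value_loss
    grads = tape.gradient(loss, model.trainable_variables)
    # global (policy) gradient clipping (#9)
    grads, _ = tf.clip_by_global_norm(grads, max_grad_norm)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

The `tf.clip_by_global_norm` call corresponds to #9, and the `tf.maximum` over squared value errors corresponds to the value clipping of #1.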
- further improvements (some of them sketched in code after this list)
- minibatch-wise advantage normalization
- regularization and an entropy loss term for regularization/exploration
- stateless (learnable) log std for the action distribution's variance
- parallelized environments
- scaling actions to the proper range of the environment
- discrete action space agent
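A minimal sketch of the policy-head details above: a stateless (state-independent) learnable log std, the diagonal-Gaussian entropy used as an exploration bonus, minibatch-wise advantage normalization, and action scaling. `action_dim` and all function names are placeholders for illustration.

```python
import numpy as np
import tensorflow as tf

LOG_2PI = np.log(2.0 * np.pi)

# stateless learnable log std, shared across all states (illustrative dim)
action_dim = 1
log_std = tf.Variable(tf.zeros(action_dim), name="log_std")

def gaussian_log_prob(mean, actions):
    # log-density of a diagonal Gaussian with the shared log std above
    std = tf.exp(log_std)
    return -0.5 * tf.reduce_sum(
        tf.square((actions - mean) / std) + 2.0 * log_std + LOG_2PI, axis=-1)

def gaussian_entropy():
    # entropy of the diagonal Gaussian; added to the loss for exploration
    return tf.reduce_sum(log_std + 0.5 * (1.0 + LOG_2PI))

def normalize_advantages(adv, eps=1e-8):
    # minibatch-wise advantage normalization
    return (adv - tf.reduce_mean(adv)) / (tf.math.reduce_std(adv) + eps)

def scale_action(raw_action, low, high):
    # map an action from [-1, 1] to the environment's action range
    return low + 0.5 * (raw_action + 1.0) * (high - low)
```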
- saving/loading of `tf.keras` models
- tensorboard integration, logging of (a code sketch follows this list):
- hyperparameters
- graph + image of model
- losses, optimizer learning rates
- environment (rewards, actions, observations histograms)
- stateless log std and clip ratio
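A sketch of what this logging could look like with `tf.summary`; the log directory, tag names, and the `log_iteration` signature are hypothetical.

```python
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/ppo")  # illustrative log dir

def log_iteration(step, config, policy_loss, value_loss, lr,
                  rewards, actions, observations, log_std, clip_ratio):
    with writer.as_default():
        if step == 0:
            # hyperparameters as text; a model graph image can be exported
            # separately via tf.keras.utils.plot_model (requires graphviz)
            tf.summary.text("hyperparameters", str(config), step=0)
        tf.summary.scalar("loss/policy", policy_loss, step=step)
        tf.summary.scalar("loss/value", value_loss, step=step)
        tf.summary.scalar("optimizer/learning_rate", lr, step=step)
        tf.summary.histogram("env/rewards", rewards, step=step)
        tf.summary.histogram("env/actions", actions, step=step)
        tf.summary.histogram("env/observations", observations, step=step)
        tf.summary.histogram("policy/log_std", log_std, step=step)
        tf.summary.scalar("ppo/clip_ratio", clip_ratio, step=step)
```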
- remove prints in the terminal; use only a progress bar and tensorboard for the rest
- provide configs/GIFs for some environments
- compile seeds together for replicability
- run_env file that loads a model, runs the environment, and prints the reward (+ video if possible); sketched below
- force types in parameters
- in-code references to the optimizations made
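A minimal sketch of such a run script under the old gym step API, assuming the saved model maps observations directly to actions; `run_episode` and the checkpoint layout are hypothetical, and the typed parameters illustrate the "force types" point above.

```python
import gym
import numpy as np
import tensorflow as tf

def run_episode(env_id: str, model_path: str, render: bool = False) -> float:
    """Load a saved tf.keras model, run one episode, and print the reward."""
    env = gym.make(env_id)
    model = tf.keras.models.load_model(model_path)
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # assumed model interface: observation batch in, action out
        action = model(obs[None].astype(np.float32))[0].numpy()
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if render:
            env.render()  # frames could be collected here for a GIF
    print(f"episode reward: {total_reward:.2f}")
    return total_reward
```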
Episode scores / steps are reported for: ContCartpoalEnv, ReachingDotEnv, CartPole-v1, Pendulum-v0, ReachEnv-v0, and ReachEnvRandom-v0 (the result plots/GIFs are not reproduced here).
- pip requirements
- imagemagick for creating GIFs of environment runs
- graphviz for rendering the `tf.keras` model graph in tensorboard
- mujoco_py's offscreen rendering is buggy in gym, which affects run_model (GIF generation)
- adjust `mujoco_py.MjRenderContextOffscreen(sim, None, device_id=0)` in `gym/envs/mujoco/mujoco_env.MujocoEnv._get_viewer(...)`; see the snippet below
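The adjustment, as a patch sketch (the exact surrounding code depends on the gym/mujoco_py versions in use):

```python
# Inside gym/envs/mujoco/mujoco_env.py, in MujocoEnv._get_viewer(...):
# construct the offscreen render context with an explicit device id
self.viewer = mujoco_py.MjRenderContextOffscreen(self.sim, None, device_id=0)
```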
- [1] Basic Policy Gradient Algorithm -> https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html
- [2] A2C architecture -> Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International Conference on Machine Learning. 2016.
- [3] Basic PPO Agent -> Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).
- [4] GAE -> Schulman, John, et al. "High-dimensional continuous control using generalized advantage estimation." arXiv preprint arXiv:1506.02438 (2015).
- [5] Common Improvements -> Engstrom, Logan, et al. "Implementation Matters in Deep RL: A Case Study on PPO and TRPO." International Conference on Learning Representations. 2019.
- [6] StableBaselines3 -> Raffin et al., "StableBaselines3", GitHub, https://github.com/DLR-RM/stable-baselines3
- [7] ContCartpoalEnv -> this environment is from Ian Danforth https://gist.github.com/iandanforth/e3ffb67cf3623153e968f2afdfb01dc8