A minimal implementation of OpenAI's Proximal Policy Optimization (PPO) algorithm. It learns to swing up a pendulum (the Pendulum environment from OpenAI Gym). There is still room for performance improvement: so far, training runs on a single GPU even if more resources are available.
Usage:
- To train the agent run
python main.py train
- To run an episode (after training) run
python main.py enjoy
The advantage values are calculated using the Generalized Advantage Estimation (GAE) method. When reading through the source code, one might wonder about the use of LinearOperatorToeplitz. This operator lets us compute the GAE values with a single matrix-vector multiplication:
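As a sketch of the idea (a hypothetical helper, not the repository's actual code): the discounted sum that defines GAE can be written as an upper-triangular Toeplitz matrix applied to the vector of TD residuals. The repository uses TensorFlow's tf.linalg.LinearOperatorToeplitz for this; the NumPy version below mirrors the same construction, assuming a terminal value of zero at the end of the rollout.

```python
import numpy as np

def gae_matrix(T, gamma=0.99, lam=0.95):
    """Upper-triangular Toeplitz matrix M with M[t, l] = (gamma*lam)**(l - t) for l >= t."""
    idx = np.arange(T)
    diff = idx[None, :] - idx[:, None]  # column index minus row index
    return np.triu((gamma * lam) ** np.maximum(diff, 0))

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """A_t = sum_l (gamma*lam)**l * delta_{t+l}, computed as one matrix-vector product."""
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t); V after the last step is 0 here
    next_values = np.append(values[1:], 0.0)
    deltas = rewards + gamma * next_values - values
    return gae_matrix(len(rewards), gamma, lam) @ deltas
```

The matrix-vector form gives the same result as the usual backward recursion A_t = delta_t + gamma*lam*A_{t+1}, but expresses the whole rollout as a single linear-algebra operation.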