Code for the OffCon3 paper is available here.
A minimal from-scratch PyTorch implementation of the two model-free, state-of-the-art off-policy continuous control algorithms:
- Twin Delayed DDPG (TD3)
- Soft Actor Critic (SAC)
This repo consolidates, where possible, the code between these two similar off-policy methods, and highlights the similarities (i.e., optimisation scheme) and differences (i.e., stochastic vs. deterministic policies). As highlighted in the paper, these implementations use 3-hidden-layer MLPs (instead of 2), as these appear to perform better overall, especially in HalfCheetah.
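As a rough illustration (not the exact code in this repo), a 3-hidden-layer MLP in PyTorch might look like the following; the hidden width of 256 and ReLU activations are assumptions for the sketch:

```python
import torch.nn as nn

# Minimal sketch of a 3-hidden-layer MLP; the hidden width and activation
# are illustrative assumptions, not necessarily this repo's settings.
def make_mlp(in_dim, out_dim, hidden_dim=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, out_dim),
    )
```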
Heavily based on my other repos, TD3-PyTorch and SAC-PyTorch. If you only want to use one of these algorithms, those repos may serve you better.
To cite this repo, please use the following BibTeX:
@misc{ball2021offcon3,
title={OffCon$^3$: What is state of the art anyway?},
author={Philip J. Ball and Stephen J. Roberts},
year={2021},
eprint={2101.11331},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
This code implements the paper Addressing Function Approximation Error in Actor-Critic Methods (i.e., TD3), using SAC hyperparameters where appropriate (i.e., learning rate, collection steps).
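For reference, the core of TD3's critic update is the clipped double-Q target with target-policy smoothing. A minimal sketch of that target computation follows; the function and argument names, and the default hyperparameter values, are illustrative assumptions rather than this repo's exact code:

```python
import torch

def td3_critic_target(target_actor, target_critic1, target_critic2,
                      reward, next_state, done, gamma=0.99,
                      policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    # Clipped double-Q target with target-policy smoothing (illustrative sketch).
    with torch.no_grad():
        next_action = target_actor(next_state)
        # Smooth the target policy by adding clipped Gaussian noise to its action.
        noise = (torch.randn_like(next_action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Take the minimum over the two target critics to reduce overestimation.
        q_next = torch.min(target_critic1(next_state, next_action),
                           target_critic2(next_state, next_action))
        return reward + (1.0 - done) * gamma * q_next
```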
This code implements the follow-up paper Soft Actor-Critic Algorithms and Applications (i.e., SAC), which includes a learned entropy trade-off hyperparameter. As noted above, 3-hidden-layer MLPs are used in the actor and critic.
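The learned entropy trade-off mentioned above is typically handled by optimising a temperature alpha so that the policy's entropy tracks a target entropy (commonly the negative of the action dimension). A minimal sketch of the usual SAC formulation, with illustrative names and defaults that are not necessarily this repo's exact code:

```python
import torch

# Sketch of the SAC temperature (alpha) update: log_alpha is a learned
# scalar pushed so that the policy's entropy matches a target entropy.
# Names and defaults are illustrative assumptions.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_prob, target_entropy):
    # log_prob: log-probability of actions sampled from the current policy.
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()
```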
As mentioned in the paper, TDS is SVG(0) with double-Q (or, equivalently, SAC without entropy); analysis shows this is essentially DDPG when trained on the standard Gym MuJoCo tasks.
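Concretely, "SAC without entropy" means the actor simply maximises the minimum of the two critics over policy actions, with the entropy bonus dropped. A rough sketch with illustrative names (not this repo's exact code):

```python
import torch

# Sketch: the TDS/SVG(0) actor loss is the SAC actor loss with the
# entropy term removed, i.e. maximise the minimum of the two critics
# over (reparameterised) policy actions. Names are illustrative.
def tds_actor_loss(actor, critic1, critic2, state):
    action = actor(state)  # reparameterised sample or deterministic output
    q = torch.min(critic1(state, action), critic2(state, action))
    return -q.mean()
```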
Simply run:
python train_agent.py
for the default args. Changeable args are as follows (an example invocation is shown after the list):
--env: String of environment name (Default: HalfCheetah-v2)
--alg: String of policy optimizer (Default: td3; Choices: {td3, sac, tds})
--yaml_config: String of YAML config file for either TD3, SAC or TDS (Default: None)
--seed: Int of seed (Default: 100)
--use_obs_filter: Boolean that is true when used (seems to degrade performance, Default: False)
--update_every_n_steps: Int of how many env steps we take before optimizing the agent (Default: 1; the ratio of env steps vs. gradient updates is tied to 1:1)
--n_random_actions: Int of how many random steps we take to 'seed' the replay pool (Default: 10000)
--n_collect_steps: Int of how many steps we collect before training (Default: 1000)
--n_evals: Int of how many episodes we run an evaluation for (Default: 1)
--checkpoint_interval: Int of how often to checkpoint the model (i.e., saving, making GIFs)
--save_model: Boolean that is true when used, saving the model parameters
--make_gif: Boolean that is true when used; makes a GIF of the agent at each checkpoint
--save_replay_pool: Boolean that saves the replay pool along with the agent parameters (defaults to False, as this is very costly memory-wise)
--load_model_path: Path to the directory where model .pt files were saved; loads and resumes training from that snapshot
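For example, to train SAC on Walker2d with a different seed and save model checkpoints, an invocation could look like the following (the environment and seed values are purely illustrative):
python train_agent.py --env Walker2d-v2 --alg sac --seed 101 --save_model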
There are algorithm-specific YAML files stored in ./configs/ for TD3 and SAC. These contain default configurations and hyperparameters that work well on the OpenAI Gym MuJoCo tasks. If no file is specified in the --yaml_config argument, the default YAMLs are loaded.
Also included is a run_experiments.py file, which runs 5 simultaneous experiments with different seeds.
See the paper.
TL;DR: in the worst case, these implementations seem to perform as well as the original authors' code, and in the best case, significantly better.