
Supported Policy Optimization

Official implementation for the NeurIPS 2022 paper Supported Policy Optimization for Offline Reinforcement Learning (https://arxiv.org/abs/2202.06239).

Environment

  1. Install MuJoCo version 2.0 at ~/.mujoco/mujoco200 and copy your license key to ~/.mujoco/mjkey.txt
  2. Create a conda environment:
conda env create -f conda_env.yml
conda activate spot
  3. Install D4RL (a minimal install sketch follows this list)
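
A typical D4RL installation looks like the following (a sketch, assuming the rail-berkeley/d4rl repository; consult the D4RL documentation for your setup):

git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
pip install -e .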

Usage

Pretrained Models

We have uploaded pretrained VAE models and offline models to facilitate reproduction of our experiments. Download the archive from this link and unzip it:

unzip spot-models.zip -d .

Offline RL

Run the following commands to train the VAE.

python train_vae.py --env halfcheetah --dataset medium-replay
python train_vae.py --env antmaze --dataset medium-diverse --no_normalize
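
Before training, you can sanity-check that the corresponding D4RL dataset loads (a minimal sketch using the public D4RL API; the -v2 dataset version suffix is an assumption and depends on your D4RL release):

import gym
import d4rl  # registers the D4RL environments with gym

# Dataset name mirrors the --env/--dataset arguments above; version suffix is an assumption
env = gym.make('halfcheetah-medium-replay-v2')
dataset = d4rl.qlearning_dataset(env)

# qlearning_dataset returns transition arrays keyed by name
print(dataset['observations'].shape, dataset['actions'].shape)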

Run the following commands to train offline RL agents on D4RL with the pretrained VAE models.

python main.py --config configs/offline/halfcheetah-medium-replay.yml
python main.py --config configs/offline/antmaze-medium-diverse.yml

You can also specify the random seed and VAE model:

python main.py --config configs/offline/halfcheetah-medium-replay.yml --seed <seed> --vae_model_path <vae_model.pt>
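
To sweep several random seeds, a simple shell loop over the command above works (the seed values here are arbitrary):

for seed in 0 1 2; do
    python main.py --config configs/offline/halfcheetah-medium-replay.yml --seed $seed
done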

Logging

This codebase uses TensorBoard. You can view saved runs with:

tensorboard --logdir <run_dir>
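
To read the logged scalars programmatically, TensorBoard's event-file API can be used (a sketch; the run directory placeholder matches the command above, and the available tags depend on what this codebase logs):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator('<run_dir>')  # a single run directory containing event files
acc.Reload()

# Inspect which scalar tags were logged, then read one of them
tags = acc.Tags()['scalars']
print(tags)
for event in acc.Scalars(tags[0]):
    print(event.step, event.value)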

Online Fine-tuning

Run the following command to fine-tune online on AntMaze with the pretrained VAE and offline models.

python main_finetune.py --config configs/online_finetune/antmaze-medium-diverse.yml

You can also specify the random seed, VAE model, and offline model:

python main_finetune.py --config configs/online_finetune/antmaze-medium-diverse.yml --seed <seed> --vae_model_path <vae_model.pt> --pretrain_model <pretrain_model/>
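
Putting the pieces together, an end-to-end AntMaze run uses only the commands shown above (the pretrained models from the download can substitute for the first two steps):

python train_vae.py --env antmaze --dataset medium-diverse --no_normalize
python main.py --config configs/offline/antmaze-medium-diverse.yml
python main_finetune.py --config configs/online_finetune/antmaze-medium-diverse.yml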

Citation

If you find this code useful for your research, please cite our paper as:

@inproceedings{wu2022supported,
  title={Supported Policy Optimization for Offline Reinforcement Learning},
  author={Jialong Wu and Haixu Wu and Zihan Qiu and Jianmin Wang and Mingsheng Long},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Contact

If you have any questions, please contact wujialong0229@gmail.com.

Acknowledgement

This repo borrows heavily from sfujim/TD3_BC and sfujim/BCQ.
