Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning
This repository is the official implementation of Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning. The code base of this implementation is the Parallel Asynchronous Buffer-Actor-Learner (PABAL) architecture, which includes implementations of most common RL algorithms with the state-of-the-art training efficiency. If you are interested in or want to contribute to PABAL, you can contact me or the original creator. I also reimplemented it with ppo on TF1 after the paper was submitted. TF1 code is directly modified from safety-gym open-sourced code, which is easier to setup and run. The results of two versions only have differences in the early training stage, which does not affect the claimed performance.
First, install Safety-gym. Then you might replace the engine.py in safety-gym package with our custom engine. We modify the engine.py to estimate the distance and velocity, so that the code could be more general to any other tasks with the distance and velocity observations/signals.
To install other requirements:
$ pip install -U ray
$ pip install tensorflow==2.5.0
$ pip install tensorflow_probability==0.13.0
$ pip install seaborn matplotlib
To train the algorithm(s) in the paper, run these commands:
$ export PYTHONPATH=/your/path/to/Reachability_Constrained_RL/:$PYTHONPATH
$ cd ./train_scripts/
$ python train_scripts4fsac.py # FAC-SIS / FAC-\phi_0,\phi_h (changing if updating \phi and the init \phi in config)
Results can be seen with tensorboard:
$ cd ./results/
$ tensorboard --logdir=. --bindall
To test and evaluate trained policies, run:
python train_scripts4fsac.py --mode testing --test_dir <your_log_dir> --test_iter_list <iter_nums>
and the results will be recored in /results/<ENV_NAME>/<ALGO_NAME>/<EXP_TIME>/logs/tester
.
When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with me before making a change.
@InProceedings{ma22joint,
title = {Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning},
author = {Ma, Haitong and Liu, Changliu and Li, Shengbo Eben and Zheng, Sifa and Chen, Jianyu},
booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
pages = {97--109},
year = {2022},
editor = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
volume = {168},
series = {Proceedings of Machine Learning Research},
month = {23--24 Jun},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v168/ma22a/ma22a.pdf},
url = {https://proceedings.mlr.press/v168/ma22a.html},
}