This repo contains the official implementation of the Robust IQL (RIQL) algorithm from the ICLR 2024 spotlight paper (⭐ top 5%), "Towards Robust Offline Reinforcement Learning under Diverse Data Corruption". The code is built on the open-source CORL library.
We fixed a small bug and set iql_deterministic=True as the default hyperparameter in our experiments, since the deterministic policy is more stable and generally performs better. The deterministic policy is discussed in Appendix E.4 of our paper.
Install torch>=1.7.1, gym, mujoco_py, d4rl, pyrallis, wandb, tqdm.
Run RIQL with random observation corruption:
CUDA_VISIBLE_DEVICES=${gpu} python RIQL.py --corruption_mode random --corrupt_obs --corruption_range ${corruption_range} --corruption_rate ${corruption_rate} --env_name ${env_name} --seed ${seed}
'env_name' can be 'halfcheetah-medium-replay-v2', 'walker2d-medium-replay-v2', 'hopper-medium-replay-v2', and so on.
'corruption_range' and 'corruption_rate' are set to 1.0 and 0.3 by default.
Replace '--corrupt_obs' with '--corrupt_reward', '--corrupt_acts', or '--corrupt_dynamics' to corrupt rewards, actions, or dynamics instead.
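To make the roles of 'corruption_rate' and 'corruption_range' concrete, here is a minimal NumPy sketch of random corruption. It assumes that a 'corruption_rate' fraction of samples is selected and perturbed with uniform noise in [-corruption_range, corruption_range], scaled by the per-dimension standard deviation of the data. The function name corrupt_random and the scaling choice are illustrative assumptions, not the code in RIQL.py.

```python
import numpy as np


def corrupt_random(data, corruption_rate=0.3, corruption_range=1.0, seed=0):
    """Perturb a random subset of rows with scaled uniform noise.

    Hypothetical helper for illustration; it does not reproduce RIQL.py.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=np.float64)
    num_samples, dim = data.shape

    # Pick which samples to corrupt (a corruption_rate fraction, no repeats).
    num_corrupt = int(round(num_samples * corruption_rate))
    idx = rng.choice(num_samples, size=num_corrupt, replace=False)

    # Assumption: noise is scaled by the per-dimension std of the dataset.
    std = data.std(axis=0, keepdims=True)
    noise = rng.uniform(-corruption_range, corruption_range, size=(num_corrupt, dim))

    corrupted = data.copy()
    corrupted[idx] += noise * std
    return corrupted, idx


# Toy stand-in for a batch of observations (1000 samples, 17 dimensions).
obs = np.random.default_rng(1).normal(size=(1000, 17))
corrupted_obs, idx = corrupt_random(obs)
changed = np.any(corrupted_obs != obs, axis=1)
print(len(idx), int(changed.sum()))
```

Only the selected rows are modified; the remaining samples stay identical to the clean data.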
Run RIQL with adversarial observation corruption:
CUDA_VISIBLE_DEVICES=${gpu} python RIQL.py --corruption_mode adversarial --corrupt_obs --corruption_range ${corruption_range} --corruption_rate ${corruption_rate} --env_name ${env_name} --seed ${seed}
The adversarial attacks on observations, actions, and next observations require a gradient-based attack. The corrupted data is saved once the attack finishes and is then loaded from disk for later training.
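The save-then-load behaviour can be pictured as a simple cache around the expensive attack. The sketch below is only an illustration of that pattern: load_or_create, fake_attack, and the file name corrupted_obs.npy are hypothetical and do not reflect the paths or functions used in this repo.

```python
import os
import tempfile

import numpy as np


def load_or_create(path, make_fn):
    """Return (data, was_cached): load from path if present, else build and save."""
    if os.path.exists(path):
        return np.load(path), True
    data = make_fn()
    np.save(path, data)
    return data, False


calls = {"n": 0}


def fake_attack():
    # Placeholder for the gradient-based attack; counts how often it runs.
    calls["n"] += 1
    return np.arange(12, dtype=np.float64).reshape(4, 3)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corrupted_obs.npy")  # hypothetical file name
    first, first_cached = load_or_create(path, fake_attack)
    second, second_cached = load_or_create(path, fake_attack)

print(calls["n"], first_cached, second_cached)
```

With this pattern the attack runs only on the first call, and later runs with the same settings reuse the stored array.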
To run the algorithm with a clean dataset, run the following command without any corruption-related parameters:
CUDA_VISIBLE_DEVICES=${gpu} python RIQL.py --env_name ${env_name} --seed ${seed}
You can replace RIQL.py with another baseline script, such as IQL.py, CQL.py, EDAC.py, or MSG.py, to run IQL, CQL, EDAC, or MSG.
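When comparing several scripts and seeds, it can help to generate the command lines programmatically. The sketch below only builds and prints commands using the flags shown above. The helper build_command is hypothetical, and it assumes the baseline scripts accept the same flags as RIQL.py.

```python
import shlex


def build_command(script="RIQL.py", env_name="walker2d-medium-replay-v2", seed=0,
                  corruption_mode=None, corrupt_flag="--corrupt_obs",
                  corruption_range=1.0, corruption_rate=0.3):
    """Build the argument list for one run (hypothetical helper, nothing is executed)."""
    cmd = ["python", script, "--env_name", env_name, "--seed", str(seed)]
    # Corruption flags are added only when a mode is requested;
    # omitting them corresponds to training on the clean dataset.
    if corruption_mode is not None:
        cmd += [
            "--corruption_mode", corruption_mode,
            corrupt_flag,
            "--corruption_range", str(corruption_range),
            "--corruption_rate", str(corruption_rate),
        ]
    return cmd


# Print one command per (script, seed) pair for random observation corruption.
for script in ["RIQL.py", "IQL.py"]:
    for seed in range(2):
        print(shlex.join(build_command(script, seed=seed, corruption_mode="random")))
```

Each printed line can be prefixed with CUDA_VISIBLE_DEVICES=${gpu} and run in a shell, or passed as a list to subprocess.run.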
If you find our work helpful for your research, please cite:
@inproceedings{yang2023towards,
title={Towards Robust Offline Reinforcement Learning under Diverse Data Corruption},
author={Yang, Rui and Zhong, Han and Xu, Jiawei and Zhang, Amy and Zhang, Chongjie and Han, Lei and Zhang, Tong},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=5hAMmCU0bK}
}