This repo contains the official implementation of the Robust IQL (RIQL) algorithm from the ICLR 2024 spotlight paper (⭐ top 5%), "Towards Robust Offline Reinforcement Learning under Diverse Data Corruption". The code is built on the open-source CORL library.
We fixed a small bug and set iql_deterministic=True as the default hyperparameter in our experiments, since it is more stable and generally performs better. The deterministic policy is discussed in Appendix E.4 of our paper.
Install torch>=1.7.1, gym, mujoco_py, d4rl, pyrallis, wandb, tqdm.
Run RIQL with random observation corruption:
CUDA_VISIBLE_DEVICES=${gpu} python --corruption_mode random --corrupt_obs --corruption_range ${corruption_range} --corruption_rate ${corruption_rate} --env_name ${env_name} --seed ${seed}
'env_name' can be 'halfcheetah-medium-replay-v2', 'walker2d-medium-replay-v2', 'hopper-medium-replay-v2', etc.
'corruption_range' and 'corruption_rate' are set to 1.0 and 0.3 by default.
Replace '--corrupt_obs' with '--corrupt_reward', '--corrupt_acts', or '--corrupt_dynamics' to corrupt rewards, actions, or dynamics, respectively.
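To make the roles of 'corruption_rate' and 'corruption_range' concrete, here is a minimal, dependency-free sketch of random observation corruption. It is an illustration under stated assumptions, not the code in this repo: the function name `corrupt_observations` is hypothetical, and the noise model (uniform noise scaled by each dimension's standard deviation, applied to a randomly chosen fraction of transitions) is an assumed reading of the two parameters.

```python
import random
import statistics


def corrupt_observations(observations, corruption_rate=0.3, corruption_range=1.0, seed=0):
    """Return a corrupted copy of `observations` and the indices that were changed.

    Assumed scheme (not taken from the repo): a `corruption_rate` fraction of rows
    receives uniform noise in [-corruption_range, corruption_range], scaled per
    dimension by that dimension's standard deviation over the dataset.
    """
    rng = random.Random(seed)
    n = len(observations)
    dim = len(observations[0])
    # Per-dimension standard deviation, used to scale the noise.
    stds = [statistics.pstdev([row[d] for row in observations]) for d in range(dim)]
    # Pick which transitions to corrupt.
    chosen = sorted(rng.sample(range(n), round(corruption_rate * n)))
    # Copy so the clean dataset is left untouched.
    corrupted = [list(row) for row in observations]
    for i in chosen:
        for d in range(dim):
            noise = rng.uniform(-corruption_range, corruption_range)
            corrupted[i][d] += noise * stds[d]
    return corrupted, chosen
```

With the defaults above, roughly 30% of the transitions are perturbed by at most one standard deviation per dimension, and the remaining transitions are returned unchanged.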
Run RIQL with adversarial observation corruption:
CUDA_VISIBLE_DEVICES=${gpu} python --corruption_mode adversarial --corrupt_obs --corruption_range ${corruption_range} --corruption_rate ${corruption_rate} --env_name ${env_name} --seed ${seed}
Adversarial corruption of observations, actions, and next observations requires a gradient-based attack. The corrupted data is saved after the attack and then loaded from disk for later training.
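The save-then-load behavior can be sketched as a simple compute-once cache. This is a hedged illustration only: the helper name `load_or_build_corrupted`, the use of pickle, and the file path are assumptions, and the repo may store its corrupted data differently.

```python
import os
import pickle


def load_or_build_corrupted(path, build_fn):
    """Load corrupted data from `path` if it exists; otherwise build and save it.

    `build_fn` stands in for the expensive step (for example, a gradient-based
    attack) and is only called when no saved file is found.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    # Create the parent directory if the path has one.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

The first call pays the cost of the attack; later runs with the same path skip it and read the saved file instead.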
To run the algorithm on a clean dataset, run the following command without the corruption-related parameters:
CUDA_VISIBLE_DEVICES=${gpu} python --env_name ${env_name} --seed ${seed}
You can replace the RIQL script with the scripts of other baselines to run IQL, CQL, EDAC, and MSG.
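Since the clean, random, and adversarial commands differ only in their corruption flags, a small helper can assemble them when launching many runs. The sketch below is an assumption-laden convenience, not part of this repo: `build_command` is a hypothetical name, and `script` must be supplied by the caller as the path to whichever algorithm script is being run. It uses only the flags listed in this README.

```python
def build_command(script, env_name, seed, corruption_mode=None, target=None,
                  corruption_range=1.0, corruption_rate=0.3):
    """Assemble the argument list for one run.

    `script` is the caller-supplied path to an algorithm script.
    Leave `corruption_mode` as None to train on the clean dataset.
    """
    cmd = ["python", script, "--env_name", env_name, "--seed", str(seed)]
    if corruption_mode is None:
        return cmd
    # Map each corruption target to the flag named in this README.
    flags = {
        "obs": "--corrupt_obs",
        "reward": "--corrupt_reward",
        "acts": "--corrupt_acts",
        "dynamics": "--corrupt_dynamics",
    }
    if target not in flags:
        raise ValueError(f"target must be one of {sorted(flags)}")
    cmd += [
        "--corruption_mode", corruption_mode,
        flags[target],
        "--corruption_range", str(corruption_range),
        "--corruption_rate", str(corruption_rate),
    ]
    return cmd
```

The returned list can be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the `env` argument to select the GPU.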
If you find our work helpful for your research, please cite:
title={Towards Robust Offline Reinforcement Learning under Diverse Data Corruption},
author={Yang, Rui and Zhong, Han and Xu, Jiawei and Zhang, Amy and Zhang, Chongjie and Han, Lei and Zhang, Tong},
booktitle={The Twelfth International Conference on Learning Representations},