
Code for ICLR 2024 paper "Towards Robust Offline Reinforcement Learning under Diverse Data Corruption"

YangRui2015/RIQL

Towards Robust Offline RL under Diverse Data Corruption

This repo contains the official implementation of the Robust IQL (RIQL) algorithm from the ICLR 2024 spotlight paper (⭐ top 5%), "Towards Robust Offline Reinforcement Learning under Diverse Data Corruption". The code is built on the open-source CORL library.

Note

We fixed a small bug and made iql_deterministic=True the default hyperparameter in our experiments, since the deterministic policy is more stable and generally performs better. The deterministic policy is discussed in Appendix E.4 of our paper.

Getting started

Install torch>=1.7.1, gym, mujoco_py, d4rl, pyrallis, wandb, tqdm.
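Before launching a run, it can help to confirm that the packages above are importable. The snippet below is a minimal sketch, not part of this repo: the helper name `missing_packages` is hypothetical, and it only checks that each package can be found, not that torch meets the >=1.7.1 requirement.

```python
import importlib.util

# Import names taken from the install line above.
REQUIRED = ["torch", "gym", "mujoco_py", "d4rl", "pyrallis", "wandb", "tqdm"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```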

Under Random Data Corruption

Run RIQL with random observation corruption:

CUDA_VISIBLE_DEVICES=${gpu} python RIQL.py --corruption_mode random --corrupt_obs --corruption_range ${corruption_range} --corruption_rate ${corruption_rate} --env_name ${env_name} --seed ${seed}

'env_name' can be 'halfcheetah-medium-replay-v2', 'walker2d-medium-replay-v2', 'hopper-medium-replay-v2', and so on.

'corruption_range' and 'corruption_rate' default to 1.0 and 0.3, respectively.
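To make these two parameters concrete, here is a minimal NumPy sketch of random observation corruption. It is an illustration under stated assumptions, not the code in RIQL.py: the function name `corrupt_observations` is hypothetical, and the noise model (uniform noise in [-corruption_range, corruption_range], scaled by each dimension's standard deviation and applied to a corruption_rate fraction of the samples) is an assumed reading of what the two settings control.

```python
import numpy as np


def corrupt_observations(obs, corruption_rate=0.3, corruption_range=1.0, seed=0):
    """Return a corrupted copy of `obs` and the indices of the corrupted rows.

    Assumed noise model (hypothetical, for illustration only):
    obs[i] += lam * std(obs), with lam ~ Uniform[-range, range] per dimension.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=np.float64)
    n, dim = obs.shape

    # Pick a fixed fraction of transitions to corrupt, without replacement.
    num_corrupt = int(n * corruption_rate)
    idx = rng.choice(n, size=num_corrupt, replace=False)

    # Scale the noise by the per-dimension standard deviation of the data.
    std = obs.std(axis=0, keepdims=True)
    noise = rng.uniform(-corruption_range, corruption_range, size=(num_corrupt, dim))

    corrupted = obs.copy()
    corrupted[idx] += noise * std
    return corrupted, idx
```

Under this reading, corruption_rate sets how many transitions are touched, while corruption_range sets how far each touched value can move relative to the spread of the data.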

Replace '--corrupt_obs' with '--corrupt_reward', '--corrupt_acts', or '--corrupt_dynamics' to corrupt rewards, actions, or dynamics, respectively.
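Sweeping over corruption types and seeds can be scripted. The following is a small sketch that only assembles command lines from the flags listed in this README; the helper `build_command` is hypothetical and not part of the repo, and nothing is executed.

```python
import shlex

# Corruption flags listed in this README.
CORRUPTION_FLAGS = ["--corrupt_obs", "--corrupt_reward", "--corrupt_acts", "--corrupt_dynamics"]


def build_command(env_name, seed, flag, corruption_mode="random",
                  corruption_range=1.0, corruption_rate=0.3, script="RIQL.py"):
    """Build the argument list for one run, mirroring the command shown above."""
    if flag not in CORRUPTION_FLAGS:
        raise ValueError(f"unknown corruption flag: {flag}")
    return [
        "python", script,
        "--corruption_mode", corruption_mode,
        flag,
        "--corruption_range", str(corruption_range),
        "--corruption_rate", str(corruption_rate),
        "--env_name", env_name,
        "--seed", str(seed),
    ]


if __name__ == "__main__":
    # Print one command per corruption type and seed; nothing is executed here.
    for flag in CORRUPTION_FLAGS:
        for seed in range(3):
            cmd = build_command("hopper-medium-replay-v2", seed, flag)
            print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, with `CUDA_VISIBLE_DEVICES` set in the environment, to launch the run.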

Under Adversarial Data Corruption

Run RIQL with adversarial observation corruption:

CUDA_VISIBLE_DEVICES=${gpu} python RIQL.py --corruption_mode adversarial --corrupt_obs --corruption_range ${corruption_range} --corruption_rate ${corruption_rate} --env_name ${env_name} --seed ${seed}

Adversarial attacks on observations, actions, and next observations require a gradient-based attack. The corrupted data are saved once the attack finishes and are loaded from disk for later training.
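The save-then-load behaviour described above follows a common compute-once caching pattern. The sketch below illustrates that pattern only: the helper `load_or_create`, the pickle format, and the cache path are assumptions for illustration, not the storage format this repo uses.

```python
import os
import pickle


def load_or_create(cache_path, create_fn):
    """Load cached data from `cache_path`, or build it with `create_fn` and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    data = create_fn()  # e.g. an expensive gradient-based attack
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

With this pattern the expensive attack runs only on the first call for a given path; later runs that point at the same file reuse the saved result.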

Clean Data

To run the algorithm on a clean dataset, use the following command without any corruption-related parameters:

CUDA_VISIBLE_DEVICES=${gpu} python RIQL.py --env_name ${env_name} --seed ${seed}

Baselines

To run a baseline instead, replace RIQL.py with IQL.py, CQL.py, EDAC.py, or MSG.py to run IQL, CQL, EDAC, or MSG, respectively.

Citation

If you find our work helpful for your research, please cite:

@inproceedings{yang2023towards,
  title={Towards Robust Offline Reinforcement Learning under Diverse Data Corruption},
  author={Yang, Rui and Zhong, Han and Xu, Jiawei and Zhang, Amy and Zhang, Chongjie and Han, Lei and Zhang, Tong},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=5hAMmCU0bK}
}
