Awesome Causal Reinforcement Learning

[arXiv-2023.3, TNNLS-2024] Official repository of "A Survey on Causal Reinforcement Learning".

📌 Contents

💁 Abstract

📕 Surveys

📑 Papers

2024 | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | Pre-2017

👏 Contributions


💁 Abstract

Causal Reinforcement Learning (CRL) is a suite of algorithms that embed causal knowledge into RL for more efficient and effective model learning, policy evaluation, and policy optimization. How causal information inspires current RL algorithms is illustrated in the CRL framework below.

The CRL framework contains the possible algorithmic connections between planning and causality-inspired learning procedures. The arrows denote:

  • a) input training data for causal representation or abstraction learning;
  • b) input representations, abstractions, or training data from the real world into the causal model;
  • c) plan over a learned or given causal model;
  • d) use information from a policy or value network to improve the planning procedure;
  • e) use the results from planning as training targets for a policy or value function;
  • f) input causal representations, abstractions, or training data from the real world for the policy or value update;
  • g) output an action in the real world from planning;
  • h) output an action in the real world from the policy/value function.

Note that most CRL algorithms implement only a subset of these possible causal connections, gaining potential benefits in data efficiency, interpretability, robustness, or generalization of the model or policy; the sketch below shows how the arrows could fit together.
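
To make these connections concrete, here is a minimal, hypothetical Python sketch of an agent wiring all eight arrows together. Every class and method name (`repr_learner`, `causal_model.fit`, `planner.plan`, and so on) is an illustrative placeholder rather than an API from the survey or any listed paper; as noted above, most published algorithms implement only a few of these calls.

```python
# A minimal sketch of the CRL framework's arrows (a)-(h).
# All components and method names are hypothetical placeholders.

class CausalRLAgent:
    def __init__(self, repr_learner, causal_model, planner, policy):
        self.repr_learner = repr_learner  # causal representation / abstraction
        self.causal_model = causal_model  # learned causal (world) model
        self.planner = planner            # planning procedure
        self.policy = policy              # policy / value function

    def learn(self, batch):
        z = self.repr_learner.fit(batch)             # (a) data -> causal representation
        self.causal_model.fit(z, batch)              # (b) representation / data -> causal model
        plan = self.planner.plan(self.causal_model,  # (c) plan over the learned causal model
                                 guide=self.policy)  # (d) policy / value guides planning
        self.policy.update_from_plan(plan)           # (e) planning results as training targets
        self.policy.update_from_data(z, batch)       # (f) representation / data -> policy update

    def act(self, obs, use_planner=False):
        if use_planner:
            return self.planner.act(obs)             # (g) action in the real world from planning
        return self.policy.act(obs)                  # (h) action from the policy / value function
```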

We present a comprehensive survey of CRL in the paper A survey on causal reinforcement learning and collect the related works in this repository.

‼️ We continuously update this list with the latest papers, so its coverage extends beyond the survey above.

⁉️ New related works are welcome; please add them via pull requests.

If you find the paper useful, please cite it as:

@article{zeng2023survey,
  title={A survey on causal reinforcement learning},
  author={Zeng, Yan and Cai, Ruichu and Sun, Fuchun and Huang, Libo and Hao, Zhifeng},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2024}
}

📕 Surveys

  • (TNNLS 2024) A survey on causal reinforcement learning [paper]
  • (TNNLS 2023) A survey on reinforcement learning for recommender systems [paper]
  • (arXiv 2022) Causal machine learning: A survey and open problems [paper]
  • (NeurIPS-W 2021) Causal multi-agent reinforcement learning: Review and open problems [paper]
  • (ICML Tutorials 2020) Causal reinforcement learning [tutorial]
  • (Blog 2018) Introduction to causal reinforcement learning [blog]
  • (自动化学报 / Acta Automatica Sinica, 2024) Reinforcement learning control based on causal modeling: Status and prospects [paper]

📑 Papers

2024

  • (ICML 2024) Policy learning for balancing short-term and long-term rewards [paper] [code]
  • (ICML 2024) ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [paper] [code]
  • (ICML 2024) Causal Action Influence Aware Counterfactual Data Augmentation [paper] [code]
  • (ICML 2024) Learning Causal Dynamics Models in Object-Oriented Environments [paper] [code]
  • (ICML 2024) Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs [paper] [code]
  • (ICML 2024) Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation [paper] [code]
  • (ICML 2024) Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning [paper] [code]
  • (ICML 2024) Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals [paper]
  • (AAAI 2024) ACAMDA: Improving Data Efficiency in Reinforcement Learning Through Guided Counterfactual Data Augmentation [paper]
  • (IJCAI 2024) Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge [paper] [code]
  • (JASA 2024) Off-policy confidence interval estimation with confounded Markov decision process [paper] [code]

2023

  • (NeurIPS 2023) Learning world models with identifiable factorization [paper] [code]
  • (NeurIPS 2023) Interpretable reward redistribution in reinforcement learning: a causal approach [paper]
  • (ICLR 2023) Causal Confusion and Reward Misidentification in Preference-Based Reward Learning [paper]
  • (TPAMI 2023) Invariant policy learning: A causal perspective [paper]
  • (TNNLS 2023) Sample efficient deep reinforcement learning with online state abstraction and causal transformer model prediction [paper]
  • (TII 2023) Spatial-temporal causality modeling for industrial processes with a knowledge-data guided reinforcement learning [paper]
  • (JRSSB 2023) Estimating and improving dynamic treatment regimes with a time-varying instrumental variable [paper]
  • (Operations Research 2023) Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [paper] [code]
  • (The Annals of Statistics 2023) Off-policy evaluation in partially observed Markov decision processes [paper]
  • (arXiv 2023) MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment [paper]

2022

  • (JMLR 2022) On instrumental variable regression for deep offline policy evaluation [paper] [code]
  • (TNNLS 2022) Fully decentralized multiagent communication via causal inference [paper]
  • (ICLR 2022) On covariate shift of latent confounders in imitation and reinforcement learning [paper] [code]
  • (ICLR 2022) A relational intervention approach for unsupervised dynamics generalization in model-based reinforcement learning [paper] [code]
  • (ICLR 2022) Causal contextual bandits with targeted interventions [paper] [code]
  • (ICLR 2022) AdaRL: What, where, and how to adapt in transfer reinforcement learning [paper] [code]
  • (NeurIPS 2022) Generalizing goal-conditioned reinforcement learning with variational causal reasoning [paper] [code]
  • (NeurIPS 2022) Causality-driven hierarchical structure discovery for reinforcement learning [paper]
  • (NeurIPS 2022) Factored Adaptation for Non-stationary Reinforcement Learning [paper]
  • (NeurIPS 2022) Online reinforcement learning for mixed policy scopes [paper]
  • (NeurIPS 2022) Sequence Model Imitation Learning with Unobserved Contexts [paper] [code]
  • (ICML 2022) Fighting fire with fire: avoiding DNN shortcuts through priming [paper] [code]
  • (ICML 2022) A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes [paper] [code]
  • (ICML 2022) Action-sufficient state representation learning for control with structural constraints [paper]
  • (ICML 2022) Causal dynamics learning for task-independent state abstraction [paper] [code]
  • (ICML 2022) Causal imitation learning under temporally correlated noise [paper] [code]
  • (ECCV 2022) Resolving copycat problems in visual imitation learning via residual action prediction [paper] [code]
  • (AAAI 2022) Invariant action effect model for reinforcement learning [paper]
  • (CHIL 2022) Counterfactually Guided Policy Transfer in Clinical Settings [paper]
  • (CLeaR 2022) Efficient Reinforcement Learning with Prior Causal Knowledge [paper]
  • (ICLR-W 2022) Invariant causal representation learning for generalization in imitation and reinforcement learning [paper]
  • (arXiv 2022) Offline reinforcement learning with causal structured world models [paper]

2021

  • (TNNLS 2021) Model-based transfer reinforcement learning based on graphical model representations [paper]
  • (ICML 2021) Keyframe-focused visual imitation learning [paper] [code]
  • (ICML 2021) Causal curiosity: RL agents discovering self-supervised experiments for causal representation learning [paper] [code]
  • (ICML 2021) Model-free and model-based policy evaluation when causality is uncertain [paper]
  • (ICML 2021) A spectral approach to off-policy evaluation for POMDPs [paper]
  • (ICLR 2021) Learning "what-if" explanations for sequential decision-making [paper]
  • (ICLR 2021) Learning invariant representations for reinforcement learning without reconstruction [paper] [code]
  • (NeurIPS 2021) Invariant causal imitation learning for generalizable policies [paper]
  • (NeurIPS 2021) Provably efficient causal reinforcement learning with confounded observational data [paper]
  • (NeurIPS 2021) Causal Influence Detection for Improving Efficiency in Reinforcement Learning [paper] [code]
  • (NeurIPS 2021) Deep proxy causal learning and its application to confounded bandit policy evaluation [paper] [code]
  • (NeurIPS 2021) Sequential causal imitation learning with unobserved confounders [paper]
  • (NeurIPS 2021) Causal bandits with unknown graph structure [paper]
  • (IJCAI 2021) Inferring time-delayed causal relations in POMDPs from the principle of independence of cause and mechanism [paper]
  • (AAAI 2021) Reinforcement learning of causal variables using mediation analysis [paper] [code]
  • (ICRA 2021) Causal reasoning in simulation for structure and transfer learning of robot manipulation policies [paper]
  • (WWW 2021) Cost-effective and interpretable job skill recommendation with deep reinforcement learning [paper] [code]
  • (WWW 2021) Unifying Offline Causal Inference and Online Bandit Learning for Data Driven Decision [paper]
  • (AISTATS 2021) Budgeted and non-budgeted causal bandits [paper]
  • (AISTATS 2021) Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders [paper]
  • (CoRL 2021) SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving [paper] [code]
  • (UAI 2021) Bandits with partially observable confounded data [paper]
  • (MICAI 2021) Causal based action selection policy for reinforcement learning [paper]
  • (Management Science 2021) Minimax-optimal policy learning under unobserved confounding [paper]
  • (Proceedings of the IEEE 2021) Toward causal representation learning [paper]
  • (INAOE report 2021) Combining reinforcement learning and causal models for robotics application [report]
  • (L4DC 2021) Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning [paper] [code]
  • (arXiv 2021) Causal reinforcement learning using observational and interventional data [paper] [code]
  • (ICLR-W 2021) Resolving causal confusion in reinforcement learning via robust exploration [paper]
  • (ICLR-W 2021) Model-invariant state abstractions for model-based reinforcement learning [paper]
  • (arXiv 2021) Causal reinforcement learning: An instrumental variable approach [paper]
  • (arXiv 2021) Instrumental variable value iteration for causal offline reinforcement learning [paper]
  • (arXiv 2021) CausalDyna: Improving generalization of dyna-style reinforcement learning via counterfactual-based data augmentation [paper]
  • (arXiv 2021) Causal imitative model for autonomous driving [paper] [code]

2020

  • (ICML 2020) Invariant causal prediction for block MDPs [paper] [code]
  • (ICLR 2020) CausalWorld: A robotic manipulation benchmark for causal structure and transfer learning [paper] [code]
  • (ICML 2020) Designing optimal dynamic treatment regimes: A causal reinforcement learning approach [paper]
  • (NeurIPS 2020) Fighting copycat agents in behavioral cloning from observation histories [paper] [code]
  • (NeurIPS 2020) Causal imitation learning with unobserved confounders [paper]
  • (NeurIPS 2020) Off-policy policy evaluation for sequential decisions under unobserved confounding [paper] [code]
  • (AAAI 2020) Off-Policy Evaluation in Partially Observable Environments [paper]
  • (AAAI 2020) Causal transfer for imitation learning and decision making under sensor-shift [paper]
  • (Master thesis 2020) Structural Causal Models for Reinforcement Learning [thesis]
  • (UAI 2020) Regret Analysis of Bandit Problems with Causal Background Knowledge [paper]
  • (IROS 2020) Learning transition models with time-delayed causal relations [paper]
  • (ICLR-W 2020) Resolving spurious correlations in causal models of environments via interventions [paper]
  • (NeurIPS-W 2020) Sample-efficient reinforcement learning via counterfactual-based data augmentation [paper]
  • (arXiv 2020) Causally correct partial models for reinforcement learning [paper]
  • (arXiv 2020) Causality and batch reinforcement learning: Complementary approaches to planning in unknown domains [paper]

2019

  • (Nature 2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning [paper]
  • (Science 2019) Human-level performance in 3D multiplayer games with population-based reinforcement learning [paper] [code]
  • (NeurIPS 2019) Causal confusion in imitation learning [paper] [code]
  • (NeurIPS 2019) Policy evaluation with latent confounders via optimal balance [paper] [code]
  • (NeurIPS 2019) Near-optimal reinforcement learning in dynamic treatment regimes [paper]
  • (ICML 2019) Counterfactual off-policy evaluation with gumbel-max structural causal models [paper] [code]
  • (ICML 2019) Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning [paper]
  • (ICLR 2019) Woulda, coulda, shoulda: Counterfactually-guided policy search [paper]
  • (AAAI 2019) Virtual-Taobao: Virtualizing real-world online retail environment for reinforcement learning [paper] [code]
  • (AAAI 2019) Structural causal bandits with non-manipulable variables [paper]
  • (ICCV 2019) Exploring the limitations of behavior cloning for autonomous driving [paper] [code]
  • (KDD 2019) Environment reconstruction with hidden confounders for reinforcement learning based recommendation [paper]
  • (arXiv 2019) Causal reasoning from meta-reinforcement learning [paper]
  • (arXiv 2019) Learning causal state representations of partially observable environments [paper]
  • (arXiv 2019) Causal Induction from Visual Observations for Goal Directed Tasks [paper] [code]

2018

  • (MIT Press 2018) Reinforcement learning: An introduction [book]
  • (Basic Books 2018) The Book of Why: The new science of cause and effect [book]
  • (ICML 2018) Causal Bandits with Propagating Inference [paper]
  • (NeurIPS 2018) Structural Causal Bandits: Where to Intervene? [paper] [code]
  • (NeurIPS 2018) Confounding-robust policy improvement [paper] [code]
  • (AAAI 2018) Learning plannable representations with Causal InfoGAN [paper] [code]
  • (AAAI 2018) Counterfactual multi-agent policy gradients [paper] [code]
  • (AAAI 2018) Deep reinforcement learning that matters [paper] [code]
  • (IHSED 2018) Measuring collaborative emergent behavior in multi-agent reinforcement learning [paper]
  • (Foundations and Trends in Robotics 2018) An algorithmic perspective on imitation learning [paper]
  • (ICML-W 2018) Playing against nature: causal discovery for decision making under uncertainty [paper]
  • (arXiv 2018) Deconfounding reinforcement learning in observational settings [paper] [code]

2017

  • (MIT Press 2017) Elements of causal inference: foundations and learning algorithms [book]
  • (PhD thesis 2017) Cognitive robotic imitation learning system based on cause-effect reasoning [thesis]
  • (TCDS 2017) A novel parsimonious cause-effect reasoning algorithm for robot imitation and plan recognition [paper] [code]
  • (ICML 2017) Neural Episodic Control [paper]
  • (ICML 2017) Schema networks: Zero-shot transfer with a generative causal model of intuitive physics [paper]
  • (ICML 2017) Counterfactual Data-Fusion for Online Reinforcement Learners [paper]
  • (ICML 2017) Identifying Best Interventions through Online Importance Sampling [paper]
  • (IJCAI 2017) Transfer learning in multi-armed bandit: a causal approach [paper]
  • (TACON 2017) Infinite time horizon maximum causal entropy inverse reinforcement learning [paper]
  • (CoRL 2017) CARLA: An Open Urban Driving Simulator [paper] [code]

Pre-2017

  • (NIPS 2016) Causal bandits: Learning good interventions via causal inference [paper] [code]
  • (ICAGI 2016) Imitation learning as cause-effect reasoning [paper] [code]
  • (Technical report 2016) Markov decision processes with unobserved confounders: A causal approach [paper]
  • (NIPS 2015) Bandits with unobserved confounders: A causal approach [paper] [code]

👏 Contributions [Chinese version]

1. Fork the Repository: Click on the Fork button in the top-right corner to create a copy of the repository in your GitHub account.

2. Create a New Branch: In your forked repository, create a new branch (e.g., "libo") by using the branch selector button near the top-left (usually labeled master or main).

3. Make Your Changes: Switch to your new branch using the same selector. Then, click the Edit file button at the top right and make your changes. Add entries in the following format:

  - (**publisher_name year**) manuscript_name [[publication_type](online_manuscript_link)] [[code](online_code_link)]
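
For instance, a hypothetical entry (the venue, title, and links below are placeholders) would look like:

  - (**NeurIPS 2024**) An Example Causal RL Paper [[paper](https://example.com/paper)] [[code](https://example.com/code)]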

4. Commit Changes: Save your changes by clicking the Commit changes button in the upper-right corner. Enter a commit message (e.g., "add 1 cvpr'24 paper") and an extended description if necessary, then confirm your changes by clicking the Commit changes button again at the bottom right.

5. Create a Pull Request: Go back to your forked repository and click Compare & pull request. Alternatively, select your branch from the branch selector and click Open pull request from the Contribute drop-down menu. Fill out the title and description for your pull request, and click Create pull request to submit it.
