This paper collection on exploration in reinforcement learning covers exploration in multi-armed bandits, reinforcement learning, and multi-agent reinforcement learning.
Exploration is an important topic in reinforcement learning research; in essence, it aims to improve sample efficiency in the MDP setting. Introductory survey slides and a survey document (in Chinese) on the exploration problem are available.
A simple form of the exploration-exploitation dilemma can be seen in multi-armed bandit problems, and we include MAB papers because many theoretical ideas derive from MAB studies (a minimal epsilon-greedy sketch opens the bandit list below).
In the early stage, most exploration studies focused on the sample efficiency of specific algorithms; many of them designed different exploration bonuses to lead agents to explore a sufficient set of trajectories.
Recently, most deep RL exploration research has focused on sparse-reward settings, where the goal differs somewhat from the earlier studies; we nevertheless classify these methods by their methodology.
Many learning algorithms also consider the problem of efficient exploration, so we include such work as well.
Exploration is a broad topic, and this paper list only scratches the surface. Collected papers are sorted by time and classification. Any suggestions and pull requests are welcome.
The references here are shared for research purposes. If any author does not want their paper listed here, please feel free to contact me (Email: ericliuof97 [AT] gmail.com).

## Multi-Arm Bandit
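Before the bandit papers, a minimal sketch of the exploration-exploitation dilemma they study: an epsilon-greedy agent on a Bernoulli multi-armed bandit. This is illustrative only; the arm probabilities, epsilon, and step count are arbitrary assumptions rather than anything taken from the listed papers.

```python
# Hedged sketch: epsilon-greedy on a Bernoulli multi-armed bandit.
# All numbers here (arm probabilities, epsilon, steps) are illustrative.
import random

def epsilon_greedy_bandit(true_probs, epsilon=0.1, steps=10_000, seed=0):
    """Keep a running mean per arm; explore uniformly with probability epsilon."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # pulls per arm
    values = [0.0] * n_arms      # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                          # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])    # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]      # incremental mean
        total_reward += reward
    return values, counts, total_reward

values, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.7])
print(values, counts, total)  # the 0.7 arm should dominate the pull counts
```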
- <Bandit problems: sequential allocation of experiments (monographs on statistics and applied probability)> by Donald A Berry and Bert Fristedt, 1985.
- <A contextual-bandit algorithm for mobile context-aware recommender system> by Djallel Bouneffouf, Amel Bouzeghoub, and Alda Lopes Gançarski, 2012.
- <Value-difference based exploration: adaptive control between epsilon-greedy and softmax> by Michel Tokic and Günther Palm, 2011.
- <Adaptive ε-greedy exploration in reinforcement learning based on value differences> by Michel Tokic, 2010.
- <Finite-time regret bounds for the multiarmed bandit problem> by Nicolo Cesa-Bianchi and Paul Fischer, 1998.
- [POKER] <Multi-armed bandit algorithms and empirical evaluation> by Joannes Vermorel and Mehryar Mohri, 2005.
- [UCB] <Finite-time analysis of the multiarmed bandit problem> by Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer, 2002. (A UCB1 sketch follows this list.)
- [Pursuit] <A class of rapidly converging algorithms for learning automata> by M. A. L. Thathachar, 1984.
- [Bayesian UCB] <On Bayesian upper confidence bounds for bandit problems> by Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier, 2012.
- <A modern Bayesian look at the multi-armed bandit> by Steven L Scott, 2010.
- <Optimal learning and experimentation in bandit problems> by Monica Brezzi and Tze Leung Lai, 2002.
- <Bandit processes and dynamic allocation indices> by John C Gittins, 1979.
- [Interesting] <Algorithms for multi-armed bandit problems> by Volodymyr Kuleshov and Doina Precup, 2014.
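As a companion to the [UCB] entry above, here is a minimal sketch of the UCB1 rule of Auer, Cesa-Bianchi, and Fischer (2002): play each arm once, then always pull the arm maximizing its empirical mean plus sqrt(2 ln t / n_pulls). The Bernoulli reward model and all parameters are illustrative assumptions.

```python
# Hedged sketch of UCB1: optimism in the face of uncertainty via a
# confidence-interval bonus that shrinks as an arm is pulled more often.
import math
import random

def ucb1(true_probs, steps=10_000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms

    def pull(arm):
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    for arm in range(n_arms):  # initialization: play every arm once
        pull(arm)
    for t in range(n_arms, steps):
        # index = empirical mean + sqrt(2 ln t / n); the bonus drives exploration
        scores = [values[a] + math.sqrt(2 * math.log(t) / counts[a])
                  for a in range(n_arms)]
        pull(max(range(n_arms), key=lambda a: scores[a]))
    return values, counts

print(ucb1([0.2, 0.5, 0.7]))
```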
## Reward-based Exploration (Intrinsic Reward / Exploration Bonus / Surprise / Curiosity / Uncertainty)
- <Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010)> by Jürgen Schmidhuber, 2010.
- <What is intrinsic motivation? A typology of computational approaches> by Pierre-Yves Oudeyer and Frederic Kaplan, 2007.
- [MBIE, MBIE-EB] <A theoretical analysis of model-based interval estimation> by Alexander L Strehl and Michael L Littman, 2004. (A count-bonus sketch in the spirit of MBIE-EB follows this list.)
- [MBIE] <An empirical evaluation of interval estimation for Markov decision processes> by Alexander L Strehl and Michael L Littman, 2004.
- [R-max] <R-max - a general polynomial time algorithm for near-optimal reinforcement learning> by Ronen I Brafman and Moshe Tennenholtz, 2002.
- [E3] <Near-optimal reinforcement learning in polynomial time> by Michael Kearns and Satinder Singh, 2002.
- [MBIE] <Efficient model-based exploration> by Marco Wiering and Jürgen Schmidhuber, 1998.
- <Near-Bayesian exploration in polynomial time> by J Zico Kolter and Andrew Y Ng, 2009.
- <Unifying count-based exploration and intrinsic motivation> by Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos, 2016.
- <Count-based exploration with neural density models> by Georg Ostrovski, Marc G Bellemare, Aaron van den Oord, and Rémi Munos, 2017.
- [CoEX] <Contingency-aware exploration in reinforcement learning> by Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, and Honglak Lee, 2018.
- <#Exploration: A study of count-based exploration for deep reinforcement learning> by Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel, 2017.
- <Variational information maximisation for intrinsically motivated reinforcement learning> by Shakir Mohamed and Danilo Jimenez Rezende, 2015.
- <An information-theoretic approach to curiosity-driven reinforcement learning> by Susanne Still and Doina Precup, 2012.
- [VIME] <VIME: Variational information maximizing exploration> by Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel, 2016.
- <Self-Supervised Exploration via Disagreement> by Deepak Pathak, Dhiraj Gandhi, and Abhinav Gupta, 2019.
- <Episodic curiosity through reachability> by Nikolay Savinov, Anton Raichuk, Raphael Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, and Sylvain Gelly, 2018.
- [RND] <Exploration by random network distillation> by Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov, 2018.
- <Large-Scale Study of Curiosity-Driven Learning> by Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, and Alexei A Efros, 2018.
- [ICM] <Curiosity-driven exploration by self-supervised prediction> by Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell, 2017.
- <Provably efficient RL with Rich Observations via Latent State Decoding> by Simon S Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudik, and John Langford, 2019.
- <Parameter Space Noise for Exploration> by Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, and Marcin Andrychowicz, 2018.
- [SoftAC] <Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor> by Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, 2018.
- [SoftQ] <Reinforcement learning with deep energy-based policies> by Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine, 2017.
- <Deep exploration via bootstrapped DQN> by Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy, 2016.
- <Taming the noise in reinforcement learning via soft updates> by Roy Fox, Ari Pakman, and Naftali Tishby, 2015.
- <Monte-Carlo exploration for deterministic planning> by Hootan Nakhost and Martin Müller, 2009.
- [UCT] <Bandit based Monte-Carlo planning> by Levente Kocsis and Csaba Szepesvári, 2006.
- [Go-Explore] <Go-Explore: a New Approach for Hard-Exploration Problems> by Ecoffet, Adrien and Huizinga, Joost and Lehman, Joel and Stanley, Kenneth O and Clune, Jeff, 2019.
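Many of the count-based entries above share one mechanic: augment the environment reward with a bonus that decays with the visit count, e.g. beta / sqrt(N(s, a)) in the spirit of MBIE-EB. Below is a minimal tabular Q-learning sketch of that idea; the Gym-style `env` interface (`reset`, `step`, `n_actions`) and all hyperparameters are assumptions for illustration, not an API from any listed paper.

```python
# Hedged sketch: tabular Q-learning with a count-based exploration bonus
# beta / sqrt(N(s, a)), in the spirit of MBIE-EB. `env` is an assumed
# Gym-like placeholder whose step() returns (next_state, reward, done).
from collections import defaultdict
import math

def q_learning_with_count_bonus(env, episodes=500, alpha=0.1, gamma=0.99, beta=0.05):
    Q = defaultdict(float)   # Q[(state, action)], zero-initialized
    N = defaultdict(int)     # visit count per (state, action)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # act greedily w.r.t. Q plus the count bonus (optimism for rarely tried pairs)
            action = max(range(env.n_actions),
                         key=lambda a: Q[(state, a)] + beta / math.sqrt(N[(state, a)] + 1))
            next_state, reward, done = env.step(action)
            N[(state, action)] += 1
            bonus = beta / math.sqrt(N[(state, action)])  # intrinsic reward, decays with visits
            future = 0.0 if done else gamma * max(Q[(next_state, a)]
                                                  for a in range(env.n_actions))
            Q[(state, action)] += alpha * (reward + bonus + future - Q[(state, action)])
            state = next_state
    return Q
```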
## Multi-Agent Reinforcement Learning Exploration

- <CLEANing the reward: counterfactual actions to remove exploratory action noise in multiagent learning> by Chris HolmesParker, Matthew E. Taylor, Adrian Agogino, et al., 2014.
- <Classes of multiagent Q-learning dynamics with epsilon-greedy exploration> by Michael Wunder, Michael L. Littman, and Monica Babes, 2010.
- <Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning> by Shariq Iqbal and Fei Sha, 2019.
- <Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits> by Mithun Chakraborty, Kai Yee Phoebe Chua, Sanmay Das, and Brendan Juba, 2017. (A toy sketch of this contrast closes the list.)
- <Multi-Robot Coordination for Space Exploration> by Logan Yliniemi, Adrian Agogino, and Kagan Tumer, 2013.
- <Coordinated Multi-Agent Exploration> by Abraham Sanchez L. and Alfredo Toriz P., 2008.
- <Coordinated exploration in multi-agent reinforcement learning: an application to load-balancing> by Katja Verbeeck, Ann Nowe, and Karl Tuyls, 2005.
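To make the coordinated-versus-decentralized contrast in the entries above concrete, here is a toy sketch that is not any listed paper's algorithm: several agents share one Bernoulli bandit and pool their value estimates; in the coordinated variant each agent's exploratory pulls cover only its assigned slice of arms, so exploration effort is not duplicated. It assumes at least as many arms as agents, and all numbers are illustrative.

```python
# Toy sketch: decentralized vs. coordinated exploration on a shared bandit.
# Agents pool value estimates; 'coordinated' agents partition the arms for
# their exploratory pulls. Assumes len(true_probs) >= n_agents.
import random

def run(true_probs, n_agents=4, steps=2_000, epsilon=0.1, coordinate=False, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms   # shared (pooled) estimates
    total = 0.0
    for _ in range(steps):
        for agent in range(n_agents):
            if rng.random() < epsilon:
                if coordinate:   # explore only this agent's slice of arms
                    arm = rng.choice([a for a in range(n_arms) if a % n_agents == agent])
                else:            # every agent explores everywhere (duplicated effort)
                    arm = rng.randrange(n_arms)
            else:
                arm = max(range(n_arms), key=lambda a: values[a])
            reward = 1.0 if rng.random() < true_probs[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
            total += reward
    return total

probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print("decentralized:", run(probs), "coordinated:", run(probs, coordinate=True))
```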