Available algorithms

(no marker): thoroughly tested. In many cases, we verified against known values and/or reproduced results from papers.

~: implemented but lightly tested.

X: known problems; please see the GitHub issues.

Algorithms	Category	Reference	Status
Information Set Monte Carlo Tree Search (IS-MCTS)	Search	Cowling et al. '12	~
Minimax (and Alpha-Beta) Search	Search	Wikipedia1, Wikipedia2, Knuth and Moore '75
Monte Carlo Tree Search	Search	Wikipedia, UCT paper, Coulom '06, Cowling et al. survey
Lemke-Howson (via `nashpy`)	Opt.	Wikipedia, Shoham & Leyton-Brown '09
ADIDAS	Opt.	Gemp et al. '22	~
Sequence-form linear programming	Opt.	Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09
Stackelberg equilibrium solver	Opt.	Conitzer & Sandholm '06	~
Magnetic Mirror Descent (MMD) with dilated entropy	Opt.	Sokota et al. '22	~
Counterfactual Regret Minimization (CFR)	Tabular	Zinkevich et al. '08, Neller & Lanctot '13
CFR against a best responder (CFR-BR)	Tabular	Johanson et al. '12
Exploitability / Best response	Tabular	Shoham & Leyton-Brown '09
External sampling Monte Carlo CFR	Tabular	Lanctot et al. '09, Lanctot '13
Fixed Strategy Iteration CFR (FSICFR)	Tabular	Neller & Hnath '11	~
Mean-field Fictitious Play for MFG	Tabular	Perrin et al. '20	~
Online Mirror Descent for MFG	Tabular	Perolat et al. '21	~
Munchausen Online Mirror Descent for MFG	Tabular	Lauriere et al. '22	~
Fixed Point for MFG	Tabular	Huang et al. '06	~
Boltzmann Policy Iteration for MFG	Tabular	Lauriere et al. '22	~
Outcome sampling Monte Carlo CFR	Tabular	Lanctot et al. '09, Lanctot '13
Policy Iteration	Tabular	Sutton & Barto '18
Q-learning	Tabular	Sutton & Barto '18
Regret Matching	Tabular	Hart & Mas-Colell '00
Restricted Nash Response (RNR)	Tabular	Johanson et al. '08	~
SARSA	Tabular	Sutton & Barto '18
Value Iteration	Tabular	Sutton & Barto '18
Advantage Actor-Critic (A2C)	RL	Mnih et al. '16
Deep Q-networks (DQN)	RL	Mnih et al. '15
Ephemeral Value Adjustments (EVA)	RL	Hansen et al. '18	~
Proximal Policy Optimization (PPO)	RL	Schulman et al. '18	~
AlphaZero (C++/LibTorch)	MARL	Silver et al. '18
AlphaZero (Python/TF)	MARL	Silver et al. '18
Correlated Q-Learning	MARL	Greenwald & Hall '03	~
Asymmetric Q-Learning	MARL	Kononen '04	~
Deep CFR	MARL	Brown et al. '18
Exploitability Descent (ED)	MARL	Lockhart et al. '19
(Extensive-form) Fictitious Play (XFP)	MARL	Heinrich, Lanctot, & Silver '15
Nash Q-Learning	MARL	Hu & Wellman '03	~
Neural Fictitious Self-Play (NFSP)	MARL	Heinrich & Silver '16
Neural Replicator Dynamics (NeuRD)	MARL	Omidshafiei, Hennes, Morrill, et al. '19	X
Regret Policy Gradients (RPG, RMPG)	MARL	Srinivasan, Lanctot, et al. '18
Policy-Space Response Oracles (PSRO)	MARL	Lanctot et al. '17
Q-based ("all-actions") Policy Gradient (QPG)	MARL	Srinivasan, Lanctot, et al. '18
Regularized Nash Dynamics (R-NaD)	MARL	Perolat, De Vylder, et al. '22
Regression CFR (RCFR)	MARL	Waugh et al. '15, Morrill '16
Rectified Nash Response (PSRO_rn)	MARL	Balduzzi et al. '19	~
Win-or-Learn-Fast Policy-Hill Climbing (WoLF-PHC)	MARL	Bowling & Veloso '02	~
α-Rank	Eval. / Viz.	Omidshafiei et al. '19, arXiv
Nash Averaging	Eval. / Viz.	Balduzzi et al. '18	~
Replicator / Evolutionary Dynamics	Eval. / Viz.	Hofbauer & Sigmund '98, Sandholm '10
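
As a rough illustration of what one of the simpler tabular entries does, the sketch below runs regret matching (Hart & Mas-Colell '00) in self-play on rock-paper-scissors. It is written in plain Python and does not use this library's API. Every name in it (`PAYOFF`, `strategy_from_regrets`, `action_values`, `run_self_play`, `exploitability`) is hypothetical and chosen only for this example. It also assumes full-information updates with expected payoffs rather than sampled actions.

```python
# Illustrative sketch only; not code from this library. All names are hypothetical.

# Payoff to the player choosing the row action in rock-paper-scissors.
# The game is zero-sum and symmetric, so both players can use the same table.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
NUM_ACTIONS = 3


def strategy_from_regrets(regrets):
    """Play in proportion to positive cumulative regret, else uniformly."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(NUM_ACTIONS))
        for a in range(NUM_ACTIONS)
    ]


def run_self_play(iterations):
    """Return each player's average strategy after simultaneous updates."""
    # Player 0 starts biased towards rock so that play is not trivially uniform.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * NUM_ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in range(2):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(NUM_ACTIONS):
                # Regret of not having played action `a` this round.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(strategy):
    """Best-response payoff against `strategy`; 0 at the uniform equilibrium."""
    return max(action_values(strategy))


if __name__ == "__main__":
    average = run_self_play(10_000)
    print("average strategies:", average)
    print("exploitability of player 0:", exploitability(average[0]))
```

The current strategies can keep cycling. It is the average strategy that approaches the uniform equilibrium, which is why the sketch reports exploitability of the average rather than of the last iterate.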
