-
Equation of State Calculations by Fast Computing Machines (1953). The proud history of Markov chain Monte Carlo (MCMC) methods begins with this paper by Nicholas Metropolis and collaborators on simulating systems of interacting molecules.
-
Monte Carlo sampling methods using Markov chains and their applications (1970). In this paper, Wilfred Keith Hastings introduces the "Metropolis" algorithm to a general statistical audience, extending it to general (not necessarily symmetric) proposal distributions in the process.
-
Maximum Likelihood from Incomplete Data via the EM Algorithm (1977). In this seminal paper, Dempster, Laird and Rubin introduce the Expectation-Maximisation (EM) algorithm to obtain maximum likelihood estimates in models with latent variables and other missing data situations.
-
Optimization by Simulated Annealing (1983). Scott Kirkpatrick and collaborators are among the many groups credited with the invention of the simulated annealing (SA) method. This paper showcases the application of SA to the famous(ly) NP-hard travelling salesman problem.
-
Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images (1984). The famous Gibbs sampler was conceived by the Geman brothers in a situation where sampling from the whole target was a nightmare, but sampling from each conditional distribution was relatively easy. It remains one of the workhorses of computational statistics to this day.
-
The Calculation of Posterior Distributions by Data Augmentation (1987). In this paper, Martin Tanner and Wing Wong show how to make a problem easier by making it bigger, i.e., by augmenting the state space with latent variables, à la the EM algorithm.
-
Markov chains for exploring posterior distributions (1994). In this landmark paper, Luke Tierney gives a great account of the theoretical underpinnings of MCMC.
-
Reversible jump Markov chain Monte Carlo computation and Bayesian model determination (1995), by the great Peter Green, is simply one of the best computational statistics papers of all time. It introduces the so-called reversible jump MCMC algorithm, which allows sampling over the space of models and thus performing model choice at the same time as model fitting.
-
Slice sampling (2003). In this seminal paper, Radford Neal introduces the slice sampler, a simple algorithm that often outperforms Metropolis-Hastings-type samplers.
-
General state space Markov chains and MCMC algorithms (2004), by Gareth Roberts and Jeff Rosenthal, presents results on Markov chains in general (uncountable) state spaces and gives general conditions for geometric ergodicity. A complete treatise.
-
A generalized Markov sampler (2004) by Jon Keith, Dirk Kroese and Darryn Bryant shows how one can view the work of Green (1995) (see above) as a special case of a more general sampler.
-
MCMC using Hamiltonian dynamics (2011). Yet another masterpiece by Radford Neal, this paper provides a thorough review of the history and main concepts behind Hamiltonian Monte Carlo (HMC).
-
The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo (2014). Matthew Hoffman and Andrew Gelman introduce a novel algorithm that automatically tunes the step size and path length (tree depth) of the HMC algorithm. The No-U-Turn Sampler (NUTS), as it came to be christened, is the building block for what would later form the main algorithm implemented in Stan.
-
In A tutorial on adaptive MCMC (2008), Christophe Andrieu and Johannes Thoms give a very nice overview of the advantages and pitfalls (!) of adaptive MCMC. Pay special heed to Section 2.
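
To make the Gibbs sampler idea from the Geman and Geman entry above concrete, here is a minimal sketch in Python. It is not taken from any of the papers listed: the toy target (a standard bivariate normal with correlation `rho`) and the names `gibbs_bivariate_normal` and `sample_correlation` are assumptions chosen for illustration. The joint is of course easy to sample directly in this case; the point is only to show the mechanics of alternating between the two full conditionals, x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Illustrative toy target only: each full conditional is a univariate
    normal, so every update is a single call to a normal generator.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of each full conditional
    x, y = 0.0, 0.0  # arbitrary starting point
    draws = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x given the current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the new x
        if i >= burn_in:
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Empirical Pearson correlation of a list of (x, y) pairs."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in draws) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in draws) / n
    var_y = sum((y - mean_y) ** 2 for _, y in draws) / n
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
    print("empirical correlation:", sample_correlation(draws))
```

With a long enough run, the empirical correlation of the draws should sit close to the chosen `rho`. The closer `rho` is to 1, the more slowly the chain mixes, which is a small-scale version of the convergence questions treated in the Tierney and Roberts and Rosenthal entries.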