Term project conducted in the winter semester 2020/21 in the course "Projects in Machine Learning and Artificial Intelligence" at Technical University Berlin, Institute of Software Engineering and Theoretical Computer Science, Research Group Methods in Artificial Intelligence (led by Prof. Dr. Manfred Opper).
The method we analysed is Stein Variational Gradient Descent (SVGD), a non-parametric variational inference method that iteratively applies smooth transformations to a set of initial particles, constructing a transport map which moves the particles so that they closely approximate (in terms of KL divergence) an otherwise intractable target distribution.
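Concretely, SVGD repeats the following update on the particles x₁, …, xₙ, where p is the target density, k is a positive-definite kernel, and ϵ is a step size (this is the update rule from the Liu & Wang paper cited below):

$$
x_i \leftarrow x_i + \epsilon\,\hat{\phi}^*(x_i),
\qquad
\hat{\phi}^*(x) = \frac{1}{n} \sum_{j=1}^{n} \Big[ k(x_j, x)\,\nabla_{x_j} \log p(x_j) + \nabla_{x_j} k(x_j, x) \Big]
$$

The first term pulls the particles toward high-density regions of p; the second acts as a repulsive force that keeps them spread out.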
We reimplemented the codebase, reran the experiments documented by the authors in their original repository, and performed additional experiments that showcase characteristics and limitations of SVGD.
using Distributions
using ForwardDiff
using KernelFunctions
function SVGD(x, scorefun, kernel, grad_kernel, n_epochs, ϵ)
    n_particles = size(x, 1)           # number of particles (rows of x)
    for i in 1:n_epochs
        scorefun_x = scorefun(x)       # ∇ log p(x), evaluated at every particle
        kxy = kernel(x)                # kernel matrix with entries k(xᵢ, xⱼ)
        dxkxy = grad_kernel(x)         # ∑ⱼ ∇_{xⱼ} k(xⱼ, ⋅)
        ϕ = ((kxy * scorefun_x) .+ dxkxy) ./ n_particles  # steepest-descent direction ϕ̂*
        x += ϵ .* ϕ                    # update the particles
    end
    return x
end
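As a minimal, self-contained sketch of how the function above can be called (our assumptions, not part of the original code: a 1-D standard normal target, a fixed RBF bandwidth h = 1 instead of the paper's median heuristic, and the closed-form gradient of the RBF kernel; score, kern, grad_kern, and x0 are illustrative names):

using Distributions, ForwardDiff, KernelFunctions, Statistics

p = Normal(0.0, 1.0)                       # stand-in target: 1-D standard normal
score(x) = mapslices(xi -> ForwardDiff.gradient(z -> logpdf(p, z[1]), xi), x; dims=2)

h = 1.0                                    # fixed RBF bandwidth (assumption)
k = SqExponentialKernel() ∘ ScaleTransform(1 / h)
kern(x) = kernelmatrix(k, x; obsdim=1)     # Kᵢⱼ = k(xᵢ, xⱼ), particles stored in rows

# ∑ⱼ ∇_{xⱼ} k(xⱼ, xᵢ) in closed form for the RBF kernel
function grad_kern(x)
    K = kern(x)
    (x .* sum(K; dims=2) .- K * x) ./ h^2
end

x0 = rand(Normal(-10.0, 1.0), 50, 1)       # 50 particles, initialized far from the target
x = SVGD(x0, score, kern, grad_kern, 500, 0.1)
mean(x), std(x)                            # should approach (0, 1)

The closed-form grad_kern mirrors the dxkxy computation in the authors' original NumPy implementation.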
We used two languages (Python 3.6; Julia 1.5) and the following libraries throughout our experiments. Please install them to rerun our experiments (a Julia install sketch follows the lists):
Python:
- PyTorch
- Numpy
- Scikit-Learn
- Matplotlib
- Scipy
Julia:
- Statistics
- Distances
- Random
- ForwardDiff
- LinearAlgebra
- Distributions
- Plots
- KernelFunctions
- EvalMetrics
- MLDataUtils
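For the Julia dependencies, a one-shot install could look like the following sketch; the Python packages can be installed with pip or conda in the usual way.

using Pkg
Pkg.add(["Distances", "ForwardDiff", "Distributions", "Plots",
         "KernelFunctions", "EvalMetrics", "MLDataUtils"])
# Statistics, Random, and LinearAlgebra ship with the Julia standard library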
- presentation_slides (mid-term and final presentation PDFs)
- src (code of all experiments)
  - data (experiment datasets from the original repo)
  - python
    - bayesian_neural_network (PyTorch implementation of SVGD, applied to the Boston housing dataset)
    - multi_variate_normal (Jupyter notebook applying the NumPy implementation to a 2-D Gaussian example)
    - SVGD.py (NumPy implementation of SVGD)
  - julia
    - bayesian_regression (Bayesian regression experiments, applying the Julia implementation to the Covertype dataset)
    - gaussian_mixture_anealing (Gaussian mixture model experiments, applying the Julia implementation to mixtures of 2-D Gaussians, partially using annealed SVGD)
    - multi_variate_normal (Jupyter notebook applying the Julia implementation to a 2-D Gaussian example)
    - SVGD.jl (Julia implementation of SVGD)
- statics (training artifacts and generated graphics)
- Saeed Salehi, @ssnio
- Boshu Zhang, @Bookiebookie
- Clemens Dieffendahl, @dissendahl
- Qiang Liu and Dilin Wang, Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm (paper), NIPS, 2016
- dilinwang820/Stein-Variational-Gradient-Descent, Stein Variational Gradient Descent (SVGD) (code repository), github.com, 2016
- Bhairav Mehta, Depth First Learning - Stein Variational Gradient Descent Class, depthfirstlearning.com, 2020
- Blei et al., Variational Inference: Foundations and Modern Methods, NIPS, 2016
- Francesco D'Angelo and Vincent Fortuin, Annealed Stein Variational Gradient Descent, 3rd Symposium on Advances in Approximate Bayesian Inference, 2020
- Gianluca Detommaso et al., A Stein Variational Newton Method, arxiv.org, 2018
- José Miguel Hernández-Lobato et al., Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks, ICML, 2015
- Joseph Rocca, Bayesian Inference Problem, MCMC and Variational Inference, Towards Data Science blog, 2019
- Qiang Liu et al., A Kernelized Stein Discrepancy for Goodness-of-Fit Tests, arxiv.org, 2016
- Qiang Liu, Stein Variational Gradient Descent as Gradient Flow, arxiv.org, 2017
- Samuel J. Gershman et al., Nonparametric Variational Inference, ICML, 2012
- Yang Liu et al., Stein Variational Policy Gradient, arxiv.org, 2017
- Radford M. Neal, Slice Sampling, Annals of Statistics 31(3): 705-767, 2003