Here, detailed usage instructions are provided; for specific APIs, please consult the documentation (built with `-D BUILD_DOCUMENTATION=TRUE`).
In your `CMakeLists.txt`:
find_package(SVGDCpp REQUIRED)
add_executable(app app.cpp)
target_link_libraries(app
    PRIVATE
    SVGDCpp::SVGDCpp
)
Then in the source code:
#include <SVGDCpp/Core>
#include <SVGDCpp/Model>
#include <SVGDCpp/Kernel>
#include <SVGDCpp/Optimizer>
Mainly, there are 5 elements required to run a minimally working SVGD algorithm:
- the particles' coordinates (an $m \times n$ matrix for $n$ particles in an $m$-dimensional problem);
- a `Model` object;
- a `Kernel` object;
- an `Optimizer` object;
- an `SVGD` object.
The first four are required by the fifth as shared pointer arguments. The `Model` and `Kernel` objects are used to compute the particle updates, which are then applied by the `Optimizer` class.
This example uses the provided classes `MultivariateNormal`, `GaussianRBF`, and the `Adam` optimizer to estimate a multivariate normal distribution.
// Set up a 2-D multivariate normal problem with an RBF kernel using 10 particles and the Adam optimizer for 1000 iterations
size_t dim = 2, num_particles = 10, num_iterations = 1000;
auto x0 = std::make_shared<Eigen::MatrixXd>(3*Eigen::MatrixXd::Random(dim, num_particles));
// Create a multivariate normal model pointer
Eigen::Vector2d mean(5, -5);
Eigen::Matrix2d covariance;
covariance << 0.5, 0, 0, 0.5;
std::shared_ptr<Model> model_ptr = std::make_shared<MultivariateNormal>(mean, covariance);
// Create RBF kernel pointer
std::shared_ptr<Kernel> kernel_ptr = std::make_shared<GaussianRBFKernel>(x0, GaussianRBFKernel::ScaleMethod::Median, model_ptr);
// Create Adam optimizer pointer
std::shared_ptr<Optimizer> opt_ptr = std::make_shared<Adam>(dim, num_particles, 1.0e-1, 0.9, 0.999);
Then, instantiate the `SVGD` class. You can do this either by passing the created objects directly or by passing an `SVGDOptions` struct.
// Instantiate the SVGD class using regular arguments
SVGD svgd(dim, num_iterations, x0, kernel_ptr, model_ptr, opt_ptr);
/* OR */
// Instantiate the SVGD class using regular arguments, with lower and upper bounds
SVGD svgd(dim, num_iterations, x0, kernel_ptr, model_ptr, opt_ptr, Eigen::Vector2d(-2, -7), Eigen::Vector2d(5, 3));
/* OR */
// Instantiate the SVGD class using the SVGDOptions struct
SVGDOptions options;
options.Dimension = dim;
options.NumIterations = num_iterations;
options.CoordinateMatrixPtr = x0;
options.ModelPtr = model_ptr;
options.KernelPtr = kernel_ptr;
options.OptimizerPtr = opt_ptr;
options.LowerBound = Eigen::Vector2d(-2, -7);
options.UpperBound = Eigen::Vector2d(5, 3);
options.Parallel = false;
options.IntermediateMatricesOutputPath = "mylog.txt";
options.LogIntermediateMatrices = false;
SVGD svgd(options);
Finally, initialize and run.
// x0 has initial particle coordinates at this point
// Initialize and run SVGD
svgd.Initialize();
svgd.Run();
// x0 is now updated at this point
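Because `x0` is a shared pointer to the coordinate matrix, the converged particles can be inspected directly once `Run()` returns. A minimal sketch using standard Eigen calls (assumes `<iostream>` is included):

```cpp
// Print the final particle coordinates (one particle per column)
std::cout << *x0 << std::endl;
// The particles' empirical mean should approximate the target mean (5, -5)
std::cout << x0->rowwise().mean().transpose() << std::endl;
```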
At the moment, this library provides the following classes:
- Models: `MultivariateNormal`
- Kernels: `GaussianRBF`
- Optimizers: `AdaGrad`, `RMSProp`, `Adam`
More may be added in the future, but this is not guaranteed.
Most likely, you'll want to estimate your own density functions and/or use other types of kernel functions. In general, there are 3 ways to provide them using this library, of which the first 2 are preferable to the third.
Before you proceed, here are some important things to note about the model and kernel classes that you create.
- Model: The model MUST be a non-negative scalar function within the bounds of interest (negative values will cause the program to fail when `EvaluateLogModel` is used). The model function is essentially a probability density function to be estimated.
- Kernel: By definition, the kernel function must be specified such that the matrix it generates is positive definite (see arXiv: Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm and Wikipedia: RKHS); an example is given below. The kernel function $k(x, x')$ takes in 2 inputs to compute a 'similarity' score between them. In this library, $x$ is the function variable (which undergoes differentiation) while $x'$ is referred to as the location.
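For instance, a Gaussian RBF kernel of the general form

$$k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{h}\right), \quad h > 0,$$

is positive definite; this is the family provided by this library's `GaussianRBF` kernel, where the scale $h$ can, for example, be set via the median heuristic (`GaussianRBFKernel::ScaleMethod::Median`).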
In a nutshell, you create function objects as desired that are used to update either the base (method 1) or derived (method 2) classes of `Model` and/or `Kernel`. The function objects have a fixed signature that you must follow:
- Model function: `std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &)>` with inputs of (1) the function variable and (2) the function parameters.
- Kernel function: `std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &, const VectorXADd &)>` with inputs of (1) the function variable, (2) the function parameters, and (3) the kernel location.
⚠️ A common error that you may encounter when your model and/or kernel functions are invalid (e.g., log of a negative number, division by zero, etc.) is the following: `cppad-20240602 error from a known source: yq = f.Forward(q, xq): a zero order Taylor coefficient is nan. Corresponding independent variables vector was written to binary a file.`
ℹ️ `VectorXADd` and `MatrixXADd` are aliases for `Eigen::Vector<CppAD::AD<double>, Eigen::Dynamic>` and `Eigen::Matrix<CppAD::AD<double>, Eigen::Dynamic, Eigen::Dynamic>` respectively. The `CppAD::AD<double>` type is a `double`-like type used by the CppAD library to perform automatic differentiation. To convert between the `double` and `CppAD::AD<double>` types you can use `ConvertToCppAD` and `ConvertFromCppAD`.
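As a quick sketch of these conversion helpers (the exact return types follow the library's aliases; `auto` sidesteps them here):

```cpp
Eigen::VectorXd mean(2);
mean << 5.0, -5.0;
auto mean_ad = ConvertToCppAD(mean);        // elements become CppAD::AD<double>
auto mean_back = ConvertFromCppAD(mean_ad); // elements are plain double again
```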
This is the simplest method and is useful if you don't need specific variables/functionalities in your `Model` and/or `Kernel` classes.
In this method, you instantiate a `Model`/`Kernel` object and update the model/kernel functions by calling `UpdateModel`/`UpdateKernel` with your desired function object as the argument. The example below also shows how you can compose new models from existing ones.
// Create the 1st model function as exp(-(x - mean)^T * cov^{-1} * (x - mean)) where params = {mean, cov} = {nx1 vector, nxn matrix}
std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &)> model_fun_1 =
    [](const VectorXADd &x, const std::vector<MatrixXADd> &params)
    {
        VectorXADd result(1);
        VectorXADd diff = x - params[0];
        result << (-(diff.transpose() * params[1].inverse() * diff)).array().exp();
        return result;
    };
// Define the kernel function as exp(x.sum()) without parameters
std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &, const VectorXADd &)> kernel_fun =
    [](const VectorXADd &x, const std::vector<MatrixXADd> &params, const VectorXADd &location)
    {
        VectorXADd result(1);
        result << CppAD::exp(x.sum());
        return result;
    };
// Use the defined function object to create the 1st model
Model model1 = Model(2);
model1.UpdateModel(model_fun_1);
// Pass a lambda function directly to create the 2nd model
Model model2 = Model(2);
model2.UpdateModel([](const VectorXADd &x, const std::vector<MatrixXADd> &params)
                   {
                       VectorXADd result(1);
                       result << CppAD::cos(x.sum()) + 2.0;
                       return result;
                   });
// Create a composed model whose model function is the sum of the two other model functions
Model sum_model = model1 + model2;
// Create a composed model whose model function is the difference of the two other model functions
Model difference_model = model1 - model2;
// Create a composed model whose model function is the product of the two other model functions
Model product_model = model1 * model2;
// Create a composed model whose model function is the quotient of the two other model functions
Model quotient_model = model1 / model2;
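Like any other model, a composed model must be initialized before its `Evaluate*` functions can be used directly (see the notes on initialization further below). A minimal sketch:

```cpp
// Initialize the composed model so its Evaluate* functions are available
sum_model.Initialize();
// Evaluate the composed model function at an arbitrary point
double value = sum_model.EvaluateModel(Eigen::Vector2d(0.5, -0.5));
```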
This is useful if you want to write your own derived class to provide specific functionality.
The example below is extracted from the `MultivariateNormal` class. The model function should be provided in the constructor and "incorporated" into the class with the `UpdateModel` (or `UpdateKernel`, if you're writing a new kernel) method.
If you also wish to run SVGD in parallel mode, ensure that the `Clone*` functions are provided: they implement the polymorphic copying required for multithreaded execution of SVGD. Otherwise, they are not necessary.
class MultivariateNormal : public Model
{
public:
MultivariateNormal() {}
MultivariateNormal(const Eigen::VectorXd &mean, const Eigen::MatrixXd &covariance) : Model(mean.rows())
{
// Ensure that the dimensions of mean matches covariance
if (!CompareVectorSizes<Eigen::VectorXd, Eigen::VectorXd>(mean, covariance.col(0)) ||
!CompareVectorSizes<Eigen::VectorXd, Eigen::VectorXd>(mean, covariance.row(0)))
{
throw DimensionMismatchException("Dimensions of parameter vectors/matrices do not match.");
}
// Store parameters
model_parameters_.push_back(ConvertToCppAD(mean));
model_parameters_.push_back(ConvertToCppAD(covariance));
// Compute the normalization constant based on the updated parameters
ComputeNormalizationConstant();
// Define model function (the kernel density only, without normalization constant)
auto model_fun = [](const VectorXADd &x, const std::vector<MatrixXADd> &params)
{
VectorXADd result(1), diff = x - params[0];
result << (-0.5 * (diff.transpose() * params[1].inverse() * diff).array()).exp();
return result;
};
UpdateModel(model_fun);
}
// Provide polymorphic copying functionality (required only for parallel mode)
virtual std::unique_ptr<Model> CloneUniquePointer() const
{
return std::make_unique<MultivariateNormal>(*this);
}
// Provide polymorphic copying functionality (required only for parallel mode)
virtual std::shared_ptr<Model> CloneSharedPointer() const
{
return std::make_shared<MultivariateNormal>(*this);
}
/* and so on */
};
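A derived class like this can then be used directly, much like the pointer-based example earlier (a sketch reusing the `mean` and `covariance` defined above):

```cpp
MultivariateNormal mvn(mean, covariance);
mvn.Initialize(); // record the CppAD tape before any direct evaluation
double density = mvn.EvaluateModel(Eigen::Vector2d(5.0, -5.0)); // evaluate near the mode
```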
This may be useful as it skips automatic differentiation, which may improve performance. You would choose this method if you can provide closed-form functions for the gradients (Jacobians) of both the model function and its logarithmic transformation. However, you won't be able to do functional composition (e.g., `obj1 + obj2`, `obj1 * obj2`, etc.), because composition works with the `CppAD::ADFun` object in the base class; instead, you directly implement the closed-form functions without touching the `CppAD::ADFun` objects.
In this method, you override the `Evaluate*` functions (provided by the base classes); these are the functions used by the `SVGD` class to "move" the particles. You can use parameters in your model/kernel function through the `model_parameters_`/`kernel_parameters_` member variable, though you'll need to convert it with `ConvertFromCppAD` since its elements are of `MatrixXADd` type.
class MyModel : public Model
{
public:
// Provide polymorphic copying functionality (required only for parallel mode)
virtual std::unique_ptr<Model> CloneUniquePointer() const
{
return std::make_unique<MyModel>(*this);
}
// Provide polymorphic copying functionality (required only for parallel mode)
virtual std::shared_ptr<Model> CloneSharedPointer() const
{
return std::make_shared<MyModel>(*this);
}
// Override the method to do nothing (by default it re-records the automatic differentiation tape used by CppAD)
void Initialize() override {}
// Override the method to compute the model function directly
double EvaluateModel(const Eigen::VectorXd &x) override
{
std::vector<Eigen::MatrixXd> converted_params(model_parameters_.size());
for (size_t i = 0; i < model_parameters_.size(); ++i)
{
// Convert from CppAD::AD<double> type to double type
converted_params[i] = ConvertFromCppAD(model_parameters_[i]);
}
// The parameters here are scalars stored as 1x1 matrices, hence the (0, 0) indexing
return converted_params[0](0, 0) * std::cos(x(0)) + converted_params[1](0, 0) * std::cos(x(1)) + converted_params[2](0, 0) * x.prod() + converted_params[3](0, 0);
}
// Override the method to compute the log model function directly
double EvaluateLogModel(const Eigen::VectorXd &x) override
{
return std::log(EvaluateModel(x));
}
// Override the method to compute the model Jacobian directly
Eigen::VectorXd EvaluateModelGrad(const Eigen::VectorXd &x) override
{
std::vector<Eigen::MatrixXd> converted_params(model_parameters_.size());
for (size_t i = 0; i < model_parameters_.size(); ++i)
{
// Convert from CppAD::AD<double> type to double type
converted_params[i] = ConvertFromCppAD(model_parameters_[i]);
}
Eigen::Vector2d result;
// As above, the scalar parameters are stored as 1x1 matrices
result << -converted_params[0](0, 0) * std::sin(x(0)) + converted_params[2](0, 0) * x(1), -converted_params[1](0, 0) * std::sin(x(1)) + converted_params[2](0, 0) * x(0);
return result;
}
// Override the method to compute the log model Jacobian directly
Eigen::VectorXd EvaluateLogModelGrad(const Eigen::VectorXd &x) override
{
return EvaluateModelGrad(x) / EvaluateModel(x);
}
/* and so on */
};
In some cases, your derived class may need to provide a Hessian implementation. For example, if you use the provided `GaussianRBF` kernel and would like to compute the scale using the `ScaleMethod::Hessian` method, then the Hessian of your log model is required.
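As an illustration only (the method name below is hypothetical; consult the generated API documentation for the actual override point), a closed-form log-model Hessian for the multivariate normal model is simply the negated inverse covariance, independent of $x$:

```cpp
// HYPOTHETICAL method name -- check the API docs for the real signature
Eigen::MatrixXd EvaluateLogModelHessian(const Eigen::VectorXd &x)
{
    // log p(x) = -0.5 (x - mean)^T cov^{-1} (x - mean) + const, so the
    // Hessian is -cov^{-1} (parameter index 1 stores the covariance)
    Eigen::MatrixXd covariance = ConvertFromCppAD(model_parameters_[1]);
    return -covariance.inverse();
}
```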
When defining functions for your `Model` or `Kernel` classes, you need to work with vectors and matrices of `CppAD::AD<double>` type. The CppAD library provides a set of common math functions to use in place of the regular ones. A short list is provided below, followed by a usage sketch; see the CppAD website for the entire list.
| Instead of | Use |
|---|---|
| `std::exp` | `CppAD::exp` |
| `std::abs` | `CppAD::abs` |
| `std::pow` | `CppAD::pow` |
| `std::sqrt` | `CppAD::sqrt` |
Whenever a `Model` or a `Kernel` object is defined for the first time, it is uninitialized, so you'll need to initialize it if you want to use it directly. This applies even if you're copying from an existing object.
// Create RBF kernel pointer
std::shared_ptr<Kernel> kernel_ptr = std::make_shared<GaussianRBFKernel>(x0, GaussianRBFKernel::ScaleMethod::Median, model_ptr);
// Compute kernel value with arbitrary values and at the default location (i.e., reference coordinate)
kernel_ptr->Initialize(); // initialize the kernel so that the `Evaluate*` functions are available
Eigen::Vector2d input(16.328, -4.059);
double output = kernel_ptr->EvaluateKernel(input);
// Compute kernel value with arbitrary values and at an arbitrary location
Eigen::Vector2d new_location(5.2225, 97.6);
kernel_ptr->UpdateLocation(new_location);
kernel_ptr->Initialize();
output = kernel_ptr->EvaluateKernel(input);
Anytime an `Update*` method is called manually, the instance's `Initialize()` function must be called before the scope ends. This is so that the CppAD tape can record any modifications to the functions.
// Okay
{
Model m;
m.UpdateModel(some_fun);
m.Initialize();
}
// Also okay (order of Update* doesn't matter here)
{
Model m;
m.UpdateParameters(some_params);
m.UpdateModel(some_fun);
m.Initialize();
}
// Okay, but redundant
{
Model m;
m.UpdateParameters(some_params);
m.Initialize();
m.UpdateModel(some_fun);
m.Initialize();
}
// Applies also to Kernel classes (order doesn't matter here, as long as `Initialize()` is called at the end)
{
Kernel k;
k.UpdateKernel(some_fun);
k.UpdateLocation(some_vec);
k.UpdateParameters(some_params);
k.Initialize();
k.UpdateParameters(some_new_params); // doesn't get registered; subsequent computation uses the `some_params` values
}
// If you defined models/kernels to be composed into a new model (and don't intend to use the original objects), then `Initialize()` is only needed for the new object
{
Model m_a(2), m_b(2);
m_a.UpdateModel(model_function_a);
m_b.UpdateModel(model_function_b);
Model m_new(2);
m_new = m_a + m_b;
m_new.Initialize();
// same for copied objects
Model m_another(m_new);
m_another.Initialize(); // required
}
Typically, unless you call the `Update*` methods directly, you don't need to worry about calling `Initialize()` yourself. In most cases, you interact with the `SVGD` object, whose `UpdateModelParameters` and `UpdateKernelParameters` functions take care of re-initializing the updates internally.
Running SVGD in parallel mode is as simple as calling `SetupForParallelMode()` at the beginning of your code and then instantiating the `SVGD` class with the `parallel` argument set to `true`.
// Set up parallel mode execution
SetupForParallelMode();
/* Definitions of dim, num_iterations, x0, model_ptr, kernel_ptr, opt_ptr here */
// Instantiate the SVGD class with the `parallel` argument as true
SVGD svgd(dim, num_iterations, x0, kernel_ptr, model_ptr, opt_ptr, true);
svgd.Initialize();
If activated, it will run with the maximum number of threads the machine has. Note: you're unlikely to see performance improvements if you only have a few particles (<= 10).
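Equivalently, when constructing from the `SVGDOptions` struct shown earlier, parallel mode corresponds to setting its `Parallel` member (a sketch reusing the options from above):

```cpp
// Set up parallel mode execution before anything else
SetupForParallelMode();
/* fill in options.Dimension, options.NumIterations, pointers, etc. as before */
options.Parallel = true;
SVGD svgd(options);
svgd.Initialize();
svgd.Run();
```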