Detailed Instructions

Here, detailed usage instructions are provided, but for specific APIs, please consult the documentation (built with -D BUILD_DOCUMENTATION=TRUE).

How do you use it in a CMake project?

In your CMakeLists.txt:

find_package(SVGDCpp REQUIRED)

add_executable(app app.cpp)

target_link_libraries(app
    PRIVATE
    SVGDCpp::SVGDCpp
)

Then in the source code:

#include <SVGDCpp/Core>
#include <SVGDCpp/Model>
#include <SVGDCpp/Kernel>
#include <SVGDCpp/Optimizer>

How to run SVGD to estimate a model?

Mainly, five elements are required to run a minimally working SVGD algorithm:

  1. Particles' coordinates (an $m \times n$ matrix for $n$ particles in an $m$-dimensional problem);
  2. Model object;
  3. Kernel object;
  4. Optimizer object;
  5. SVGD object.

The first four are required by the fifth as shared pointer arguments. The Model and Kernel classes are used to compute $\hat{\phi}^*$ in Equation (8) of the SVGD paper, and the step size $\epsilon$ is provided with the Optimizer class.

Basic usage

This example uses the provided MultivariateNormal model, GaussianRBF kernel, and Adam optimizer classes to estimate a multivariate normal distribution.

// Set up a 2-D multivariate normal problem with an RBF kernel using 10 particles and the Adam optimizer for 1000 iterations
size_t dim = 2, num_particles = 10, num_iterations = 1000;
auto x0 = std::make_shared<Eigen::MatrixXd>(3*Eigen::MatrixXd::Random(dim, num_particles));

// Create a multivariate normal model pointer
Eigen::Vector2d mean(5, -5);
Eigen::Matrix2d covariance;
covariance << 0.5, 0, 0, 0.5;
std::shared_ptr<Model> model_ptr = std::make_shared<MultivariateNormal>(mean, covariance);

// Create RBF kernel pointer
std::shared_ptr<Kernel> kernel_ptr = std::make_shared<GaussianRBFKernel>(x0, GaussianRBFKernel::ScaleMethod::Median, model_ptr);

// Create Adam optimizer pointer
std::shared_ptr<Optimizer> opt_ptr = std::make_shared<Adam>(dim, num_particles, 1.0e-1, 0.9, 0.999);

Then, instantiate the SVGD class, either by passing the created objects directly or by passing an SVGDOptions struct.

// Instantiate the SVGD class using regular arguments
SVGD svgd(dim, num_iterations, x0, kernel_ptr, model_ptr, opt_ptr);

/* OR */

// Instantiate the SVGD class using regular arguments, with lower and upper bounds
SVGD svgd(dim, num_iterations, x0, kernel_ptr, model_ptr, opt_ptr, Eigen::Vector2d(-2, -7), Eigen::Vector2d(5, 3));

/* OR */

// Instantiate the SVGD class using the SVGDOptions struct
SVGDOptions options;

options.Dimension = dim;
options.NumIterations = num_iterations;
options.CoordinateMatrixPtr = x0;
options.ModelPtr = model_ptr;
options.KernelPtr = kernel_ptr;
options.OptimizerPtr = opt_ptr;
options.LowerBound = Eigen::Vector2d(-2, -7);
options.UpperBound = Eigen::Vector2d(5, 3);
options.Parallel = false;
options.IntermediateMatricesOutputPath = "mylog.txt";
options.LogIntermediateMatrices = false;

SVGD svgd(options);

Finally, initialize and run.

// x0 has initial particle coordinates at this point

// Initialize and run SVGD
svgd.Initialize();
svgd.Run();

// x0 is now updated at this point
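Since the coordinate matrix is updated in place, the result can be inspected directly through x0. As a sketch (assuming the particles have converged for this example's settings), the mean across particles should be near the model's mean of (5, -5):

```cpp
// After Run(), each column of *x0 holds one particle's coordinates.
// The mean across particles approximates the target distribution's mean.
Eigen::VectorXd particle_mean = x0->rowwise().mean();
std::cout << "Estimated mean:\n" << particle_mean << std::endl;
```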

At the moment, this library provides the following classes:

  1. Models:
    • MultivariateNormal
  2. Kernels:
    • GaussianRBF
  3. Optimizers:
    • AdaGrad
    • RMSProp
    • Adam

More may be provided in the future, but this is not guaranteed.

How to create a custom kernel or model function?

Most likely, you'll want to estimate your own density functions and/or use other types of kernel functions. In general, there are three methods to provide them with this library, of which the first two are preferable to the third.

Before you proceed, here are some important things to note about the model and kernel classes that you create.

  • Model: The model MUST be a non-negative scalar function within the bounds of interest (negative values will cause the program to fail when EvaluateLogModel is used). The model function is essentially a probability density function to be estimated.

  • Kernel: By definition, a kernel function must be specified such that the matrix it generates is positive definite (see arXiv: Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm and Wikipedia: RKHS). The kernel function $k(x, x^{'})$ takes two inputs and computes a 'similarity' score between them. In this library, $x$ is the function variable (which undergoes differentiation) while $x^{'}$ is referred to as the location.

In a nutshell, you create function objects as desired that are used to update either the base (method 1) or derived (method 2) classes of Model and/or Kernel. The function objects have a fixed signature that you must follow:

  • Model function: std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &)> with inputs of (1) function variable, (2) function parameters.
  • Kernel function: std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &, const VectorXADd &)> with inputs of (1) function variable, (2) function parameters, (3) kernel location.

⚠️ A common error that you may encounter when your model and/or kernel functions are invalid (e.g., log of a negative number, division by zero, etc.) is the following:

cppad-20240602 error from a known source:
yq = f.Forward(q, xq): a zero order Taylor coefficient is nan.
Corresponding independent variables vector was written to binary a file.

ℹ️ VectorXADd and MatrixXADd are aliases for Eigen vector and matrix types whose scalar is CppAD::AD<double> (i.e., Eigen::Matrix<CppAD::AD<double>, Eigen::Dynamic, 1> and Eigen::Matrix<CppAD::AD<double>, Eigen::Dynamic, Eigen::Dynamic>, respectively). The CppAD::AD<double> type is a double-like type used by the CppAD library to perform automatic differentiation. To convert between the double and CppAD::AD<double> types you can use ConvertToCppAD and ConvertFromCppAD.
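As a quick sketch of these conversion helpers (assuming ConvertToCppAD and ConvertFromCppAD round-trip between the two scalar types as described):

```cpp
// Round-trip a plain Eigen vector through the CppAD scalar type
Eigen::VectorXd v(2);
v << 1.0, 2.0;

VectorXADd v_ad = ConvertToCppAD(v);             // double -> CppAD::AD<double>
Eigen::VectorXd v_back = ConvertFromCppAD(v_ad); // CppAD::AD<double> -> double
```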

Method 1: use the Kernel or Model base class directly

This is the simplest method and is useful if you don't need specific variables/functionalities in your Model and/or Kernel classes.

In this method, you instantiate a Model/Kernel class and update the Model/Kernel functions by calling UpdateModel/UpdateKernel with your desired function object as the argument. The example below also shows how you can compose new models from existing ones.

// Create the 1st model function as exp(-(x-mean)^T * cov^{-1} * (x-mean)) where params = {mean, cov} = {nx1 vector, nxn matrix}
std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &)> model_fun_1 =
[](const VectorXADd &x, const std::vector<MatrixXADd> &params)
{
    VectorXADd result(1);
    VectorXADd diff = x - params[0];
    result << (-(diff.transpose() * params[1].inverse() * diff)).array().exp();
    return result;
};

// Define the kernel function as exp(x.sum()) without parameters
std::function<VectorXADd(const VectorXADd &, const std::vector<MatrixXADd> &, const VectorXADd &)> kernel_fun =
[](const VectorXADd &x, const std::vector<MatrixXADd> &params, const VectorXADd &location)
{
    VectorXADd result(1);
    result << CppAD::exp(x.sum());
    return result;
};

// Use the defined function object to create the 1st model
Model model1 = Model(2);
model1.UpdateModel(model_fun_1);

// Pass a lambda function directly to create the 2nd model
Model model2 = Model(2);
model2.UpdateModel([](const VectorXADd &x, const std::vector<MatrixXADd> &params)
{
    VectorXADd result(1);
    result << CppAD::cos(x.sum()) + 2.0;
    return result;
});

// Create a composed model whose model function is the sum of the two other model functions
Model sum_model = model1 + model2;

// Create a composed model whose model function is the difference of the two other model functions
Model difference_model = model1 - model2;

// Create a composed model whose model function is the product of the two other model functions
Model product_model = model1 * model2;

// Create a composed model whose model function is the quotient of the two other model functions
Model quotient_model = model1 / model2;
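A composed model behaves like any other Model; as a sketch (assuming the Initialize and EvaluateModel methods described later in this document):

```cpp
// A composed model must still be initialized before evaluation
sum_model.Initialize();
double value = sum_model.EvaluateModel(Eigen::Vector2d(0.5, -0.5));
```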

Method 2: create a derived class from Kernel or Model

This is useful if you want to write your own derived class to provide specific functionality.

The example below is extracted from the MultivariateNormal class. The model function should be provided in the constructor and "incorporated" into the class with the UpdateModel (or UpdateKernel if you're writing a new kernel) method.

If you wish to run SVGD in parallel mode, ensure that the Clone* functions are provided for polymorphic copying, which is required for multithreaded execution of SVGD. Otherwise, they are not necessary.

class MultivariateNormal : public Model
{
public:
    MultivariateNormal() {}

    MultivariateNormal(const Eigen::VectorXd &mean, const Eigen::MatrixXd &covariance) : Model(mean.rows())
    {
        // Ensure that the dimensions of mean match those of covariance
        if (!CompareVectorSizes<Eigen::VectorXd, Eigen::VectorXd>(mean, covariance.col(0)) ||
            !CompareVectorSizes<Eigen::VectorXd, Eigen::VectorXd>(mean, covariance.row(0)))
        {
            throw DimensionMismatchException("Dimensions of parameter vectors/matrices do not match.");
        }

        // Store parameters
        model_parameters_.push_back(ConvertToCppAD(mean));
        model_parameters_.push_back(ConvertToCppAD(covariance));

        // Compute the normalization constant based on the updated parameters
        ComputeNormalizationConstant();

        // Define model function (the kernel density only, without normalization constant)
        auto model_fun = [](const VectorXADd &x, const std::vector<MatrixXADd> &params)
        {
            VectorXADd result(1), diff = x - params[0];
            result << (-0.5 * (diff.transpose() * params[1].inverse() * diff).array()).exp();
            return result;
        };

        UpdateModel(model_fun);
    }

    // Provide polymorphic copying functionality (required only for parallel mode)
    virtual std::unique_ptr<Model> CloneUniquePointer() const
    {
        return std::make_unique<MultivariateNormal>(*this);
    }

    // Provide polymorphic copying functionality (required only for parallel mode)
    virtual std::shared_ptr<Model> CloneSharedPointer() const
    {
        return std::make_shared<MultivariateNormal>(*this);
    }

    /* and so on */
};
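The same pattern applies to a custom kernel. Below is a hypothetical sketch (the class name SquaredExpKernel is made up for illustration, and the Kernel(dim) constructor is assumed to mirror Model's):

```cpp
class SquaredExpKernel : public Kernel // hypothetical example class
{
public:
    SquaredExpKernel(const size_t &dim) : Kernel(dim)
    {
        // Define the kernel function k(x, x') = exp(-||x - x'||^2)
        auto kernel_fun = [](const VectorXADd &x,
                             const std::vector<MatrixXADd> &params,
                             const VectorXADd &location)
        {
            VectorXADd result(1), diff = x - location;
            result << (-(diff.transpose() * diff).array()).exp();
            return result;
        };

        UpdateKernel(kernel_fun);
    }

    // Provide polymorphic copying functionality (required only for parallel mode)
    virtual std::unique_ptr<Kernel> CloneUniquePointer() const
    {
        return std::make_unique<SquaredExpKernel>(*this);
    }

    // Provide polymorphic copying functionality (required only for parallel mode)
    virtual std::shared_ptr<Kernel> CloneSharedPointer() const
    {
        return std::make_shared<SquaredExpKernel>(*this);
    }
};
```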

Method 3: override EvaluateModel* and/or EvaluateLogModel* directly (advanced)

This may be useful because it skips automatic differentiation, which may improve performance. You would choose this method if you can provide closed-form expressions for the gradients (Jacobians) of both the model function and its logarithmic transformation. However, you won't be able to do functional composition (e.g., obj1 + obj2, obj1*obj2, etc.), because composition works with the CppAD::ADFun object in the base class; instead you directly implement the closed-form functions without touching the CppAD::ADFun objects.

In this method, you override the Evaluate* functions (provided by the base classes); these are the functions used by the SVGD class to "move" the particles. You can use parameters in your model/kernel function through the model_parameters_/kernel_parameters_ member variable, though you'll need to convert it with ConvertFromCppAD as its elements are of MatrixXADd type.

class MyModel : public Model
{
public:
    // Provide polymorphic copying functionality (required only for parallel mode)
    virtual std::unique_ptr<Model> CloneUniquePointer() const
    {
        return std::make_unique<MyModel>(*this);
    }

    // Provide polymorphic copying functionality (required only for parallel mode)
    virtual std::shared_ptr<Model> CloneSharedPointer() const
    {
        return std::make_shared<MyModel>(*this);
    }

    // Override the method to do nothing (by default it re-records the automatic differentiation tape used by CppAD)
    void Initialize() override {}

    // Override the method to compute the model function directly
    double EvaluateModel(const Eigen::VectorXd &x) override
    {
        std::vector<Eigen::MatrixXd> converted_params(model_parameters_.size());

        for (size_t i = 0; i < model_parameters_.size(); ++i)
        {
            // Convert from CppAD::AD<double> type to double type
            converted_params[i] = ConvertFromCppAD(model_parameters_[i]);
        }

        return converted_params[0](0, 0) * std::cos(x(0)) + converted_params[1](0, 0) * std::cos(x(1)) + converted_params[2](0, 0) * x.prod() + converted_params[3](0, 0);
    }

    // Override the method to compute the log model function directly
    double EvaluateLogModel(const Eigen::VectorXd &x) override
    {
        return std::log(EvaluateModel(x));
    }

    // Override the method to compute the model Jacobian directly
    Eigen::VectorXd EvaluateModelGrad(const Eigen::VectorXd &x) override
    {
        std::vector<Eigen::MatrixXd> converted_params(model_parameters_.size());

        for (size_t i = 0; i < model_parameters_.size(); ++i)
        {
            // Convert from CppAD::AD<double> type to double type
            converted_params[i] = ConvertFromCppAD(model_parameters_[i]);
        }

        Eigen::Vector2d result;
        result << -converted_params[0](0, 0) * std::sin(x(0)) + converted_params[2](0, 0) * x(1), -converted_params[1](0, 0) * std::sin(x(1)) + converted_params[2](0, 0) * x(0);
        return result;
    }

    // Override the method to compute the log model Jacobian directly
    Eigen::VectorXd EvaluateLogModelGrad(const Eigen::VectorXd &x) override
    {
        return EvaluateModelGrad(x) / EvaluateModel(x);
    }

    /* and so on */
};

In some cases, your derived class may need to provide a Hessian implementation. For example, if you use the provided GaussianRBF kernel and would like to compute the scale using the ScaleMethod::Hessian method, then the Hessian of your log model is required.

Common unary math function alternatives for CppAD::AD<double> types

When defining functions for your Model or Kernel classes, you need to work with vectors and matrices of CppAD::AD<double> type. The CppAD library provides a list of common math functions for you to use in place of the regular functions. A short list is provided below; see the CppAD website for the entire list.

| Instead of | Use |
| --- | --- |
| std::exp | CppAD::exp |
| std::abs | CppAD::abs |
| std::pow | CppAD::pow |
| std::sqrt | CppAD::sqrt |
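For instance, a model function built from these CppAD replacements might look like the following (a sketch, reusing the function-object signature described earlier):

```cpp
// Model function exp(-sqrt(x_0^2 + 1)); positive everywhere, as required of a model
auto my_model_fun = [](const VectorXADd &x, const std::vector<MatrixXADd> &params)
{
    VectorXADd result(1);
    result << CppAD::exp(-CppAD::sqrt(x(0) * x(0) + 1.0));
    return result;
};
```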

Using the Kernel and Model classes for manual computation (advanced)

Whenever a Model or a Kernel object is first created, it is uninitialized, so you'll need to initialize it before using it directly. This applies even if you're copying from an existing object.

// Create RBF kernel pointer
std::shared_ptr<Kernel> kernel_ptr = std::make_shared<GaussianRBFKernel>(x0, GaussianRBFKernel::ScaleMethod::Median, model_ptr);

// Compute kernel value with arbitrary values and at the default location (i.e., reference coordinate)
kernel_ptr->Initialize(); // initialize the kernel so that the `Evaluate*` functions are available
Eigen::Vector2d input(16.328, -4.059);
double output = kernel_ptr->EvaluateKernel(input);

// Compute kernel value with arbitrary values and at an arbitrary location
Eigen::Vector2d new_location(5.2225, 97.6);
kernel_ptr->UpdateLocation(new_location);
kernel_ptr->Initialize();
output = kernel_ptr->EvaluateKernel(input);

Any time an Update* method is called manually, the instance's Initialize() method must be called before the scope ends, so that the CppAD tape can record any modifications to the functions.

// Okay
{
    Model m;
    m.UpdateModel(some_fun);
    m.Initialize();
}

// Also okay (order of Update* doesn't matter here)
{
    Model m;
    m.UpdateParameters(some_params);
    m.UpdateModel(some_fun);
    m.Initialize();
}

// Okay, but redundant
{
    Model m;
    m.UpdateParameters(some_params);
    m.Initialize();
    m.UpdateModel(some_fun);
    m.Initialize();
}

// Applies also to Kernel classes (order doesn't matter here, as long as `Initialize()` is called at the end)
{
    Kernel k;
    k.UpdateKernel(some_fun);
    k.UpdateLocation(some_vec);
    k.UpdateParameters(some_params);
    k.Initialize();

    k.UpdateParameters(some_new_params); // doesn't get registered; subsequent computation uses the `some_params` values
}

// If you defined models/kernels to be composed into a new model (and don't intend to use the original objects), then `Initialize()` is only needed for the new object
{
    Model m_a(2), m_b(2);
    m_a.UpdateModel(model_function_a);
    m_b.UpdateModel(model_function_b);
    Model m_new(2);
    m_new = m_a + m_b;
    m_new.Initialize();

    // same for copied objects
    Model m_another(m_new);
    m_another.Initialize(); // required
}

Typically, unless you call the Update* methods directly, you don't need to worry about calling Initialize() yourself. In most cases, you interact with the SVGD object, and its UpdateModelParameters and UpdateKernelParameters functions take care of re-initializing the updates internally.

Multithreaded execution

Running SVGD in parallel mode is as simple as calling SetupForParallelMode() at the beginning of your code and then instantiating the SVGD class with the parallel argument as true.

// Set up parallel mode execution
SetupForParallelMode();

/* Definitions of dim, num_iterations, x0, model_ptr, kernel_ptr, opt_ptr here */

// Instantiate the SVGD class with the `parallel` argument as true
SVGD svgd(dim, num_iterations, x0, kernel_ptr, model_ptr, opt_ptr, true);
svgd.Initialize();

If activated, SVGD runs with the maximum number of threads available on the machine. Note: you're unlikely to see performance improvements if you only have a few particles (<= 10).