A primer on CUDA & C++ for those familiar with Python's scientific ecosystem (pandas, NumPy & SciPy)
> [!TIP]
> Open the outline on the top right corner of the GitHub interface to navigate the ToC.
At some point last year, with all the hype around LLMs and deep learning in general, I realized that getting acquainted with this whole CUDA thing, which does the heavy lifting when it comes to training deep learning models, might well be a good idea. In addition, I had no previous experience with C++, which made this project appealing as a first contact with compiled languages. As I'm quite familiar with Python's scientific ecosystem (namely pandas, NumPy, SciPy...), I thought that kicking off from there would be an efficient way to have a guide on what to implement.
If you are starting with C++ and are familiar with Python's scientific ecosystem, this repository may well be a good support for your learning and/or give you some idea about where to start. Just bear in mind that this project is a sort of crash course on the subject, so be extra careful with the content you find here. More than ever, perform your own parallel searches to discover possible pitfalls or errors I may have overlooked.
The repository contains an algorithm written in CUDA/C++ that aims to predict, using Bayesian statistics, the parameters of a normal distribution, namely $\mu$ and $\sigma$.
Specifically, let's say that you go out there and gather some data about some process. Moreover, before inspecting it, you have, for whatever reason, a reasonable guess that the process you are investigating is normally distributed. You might want to know the parameters of the distribution because that will give you insight into how your system might behave in the long run and, crucially, the level of credence you should put in those parameters (although we won't compute the latter).
Of course, this is a highly contrived example, because the goal is not to learn Bayesian statistics but rather to learn CUDA. If you want to learn more about the statistics, the origin of the algorithm is the Think Bayes book by Allen Downey, which is a great resource to get started on the topic if you are familiar with Python.
In the `quick-prototyping.ipynb` notebook you will find more detailed explanations about the ins and outs of the algorithm, but as a quick reference, this is what the C++ code will implement:
```python
import numpy as np
from scipy.stats import norm

grid_size = 100  # resolution of the parameter grid
mu, sigma = 20, 2  # the values we want to infer

np.random.seed(42)
observations = np.random.normal(mu, sigma, 50).astype(np.float32)

mu_range = np.linspace(18, 22, grid_size, dtype=np.float32)
sigma_range = np.linspace(1, 3, grid_size, dtype=np.float32)

# Generate the outer product of the ranges and the observations;
# indexing="ij" keeps axis 0 for mu and axis 1 for sigma, matching
# the marginalizations below
mu_grid, sigma_grid, observations_grid = np.meshgrid(
    mu_range, sigma_range, observations, indexing="ij"
)

# build the prior
prior = np.ones((grid_size, grid_size))

# compute the likelihood
densities = norm(mu_grid, sigma_grid).pdf(observations_grid)
likes = densities.prod(axis=2)

# compute the posterior
posterior = likes * prior
posterior /= posterior.sum()

# Print the expectations
print(
    np.sum(mu_range * posterior.sum(axis=1)),
    np.sum(sigma_range * posterior.sum(axis=0))
)
```
Basically, you can think of the posterior as a kind of hill over the $(\mu, \sigma)$ grid, where the peak marks the most credible pair of parameters.

The observations enter the computation as a sort of joint probability; that is, you need to know the probability of finding all those values under the assumption of some pair $(\mu, \sigma)$. Since the observations are independent, the resulting joint probability will then be:

$$p(x_1, \dots, x_n \mid \mu, \sigma) = \prod_{i=1}^{n} \mathcal{N}(x_i \mid \mu, \sigma)$$

where $\mathcal{N}(x_i \mid \mu, \sigma)$ is the normal density evaluated at observation $x_i$.
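As a standalone sketch (using SciPy, mirroring the notebook rather than the repository's C++ code), the joint likelihood for one candidate pair is just the product of the individual densities:

```python
import numpy as np
from scipy.stats import norm

# Same synthetic data as in the snippet above
np.random.seed(42)
observations = np.random.normal(20, 2, 50).astype(np.float32)

# Joint likelihood of all observations under one candidate (mu, sigma) pair
mu_candidate, sigma_candidate = 19.5, 2.5
joint = norm(mu_candidate, sigma_candidate).pdf(observations).prod()
print(joint)  # a tiny positive number: 50 densities multiplied together
```

Evaluating this value for every $(\mu, \sigma)$ pair on the grid is exactly what the meshgrid plus `prod(axis=2)` step does in one shot.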
> [!NOTE]
> **Probability densities are not probabilities**
>
> We are loosely talking about joint probability all the time but, recall, probability densities are not probabilities: to find a probability one needs to integrate the density over a range. However, Bayesian statistics deals perfectly well with these densities, as the only thing it cares about is proportions and, after all, we will be normalizing the outcomes at the end so that all the volume under the hill sums to 1.
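A quick standalone illustration of that note (SciPy again, not part of the repository's code):

```python
import numpy as np
from scipy.stats import norm

# A density value can exceed 1, so it cannot be a probability...
spiky_density = norm(0, 0.1).pdf(0)
print(spiky_density)  # ~3.99

# ...but after normalizing a grid of densities, the total is 1
densities = norm(0, 0.1).pdf(np.linspace(-1, 1, 101))
posterior = densities / densities.sum()
print(posterior.sum())  # ~1.0
```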
Now, there is a point where the joint probability gets maximized for a pair of parameters, i.e., the peak of the hill, which in our case will match the expected value, as this is a very contrived example based on normal distributions. To find it, we need to compute the marginals, that is, isolate each of the parameters and compute its expectation. These marginals look like the green and blue lines on the walls of the plot, i.e., the margins:
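Here's a standalone sketch of that step, rebuilding a small posterior grid along the lines of the snippet above (with explicit `indexing="ij"` so that axis 0 indexes mu and axis 1 indexes sigma):

```python
import numpy as np
from scipy.stats import norm

grid_size = 50
mu_range = np.linspace(18, 22, grid_size)
sigma_range = np.linspace(1, 3, grid_size)

np.random.seed(42)
observations = np.random.normal(20, 2, 50)

mu_grid, sigma_grid, obs_grid = np.meshgrid(
    mu_range, sigma_range, observations, indexing="ij"
)
posterior = norm(mu_grid, sigma_grid).pdf(obs_grid).prod(axis=2)
posterior /= posterior.sum()

# The peak of the hill: the grid point with the highest posterior mass
i, j = np.unravel_index(posterior.argmax(), posterior.shape)
print(mu_range[i], sigma_range[j])  # close to the true 20 and 2

# The marginals: sum out the other parameter, then take the expectation
mu_marginal = posterior.sum(axis=1)
sigma_marginal = posterior.sum(axis=0)
print(np.sum(mu_range * mu_marginal), np.sum(sigma_range * sigma_marginal))
```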
In terms of C++ you will need:

- The `gcc` compiler suite, or your OS equivalent, to compile the C++ code.
- The Google Test framework for the tests.
You will need the CUDA Toolkit provided by NVIDIA; refer to their documentation to install it. The Ubuntu instructions worked fine for me overall, however the post-installation actions were not completely clear: as in the POWER9 setup, `systemctl` complained that the service was masked:

```
Loaded: masked (Reason: Unit nvidia-persistenced.service is masked.)
```
You can unmask it with:

```shell
sudo systemctl unmask nvidia-persistenced.service
```
I have been using CUDA Toolkit 12.2 on a GeForce GTX 1650 with compute capability 7.5. In principle, NVIDIA advertises that every card is fully backwards compatible, but it appears to me that if you are below compute capability 5.0, managed memory may well be a pain. Check the unified memory section of the programming guide for an explanation of the limitations.
To be able to run the notebook, you will need some Python libraries, which are defined in the `requirements.txt` file. Just create the virtual environment of your preference and install them:

```shell
pip install --upgrade pip && pip install -r requirements.txt
```
The actual C++ code does not accept parameters for the time being, but you can run it with the default ones by following these steps:
- Go to the C++ implementation directory and make the binaries:

  ```shell
  cd cpp/grid-algorithm
  make all
  ```
- The step above should have created an executable; run it:

  ```shell
  ./bin/grid
  ```
If you have run the step above, you should also have a binary for the tests, which you can run with:

```shell
./bin/tests
```
If you are trying to debug the code, it's always a good idea to run the executables with `compute-sanitizer`, which is part of the CUDA Toolkit and will complain if you are doing something fishy:

```shell
compute-sanitizer ./bin/grid
```
Here's a brief explanation of the directory tree:

- `cpp`: the main C++ implementation
  - `inc`: header files for the main functions
    - `cuda_functions.h`: a header file containing the specific sizes of the grids sent to a CUDA kernel; put another way, an intermediate step between a wrapper function and the CUDA kernel
    - `kernels.h`: the main functions executed on the device (i.e., the GPU)
    - `utils.h`: some utility functions
    - `wrappers.h`: functions that wrap actions so the `main` routine has a few logic blocks matching the descriptions in the Jupyter notebook
  - `src`: the source code of the implementation
    - `grid.cu`: the main routine of the implementation
    - `tests.cu`: the test suite
- `notebooks`: contains the notebook with the Python equivalent, along with some inline annotations and comments
I'd guess that not a lot of people will reach this part alive and, if they do, they will be unlikely to check these references. If you are in that negligible tier, though, here you go. In my learning journey, these resources worked really well for me:
- The CUDA programming guide: this is fundamental. You don't need to cover it all, but read at least the first two chapters. Some of the examples live in the cuda-samples repo, which is a bit hard to make sense of if you are not familiar with C++.
- An Even Easier Introduction to CUDA, provided by NVIDIA, is great once you have some familiarity with the programming guide, as it will allow you to craft some actual code.
- The Numba tutorial, which implements CUDA kernels via Python, in case you are familiar with the language.