A primer on CUDA & C++ for those familiar with Python's scientific ecosystem (pandas, NumPy & SciPy)
> [!TIP]
> Open the outline on the top right corner of the GitHub interface to navigate the ToC.
At some point last year, with all the hype around LLMs and deep learning in general, I realized that getting acquainted with this whole CUDA thing, which does the heavy lifting when it comes to training deep learning models, might well be a good idea. In addition, I had no previous experience with C++, which made this project appealing as a first contact with compiled languages. As I'm quite familiar with Python's scientific ecosystem (namely pandas, NumPy, SciPy...), I thought that kicking off from there would be an efficient way to have a guide on what to implement.
If you are starting with C++ and are familiar with Python's scientific ecosystem, this repository may well be a good support for your learning and/or give you some idea about where to start. Just bear in mind that this project is a sort of crash course on the subject, so be extra careful with the content you find here. More than ever, perform your own parallel searches to discover possible pitfalls or errors I may have overlooked.
The repository contains an algorithm written in CUDA/C++ that aims to predict, using Bayesian statistics, the parameters of a normal distribution, namely $\mu$ and $\sigma$.
Specifically, let's say that you go out there and gather some data about some process. Moreover, before inspecting it, you have, for whatever reason, a reasonable guess that the process you are investigating is normally distributed. You might want to know the parameters of the distribution because that will give you insight into how your system might behave in the long run and, crucially, the level of credence you should put in those parameters (although we won't compute the latter).
Of course, this is a highly contrived example, because the goal is not to learn Bayesian statistics but rather to learn CUDA. If you want to learn more about the statistics, the origin of the algorithm is the Think Bayes book by Allen Downey, which is a great resource to get started on the topic if you are familiar with Python.
In the `quick-prototyping.ipynb` notebook you will find more detailed explanations about the ins and outs of the algorithm, but as a quick reference, this is what the C++ code will implement:
```python
import numpy as np
from scipy.stats import norm

grid_size = 100  # resolution of the parameter grid
mu, sigma = 20, 2  # the values we want to infer

np.random.seed(42)
observations = np.random.normal(mu, sigma, 50).astype(np.float32)

mu_range = np.linspace(18, 22, grid_size, dtype=np.float32)
sigma_range = np.linspace(1, 3, grid_size, dtype=np.float32)

# Generate the outer product of the ranges and the observations;
# indexing="ij" keeps axis 0 for mu and axis 1 for sigma, matching
# the marginalizations below
mu_grid, sigma_grid, observations_grid = np.meshgrid(
    mu_range, sigma_range, observations, indexing="ij"
)

# build the prior
prior = np.ones((grid_size, grid_size))

# compute the likelihood
densities = norm(mu_grid, sigma_grid).pdf(observations_grid)
likes = densities.prod(axis=2)

# compute the posterior
posterior = likes * prior
posterior /= posterior.sum()

# Print the expectations
print(
    np.sum(mu_range * posterior.sum(axis=1)),
    np.sum(sigma_range * posterior.sum(axis=0))
)
```
Basically, you can think of the posterior as a kind of hill over the $(\mu, \sigma)$ grid, where the peak marks the most credible pair of parameters.

The observations enter the computation as a sort of joint probability; that is, you need to know the probability of finding all those values under the assumption of some pair $(\mu, \sigma)$. Since the observations are independent, the resulting joint probability will then be:

$$p(x_1, \dots, x_n \mid \mu, \sigma) = \prod_{i=1}^{n} \mathcal{N}(x_i \mid \mu, \sigma)$$

where $\mathcal{N}(x_i \mid \mu, \sigma)$ is the normal density evaluated at observation $x_i$.
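As a standalone sketch (using SciPy, mirroring the notebook rather than the repository's C++ code), the joint likelihood for one candidate pair is just the product of the individual densities:

```python
import numpy as np
from scipy.stats import norm

# Same synthetic data as in the snippet above
np.random.seed(42)
observations = np.random.normal(20, 2, 50).astype(np.float32)

# Joint likelihood of all observations under one candidate (mu, sigma) pair
mu_candidate, sigma_candidate = 19.5, 2.5
joint = norm(mu_candidate, sigma_candidate).pdf(observations).prod()
print(joint)  # a tiny positive number: 50 densities multiplied together
```

Evaluating this value for every $(\mu, \sigma)$ pair on the grid is exactly what the meshgrid plus `prod(axis=2)` step does in one shot.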
> [!NOTE]
> **Probability densities are not probabilities**
>
> We are loosely talking about joint probability all the time but, recall, probability densities are not probabilities: to find a probability one needs to integrate the density over a range. However, Bayesian statistics deals perfectly well with these densities, as the only thing it cares about is proportions and, after all, we will be normalizing the outcomes at the end so that all the volume under the hill sums to 1.
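A quick standalone illustration of that note (SciPy again, not part of the repository's code):

```python
import numpy as np
from scipy.stats import norm

# A density value can exceed 1, so it cannot be a probability...
spiky_density = norm(0, 0.1).pdf(0)
print(spiky_density)  # ~3.99

# ...but after normalizing a grid of densities, the total is 1
densities = norm(0, 0.1).pdf(np.linspace(-1, 1, 101))
posterior = densities / densities.sum()
print(posterior.sum())  # ~1.0
```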
Now, there is a point where the joint probability gets maximized for a pair of parameters, i.e., the peak of the hill, which in our case will match the expected value, as this is a very contrived example based on normal distributions. To find it, we need to compute the marginals, that is, isolate each of the parameters and compute its expectation. These marginals look like the green and blue lines on the walls of the plot, i.e., the margins:
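Here's a standalone sketch of that step, rebuilding a small posterior grid along the lines of the snippet above (with explicit `indexing="ij"` so that axis 0 indexes mu and axis 1 indexes sigma):

```python
import numpy as np
from scipy.stats import norm

grid_size = 50
mu_range = np.linspace(18, 22, grid_size)
sigma_range = np.linspace(1, 3, grid_size)

np.random.seed(42)
observations = np.random.normal(20, 2, 50)

mu_grid, sigma_grid, obs_grid = np.meshgrid(
    mu_range, sigma_range, observations, indexing="ij"
)
posterior = norm(mu_grid, sigma_grid).pdf(obs_grid).prod(axis=2)
posterior /= posterior.sum()

# The peak of the hill: the grid point with the highest posterior mass
i, j = np.unravel_index(posterior.argmax(), posterior.shape)
print(mu_range[i], sigma_range[j])  # close to the true 20 and 2

# The marginals: sum out the other parameter, then take the expectation
mu_marginal = posterior.sum(axis=1)
sigma_marginal = posterior.sum(axis=0)
print(np.sum(mu_range * mu_marginal), np.sum(sigma_range * sigma_marginal))
```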
In terms of C++ you will need:

- The `gcc` compiler suite, or your OS equivalent, to compile the C++ code.
- The Google Test framework for the tests.
You will need the CUDA Toolkit provided by NVIDIA; refer to their documentation to install it. The Ubuntu instructions worked fine for me overall, however the post-installation actions were not completely clear: as in the POWER9 setup, `systemctl` complained that the service was masked:

```
Loaded: masked (Reason: Unit nvidia-persistenced.service is masked.)
```
You can unmask it with:

```shell
sudo systemctl unmask nvidia-persistenced.service
```
I have been using CUDA Toolkit 12.2 on a GeForce GTX 1650 with compute capability 7.5. In principle, NVIDIA advertises that every card is fully backwards compatible, but it appears to me that if you are below compute capability 5.0, managed memory may well be a pain. Check the unified memory section of the programming guide for an explanation of the limitations.
To be able to run the notebook, you will need some Python libraries, which are defined in the `requirements.txt` file. Just create the virtual environment of your preference and install them:

```shell
pip install --upgrade pip && pip install -r requirements.txt
```
The actual C++ code does not accept parameters for the time being, but you can run it with the default ones by following these steps:
- Go to the C++ implementation directory and make the binaries:

  ```shell
  cd cpp/grid-algorithm
  make all
  ```
- The step above should have created an executable; run it:

  ```shell
  ./bin/grid
  ```
If you have run the step above, you should also have a binary for the tests, which you can run with:

```shell
./bin/tests
```
If you are trying to debug the code, it's always a good idea to run the executables with `compute-sanitizer`, which is part of the CUDA Toolkit and will complain if you are doing something fishy:

```shell
compute-sanitizer ./bin/grid
```
Here's a brief explanation of the directory tree:

- `cpp`: the main C++ implementation
  - `inc`: header files for the main functions
    - `cuda_functions.h`: a header file containing the specific sizes of the grids sent to a CUDA kernel; put another way, an intermediate step between a wrapper function and the CUDA kernel
    - `kernels.h`: the main functions executed on the device (i.e., the GPU)
    - `utils.h`: some utility functions
    - `wrappers.h`: functions that wrap actions so the `main` routine has a few logic blocks matching the descriptions in the Jupyter notebook
  - `src`: the source code of the implementation
    - `grid.cu`: the main routine of the implementation
    - `tests.cu`: the test suite
- `notebooks`: contains the notebook with the Python equivalent, along with some inline annotations and comments
I'd guess that not a lot of people will reach this part alive and, if they do, they will be unlikely to check these references. If you are in that negligible tier, though, here you go. In my learning journey, these resources worked really well for me:
- The CUDA programming guide: this is fundamental. You don't need to cover it all, but read at least the first two chapters. Some of the examples live in the cuda-samples repo, which is a bit hard to make sense of if you are not familiar with C++.
- An Even Easier Introduction to CUDA, provided by NVIDIA, is great once you have some familiarity with the programming guide, as it will allow you to craft some actual code.
- The Numba tutorial, which implements CUDA kernels via Python, in case you are familiar with the language.