RandomCL is an OpenCL library for generating random numbers in parallel. It contains implementations of more than 20 random number generators. While it is intended for usage on graphics processing units (GPUs), the generators can run on any OpenCL-enabled device. The article An OpenCL library for parallel random number generators contains further details about implementation and testing of the library.
RandomCL is released under the BSD-3 license.
RNGs do not need to be built - all code is contained in headers. Examples and buffer generation can be built using a makefile by running make examples/examplePi/pi
, make examples/examplePrint/print
, make examples/examplePrintBuffer/printBuffe
or make examples/examplePrintLocal/printLocal
and then running the created executable in the examples folder.
- OpenCL library (examples and buffer generation require cpp OpenCL header).
There are two kinds of random number generators in RandomCL. Most generators are intended to have one instance of generator per thread. Their state is supposed to be saved in a private variable for each thread and threads can generate random numbers independantly. All of these have a similar interface.
The xorshift1024 generator, however, has the state shared between 32 threads. Since state is saved in local memory it has a slightly different interface.
Each generator is contained in its own header named <NAME>.cl
. It defines the following type and functions, where <NAME>
is the name of the generator:
Internal state of the generator. Different definition (and size) for each generator.
Seeds the RNG using given seed. Seed should be different for each instance of the generator (thread).
Advances state of the generator and generates a 32-bit unsigned integer. State is not a pointer, since it is implemented as a macro for some generators.
Advances state of the generator and generates a 64-bit unsigned integer (unsigned long). State is not a pointer, since it is implemented as a macro for some generators.
Advances state of the generator and generates a 32-bit floating point number between 0 and 1. State is not a pointer, since it is implemented as a macro for some generators.
Advances state of the generator and generates a 64-bit floating point number (double) between 0 and 1. State is not a pointer, since it is implemented as a macro for some generators.
Advances state of the generator and generates a 64-bit floating point number (double) between 0 and 1. For generators, that internaly generate 32-bit numbers this can be faster then
<NAME>_double(<NAME>_state state)
, as it is generated from 32-bit random number, instead of 64. State is not a pointer, since it is implemented as a macro for some generators.
Middle Square Weyl Sequence generator. Returns 64-bit numbers. Not recomended for serious use, as paralell implementation does not pass BigCrush test.
Tyche generator. Modified to return 64 bit numbers.
S. Neves, F. Araujo, Fast and small nonlinear pseudorandom number generators for computer simulation, in: International Conference on Parallel Processing and Applied Mathematics, Springer, 2011, pp. 92–101.
Tyche-i generator. Uses inverse state transition function of Tyche and is usually faster. Modified to return 64 bit numbers. On a discrete GPU this generator is expected to be the fastest among generators that pass the BigCrush test.
S. Neves, F. Araujo, Fast and small nonlinear pseudorandom number generators for computer simulation, in: International Conference on Parallel Processing and Applied Mathematics, Springer, 2011, pp. 92–101.
A 32-bit implementation of Tiny Mersenne Twister. Returns 32-bit numbers.
Tiny mersenne twister, http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html.
A 64-bit implementation of Tiny Mersenne Twister. Returns 64-bit numbers.
Tiny mersenne twister, http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html.
A 64-bit implementation of Permutated Congruential generator (PCG-XSH-RR). Returns 32-bit numbers.
M. E. O’Neill, Pcg: A family of simple fast space-efficient statistically good algorithms for random number generation, ACM Transactions on Mathematical Software.
Ran2 compound generator. Returns 64-bit numbers.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical recipes in c: The art of scientific computing (; cambridge (1992).
KISS (Keep It Simple, Stupid) generator, proposed in 2009. Returns 64-bit numbers.
G. Marsaglia, 64-bit kiss rngs, https://www.thecodingforums.com/threads/64-bit-kiss-rngs.673657.
KISS (Keep It Simple, Stupid) generator, proposed in 1999. Returns 32-bit numbers.
G. Marsaglia, Random numbers for c: End, at last?, http://www.cse.yorku.ca/~oz/marsaglia-rng.html.
64-bit Linear Congruential Generator. Returns 32-bit numbers. Not recomended for serious use as it does not pass BigCrush test.
P. L’ecuyer, Tables of linear congruential generators of different sizes and good lattice structure, Mathematics of Computation of the American Mathematical Society 68 (225) (1999) 249–260.
128-bit Linear Congruential Generator. Returns 64-bit numbers.
P. L’ecuyer, Tables of linear congruential generators of different sizes and good lattice structure, Mathematics of Computation of the American Mathematical Society 68 (225) (1999) 249–260.
Multiplicative Lagged Fibbonaci generator. Returns 64-bit random numbers, but the lowest bit is always 1.
G. Marsaglia, L.-H. Tsay, Matrices and the structure of random number sequences, Linear algebra and its applications 67 (1985) 147–156.
Multiplicative Recursive Generator. Returns 32-bit random numbers.
P. L’Ecuyer, R. Touzin, Fast combined multiple recursive generators with multipliers of the form a=±2 q±2 r, in: Proceedings of the 32nd conference on Winter simulation, Society for Computer Simulation International, 2000, pp. 683–689.
Multiplicative Recursive Generator. Returns 64-bit random numbers, but the lowest bit is always 0.
P. L’ecuyer, Good parameters and implementations for combined multiple recursive random number generators, Operations Research 47 (1) (1999) 159–164.
Mersenne Twister. Returns 32-bit random numbers.
M. Matsumoto, T. Nishimura, Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modeling and Computer Simulation (TOMACS) 8 (1) (1998) 3–30.
ISAAC (Indirection, Shift, Accumulate, Add, and Count). Returns 32-bit random numbers. Does not work on graphics cards, as it requires unaligned accesses to memory.
R. J. Jenkins, Isaac, in: International Workshop on Fast Software Encryption, Springer, 1996, pp. 41–49.
A 512-bit WELL (Well-Equidistributed Long-period Linear) implementation. Returns 32-bit random numbers.
F. Panneton, P. L’ecuyer, M. Matsumoto, Improved long-period generators based on linear recurrences modulo 2, ACM Transactions on Mathematical Software (TOMS) 32 (1) (2006) 1–16.
Xorshift64/32* - 64-bit Xorshift generator with multiplication of output. Returns 32-bit numbers
S. Vigna, An experimental exploration of marsaglia’s xorshift generators, scrambled, ACM Transactions on Mathematical Software (TOMS) 42 (4) (2016) 30.
Philox2x32-10. Returns 64-bit numbers.
J. K. Salmon, M. A. Moraes, R. O. Dror, D. E. Shaw, Parallel random numbers: as easy as 1, 2, 3, in: High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for, IEEE, 2011, pp. 1–12.
1024-bit xorshift generator. State is shared between 32 threads. As it uses barriers, all threads of a work group must call the generator at the same time, even if they do not require the result. In localRNGs.h
header is the function RNGLocal::xorshift1024_local_mem
that calculates required state size given local size. See "examplePrintLocal".
M. Manssen, M. Weigel, A. K. Hartmann, Random number generators for massively parallel simulations on GPU, The European Physical Journal-Special Topics 210 (1) (2012) 53–71.
cl::Buffer generateRandomBuffer(unsigned int num, std::string generatorName, cl::CommandQueue queue, size_t global, size_t local, std::string type = "float", unsigned long long seed = 0)
Generates a buffer of random numbers (on a fast device using this and reading random numbers from global memory can be significantly slower than generating numbers on-the-fly in kerner where they are used).
num
how many numbers to generate
generatorName
which RNG to use. Valid is any name of RNG implemented in RandomCL
queue
OpenCL CommandQueue to use
global
number of threads to use for generation
local
number of threads in a work group
type
type of numbers to generate. Valid options are: "float", "double", "uint", "ulong"
seed
set seed for repeatable generation.