Metropolis generator for Raku language
Metropolis–Hastings algorithm is Markov chain Monte Carlo (MCMC) method for generating sequence of random samples from probability distribution function.
It can be used to generate naturally looking data for applications testing and presentation. For example let's assume that you have software for online marketing campaigns. And you have to generate fake traffic (100 000 site visits) that occurred after campaign started 10 days ago to record tutorial for users. It's obvious that such traffic is not constant. After campaign start traffic is low, then it quickly gains momentum, reaches interest peak and slowly fades away like this:
f( x ) = 2x / ( x^3 + 0.5 ), { x > 0 }
generated using awesome Desmos calculator.
With this module you will be able to generate such random samples with any probability distribution function within given domain.
Note: This is raw and fast implementation of Metropolis–Hastings algorithm. You may need to turn few dials to get best results for your cases. If you get bad results or timeouts be sure to read TWEAKS section.
use Metropolis;
my $m = Metropolis.new(
function => -> $x { 2* $x / ( $x ** 3 + 0.5 ) },
domain => 0 .. 10
);
say $m.sample( ) for ^42;
Params:
function
(mandatory) - probability distribution function, must accept exactly one positional argument.domain
(mandatory) - range within which samples should be generated.jumper
(optional) - function that will return new sample from given jumping distribution. Check TWEAKS section for more info.
Params:
timeout
(optional) - how long can algorithm look for next sample, no timeout by default.
Get next sample value of Num
type. Nil
will be returned in case of timeout.
Params:
samples
(mandatory) - how many samples should be gathered.scale
(optional) - how highy
axis should be,100
characters by default.
Crude plotting to quickly visualize requested distribution. With swapped axis (x is vertical, turn your monitor). All values are rounded to nearest integer. All bars are rounded to nearest 1/8. For example to display distribution from SYNOPSIS:
my $m = Metropolis.new(
function => -> $x { 2 * $x / ( $x ** 3 + 0.5 ) },
domain => 0 .. 10
);
$m.graph( samples => 10000, scale => 10 );
0 ███▋ (1622)
1 ██████████ (4545)
2 ████ (1800)
3 █▊ (803)
4 ▉ (417)
5 ▌ (251)
6 ▌ (208)
7 ▎ (135)
8 ▎ (106)
9 ▎ (91)
10 ▏ (22)
Potato-grade but useful :)
Common issues:
- Some places have accumulation of samples way too high for requested probability distribution there.
- Algorithm is not reaching every part of the domain. Some zero probability gaps (like in
f( x ) = sin x
function) are not crossed by algorithm at all, places beyond them do not generate samples. - Sampling is realy, really slow.
Solutions:
- Don't go crazy with domain width. By default algorithm is tuned to work with domain of total width
<= 10
. Remember that you getNum
values and you can project them to bigger domain (like for example 864_000 timestamps from the last 10 days) if needed. - Avoid functions with zero probability gaps within domain. This algorithm can handle them to some extend. But you will get best results if probability is always positive across whole domain. Sometimes it is better to generate samples for each positive probability area separately.
- Burn samples to forget initial state. This algorithm is based on Markov chain Monte Carlo method which needs warmup to eventually converge to the desired distribution. For example throw away first 1000 samples.
Expert:
You can provide your own jump function with desired distribution. By default normal distribution with standard-deviation = 2
is used. Uniform distribution is also provided with default delta = 3
.
To use prebuilt and/or change its default:
my $m = Metropolis.new(
...,
# let's jump bigger zero probability gaps!
jumper => &uniform-distribution.assuming( delta => 15 )
);
To provide own:
my $m = Metropolis.new(
...,
# discrete decrease of longer jump distance probability
jumper => sub ( :$mean ) { # mean is current sample position
# roll jump distance within 0..1 range
my $jump = rand;
# make small jumps less probable by randomly re-rolling
$jump max= rand if $jump < 0.5;
# return new sampling point
# remember output must have equal chance to go both ways!
return $mean + (1, -1).pick * $jump;
}
);