import nimib
nbInit
nbDoc.useLatex
nbText: md"""
# Bayesian Inference with Linear Model
At the time of this writing, Nim does not have any libraries for
conducting Bayesian inference. However, it is quite easy to
write all of the necessary code ourselves. This is a good exercise for learning
about Bayesian inference, and the syntax and speed of Nim make it an excellent
choice for the task. In this tutorial we will walk through
the different parts needed to perform Bayesian inference with a linear model.
We will assume that you have some basic understanding of Bayesian
inference already. There are many excellent introductions available in books
and online.
Bayes' theorem states:
$$ P(\theta|\text{Data}) = \frac{P(\text{Data}|\theta)P(\theta)}{P(\text{Data})}$$
Each of these terms is referred to as:
$$ \text{Posterior} = \frac{\text{Likelihood} \cdot \text{Prior}}{\text{Marginal Likelihood}}$$
The marginal probability $P(\text{Data})$ may be a sum over a discrete parameter space,
where $P(\text{Data}) = \sum_{\theta}P(\text{Data}|\theta)P(\theta)$, so the posterior
probability distribution is:
$$ P(\theta|\text{Data}) = \frac{P(\text{Data}|\theta)P(\theta)}{\sum_{\theta}P(\text{Data}|\theta)P(\theta)}$$
Or it may be an integral over a continuous parameter space,
where $P(\text{Data}) = \int d\theta P(\text{Data}|\theta)P(\theta)$, so the posterior
probability distribution is:
$$ P(\theta|\text{Data}) = \frac{P(\text{Data}|\theta)P(\theta)}{\int d\theta P(\text{Data}|\theta)P(\theta)}$$
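"""
nbText: md"""
Before moving on, here is a small illustrative sketch of the discrete case (it is not part
of the linear model that follows): we compute the posterior over three candidate values for
the probability that a coin lands heads, after observing ten flips. The candidate values,
the flat prior, and the data are made up purely for illustration.
"""
nbCode:
  let
    thetas = [0.25, 0.5, 0.75]               # candidate parameter values
    thetaPrior = [1.0/3.0, 1.0/3.0, 1.0/3.0] # flat prior over the candidates
    flips = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]   # observed flips, 1 = heads
  var
    joint: array[3, float]
    marginal = 0.0
  for i in 0 ..< thetas.len:
    var like = 1.0
    for f in flips:
      # Bernoulli likelihood of a single flip given this candidate value
      like *= (if f == 1: thetas[i] else: 1.0 - thetas[i])
    joint[i] = like * thetaPrior[i]  # likelihood times prior
    marginal += joint[i]             # P(Data) is the sum over all candidates
  for i in 0 ..< thetas.len:
    echo "P(theta = ", thetas[i], " | Data) = ", joint[i] / marginal
nbText: md"""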
In this tutorial we will consider a simple linear model
$ y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} $ where $\epsilon \sim N(0, \tau)$.
Under this model, the parameters $\beta_0$ (y intercept) and $\beta_1$ (slope),
describe the relationship between a
predictor variable $x$ and a response variable $y$, with some unaccounted for
residual random error ($\epsilon$) which is normally distributed with a mean of $0$ and
standard deviation $\tau$.
We will estimate the values of the y-intercept ($\beta_{0}$), the slope
($\beta_{1}$), and the standard deviation of the residual random error ($\tau$)
from observed $x$ and $y$ data.
Expressing this with Bayes' rule gives us:
$$ \displaystyle P(\beta_{0}, \beta_{1}, \tau | Data) =
\frac{P(Data|\beta_{0}, \beta_{1}, \tau) P(\beta_{0}, \beta_{1}, \tau)}
{\iiint d\beta_{0} d\beta_{1} d\tau P(Data|\beta_{0}, \beta_{1}, \tau) P(\beta_{0}, \beta_{1}, \tau)} $$
# Generate Data
We need some data to work with. Let's simulate data
under the model with $\beta_{0}=0$, $\beta_{1}=1$, and $\tau=1$:
$ y = 0 + 1x + \epsilon$ where $\epsilon \sim N(0, 1) $
"""
nbCode:
import std/sequtils, std/random, std/stats
var
n = 100
b0 = 0.0
b1 = 1.0
sd = 1.0
x = newSeq[float](n)
y = newSeq[float](n)
for i in 0 ..< n:
x[i] = rand(0.0..100.0)
y[i] = b0 + (b1 * x[i]) + gauss(0.0, sd)
nbText: md"""
We can use `ggplotnim` to see what these data look like.
"""
nbCode:
import datamancer, ggplotnim
var sim = toDf(x, y)
ggplot(sim, aes("x", "y")) +
geom_point() +
ggsave("images/simulated-data.png")
nbImage("images/simulated-data.png")
nbText: md"""
# Priors
We need to choose prior probability distributions for each of the model parameters
that we are estimating. Let's use a normal distribution for the priors on
$\beta_{0}$ and $\beta_{1}$. The $\tau$ parameter must be a positive value,
so let's use the gamma distribution as the prior on $\tau$.
$$ \beta_{0} \sim Normal(\mu_{0}, \sigma_{0})$$
$$ \beta_{1} \sim Normal(\mu_{1}, \sigma_{1})$$
$$ \tau \sim Gamma(\kappa_{0}, \theta_{0})$$
To calculate the prior probabilities of proposed model parameter values, we will need
the probability density functions for these distributions.
#### Normal PDF
$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^{2}} $$
#### Gamma PDF
$$ p(x) = \frac{1}{\Gamma(k)\theta^{k}} x^{k-1} e^{-\frac{x}{\theta}} $$
"""
nbCode:
import std/math
proc normalPdf(x, m, s: float): float =
result = E.pow((-0.5 * ((x - m) / s)^2)) / (s * sqrt(2.0 * PI))
proc gammaPdf(x, k, t: float): float =
result = x.pow(k - 1.0) * E.pow(-(x / t)) / (gamma(k) * t.pow(k))
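nbText: md"""
As a quick sanity check (purely illustrative), the standard normal density at $x = 0$ should be
$1/\sqrt{2\pi} \approx 0.3989$, and a $Gamma(1, 1)$ density reduces to $e^{-x}$:
"""
nbCode:
  echo normalPdf(0.0, 0.0, 1.0)                 # 1 / sqrt(2 * PI), about 0.3989
  echo gammaPdf(2.0, 1.0, 1.0), " ", exp(-2.0)  # both about 0.1353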
nbText: md"""
We now need to decide how to parameterize these prior probability densities.
Since this is for the purpose of demonstration, let's use informative
priors centered on the true values so that we can quickly get a good sample from the posterior.
Here the second parameter of the normal is its standard deviation and the second parameter
of the gamma is its scale, matching the code below.
$$ \beta_{0} \sim Normal(0, 10)$$
$$ \beta_{1} \sim Normal(1, 10)$$
$$ \tau \sim Gamma(1, 10)$$
In a real analysis, the priors should encompass all values that we consider possible and
it may also be a good idea to examine how sensitive the analysis is to the choice
of priors.
"""
nbCode:
type
    Normal = object
      mu: float     # mean
      sigma: float  # standard deviation
    Gamma = object
      k: float      # shape
      sigma: float  # scale (the theta parameter of the gamma pdf)
Priors = object
b0: Normal
b1: Normal
sd: Gamma
var priors = Priors(
b0: Normal(mu:0.0, sigma:10.0),
b1: Normal(mu:1.0, sigma:10.0),
sd: Gamma(k:1.0, sigma:10.0))
nbText: md"""
Now that we have prior probability density functions for each model parameter
that we are estimating, we need to be able to compute the joint prior probability
for all of the parameters of the model.
We will actually be using the $ln$ of the probabilities to
avoid numerical underflow, since these values can be quite small.
"""
nbCode:
proc logPrior(priors: Priors, b0, b1, sd: float ): float =
let
b0Prior = ln(normalPdf(b0, priors.b0.mu, priors.b0.sigma))
b1Prior = ln(normalPdf(b1, priors.b1.mu, priors.b1.sigma))
sdPrior = ln(gammaPdf(sd, priors.sd.k, priors.sd.sigma))
result = b0Prior + b1Prior + sdPrior
nbText: md"""
# Likelihood
We need to be able to calculate the likelihood of the observed $y_{i}$ values
given the observed $x_{i}$ values and the model parameter values, $\beta_{0}$,
$\beta_{1}$, and $\tau$.
We can write the model in a slightly different way:
$$\mu = \beta_{0} +\beta_{1} x$$
$$ y \sim N(\mu, \tau) $$
Then, to compute the likelihood for a given set of $\beta_{0}$, $\beta_{1}$, $\tau$
parameters and data values $x_{i}$ and $y_{i}$, we use the same normal probability
density function that we wrote earlier for computing our prior probabilities.
We will work with the $ln$ of the likelihood as we did with the priors.
"""
nbCode:
proc logLikelihood(x, y: seq[float], b0, b1, sd: float): float =
var likelihoods = newSeq[float](y.len)
for i in 0 ..< y.len:
let pred = b0 + (b1 * x[i])
likelihoods[i] = ln(normalPdf(y[i], pred, sd))
result = sum(likelihoods)
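nbText: md"""
As a rough check (the exact numbers will vary between runs because the data are random), the
log likelihood at the values used to simulate the data should be far higher than at values
that are well away from the truth:
"""
nbCode:
  echo logLikelihood(x, y, 0.0, 1.0, 1.0)   # near the simulation values
  echo logLikelihood(x, y, 5.0, 0.5, 1.0)   # far from the simulation values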
nbText: md"""
# Posterior
We cannot analytically solve the posterior probability distribution of our
linear model as the integration of the marginal likelihood is intractable.
But we can approximate it with Markov chain Monte Carlo (MCMC) thanks to this
property of Bayes' rule:
$$ \displaystyle P(\beta_{0}, \beta_{1}, \tau | Data) \propto
P(Data|\beta_{0}, \beta_{1}, \tau) P(\beta_{0}, \beta_{1}, \tau) $$
The marginal likelihood is a normalizing constant. Without it, we no longer
have a probability density function, but the relative probability of one set
of parameter values compared to another is unchanged. We only care about which parameter
values are most probable, so this is enough for us. We can determine which values
have higher probability by randomly walking through parameter space while accepting
or rejecting new values based on how probable they are.
"""
nbCode:
proc logPosterior(x, y: seq[float], priors: Priors, b0, b1, sd: float): float =
let
like = logLikelihood(x, y, b0, b1, sd)
prior = logPrior(priors, b0, b1, sd)
result = like + prior
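nbText: md"""
For example, we can evaluate the unnormalized log posterior at the values used to simulate
the data (again, the exact number will vary from run to run):
"""
nbCode:
  echo logPosterior(x, y, priors, 0.0, 1.0, 1.0)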
nbText: md"""
# MCMC
We will use the Metropolis-Hastings algorithm to sample from the posterior using only its unnormalized density.
The steps of this algorithm are as follows:
1. Initialize arbitrary starting values for each model parameter.
2. Compute the unnormalized posterior probability density for the initialized
model parameter values given the observed data.
3. Propose a new value for one of the model parameters by randomly drawing from
a symmetrical distribution centered on the present value. We will use a normal
distribution.
4. Compute the unnormalized posterior probability density for the proposed model
parameters values given the observed data.
5. Compute the ratio of the proposed probability density to the previous
probability density. $ \alpha = \frac{P(\beta_{0}\prime, \beta_{1}\prime, \tau\prime | Data)}{P(\beta_{0}, \beta_{1}, \tau | Data)} $
This is called the acceptance ratio.
6. All proposals with greater probability than the current state are accepted.
So a proposal is accepted whenever the acceptance ratio is greater than 1. If the
acceptance ratio is less than 1, then the proposal is accepted with probability $\alpha$. In practice we can make a
random draw from a uniform distribution ranging from 0 to 1 and accept the proposal
whenever the random uniform variate is less than the acceptance ratio, $ r < \alpha$ where $ r \sim Uniform(0, 1)$.
7. If a proposal is accepted then we will update the parameter in our model with
a new state. Otherwise the state will remain the same as before.
8. We then repeat steps 3-7 until reaching a desired number of iterations that
we believe give us an adequate sample from the unnormalized posterior distribution.
Note: When proposing values for $\tau$ from a normal distribution, it is possible for
proposals to be less than or equal to zero. However, the gamma prior probability distribution
ranges from 0 to infinity, meaning such proposals are not valid and must
be rejected. Fortunately, this approach still gives us a proper sample of the targeted unnormalized posterior
probability distribution.
"""
nbCode:
type
MCMC = object
x: seq[float]
y: seq[float]
nSamples: int
priors: Priors
propSd: float
Samples = object
n: int
b0: seq[float]
b1: seq[float]
sd: seq[float]
proc run(m: MCMC, b0Start, b1Start, sdStart: float): Samples =
var
b0Samples = newSeq[float](m.nSamples)
b1Samples = newSeq[float](m.nSamples)
sdSamples = newSeq[float](m.nSamples)
logProbs = newSeq[float](m.nSamples)
b0Samples[0] = b0Start
b1Samples[0] = b1Start
sdSamples[0] = sdStart
logProbs[0] = logPosterior(m.x, m.y, m.priors, b0Start, b1Start, sdStart)
for i in 1..<m.nSamples:
let
b0Proposed = gauss(b0Samples[i-1], m.propSd)
b1Proposed = gauss(b1Samples[i-1], m.propSd)
sdProposed = gauss(sdSamples[i-1], m.propSd)
if sdProposed > 0.0:
var
logProbProposed = logPosterior(m.x, m.y, m.priors, b0Proposed, b1Proposed, sdProposed)
ratio = exp(logProbProposed - logProbs[i-1])
if rand(1.0) < ratio:
          b0Samples[i] = b0Proposed
          b1Samples[i] = b1Proposed
          sdSamples[i] = sdProposed
logProbs[i] = logProbProposed
else:
b0Samples[i] = b0Samples[i-1]
b1Samples[i] = b1Samples[i-1]
sdSamples[i] = sdSamples[i-1]
logProbs[i] = logProbs[i-1]
else:
b0Samples[i] = b0Samples[i-1]
b1Samples[i] = b1Samples[i-1]
sdSamples[i] = sdSamples[i-1]
logProbs[i] = logProbs[i-1]
result = Samples(n:m.nSamples, b0:b0Samples, b1:b1Samples, sd:sdSamples)
nbText: md"""
Let's use this code to generate 100,000 samples. We'll cheat a bit and use the parameters
that the data were simulated under as starting values to speed things up. A standard deviation of 0.1
seems to work pretty well for the proposal distribution of each parameter.
"""
nbCode:
var
mcmc = MCMC(x:x, y:y, nSamples:100000, priors:priors, propSd:0.1)
samples = mcmc.run(0.0, 1.0, 1.0)
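nbText: md"""
As a quick diagnostic (this is not required, just a rough check), we can approximate the
acceptance rate by counting how often consecutive stored values differ. A common rule of
thumb for random walk samplers is an acceptance rate somewhere around 0.2 to 0.5.
"""
nbCode:
  var accepted = 0
  for i in 1 ..< samples.n:
    # a change in the stored value means the proposal at step i was accepted
    if samples.b0[i] != samples.b0[i-1]:
      inc accepted
  echo "Approximate acceptance rate: ", accepted.float / float(samples.n - 1)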
nbText: md"""
In some analyses, the posterior probability distribution could be multimodal
which might make the MCMC chain sensitive to the starting value as it may get stuck
in a local optimum and not sample from the full posterior distribution. It is therefore
a good idea to acquire multiple samples with different starting values to see if they
converge on the same parameter estimates. Here we will run our MCMC one more time
with some different starting values.
"""
nbCode:
var samples2 = mcmc.run(0.2, 1.01, 1.1)
nbText: md"""
# Trace plots
We can get a sense for how well our MCMC performed, and therefore how good our
estimates might be, by looking at the trace plots. Trace plots
show the parameter values stored at each step in the MCMC chain: either
the accepted proposal value, or the previous value if the proposal was rejected. Trace
plots can be an unreliable indicator of MCMC performance, so it is a good
idea to assess it with other strategies as well.
"""
nbCode:
import std/strformat
proc plotTraces(samples: seq[Samples], prefix: string) =
var
sample = newSeq[seq[int]](samples.len)
chain = newSeq[seq[int]](samples.len)
for i in 0 ..< samples.len:
sample[i] = toSeq(1 .. samples[i].n)
chain[i] = repeat(i+1, samples[i].n )
var
df = toDf({
"sample": concat(sample),
"chain": concat(chain),
"b0": concat(map(samples, proc(x: Samples): seq[float] = x.b0)),
"b1": concat(map(samples, proc(x: Samples): seq[float] = x.b1)),
"sd": concat(map(samples, proc(x: Samples): seq[float] = x.sd))
})
ggplot(df, aes(x="sample", y="b0")) +
geom_line(aes(color="chain")) +
ggsave(fmt"{prefix}b0.png")
ggplot(df, aes(x="sample", y="b1")) +
geom_line(aes(color="chain")) +
ggsave(fmt"{prefix}b1.png")
ggplot(df, aes(x="sample", y="sd")) +
geom_line(aes(color="chain")) +
ggsave(fmt"{prefix}sd.png")
plotTraces(@[samples, samples2], "images/trace-")
nbImage("images/trace-b0.png")
nbImage("images/trace-b1.png")
nbImage("images/trace-sd.png")
nbText: md"""
# Burnin
Initially, the MCMC chain may spend time exploring unlikely regions of
parameter space. We can get a better approximation of the posterior if we
exclude these early steps in the chain. These excluded samples are referred to
as the burnin. A burnin of 10% seems to be more than enough with our informative priors
and starting values.
"""
nbCode:
proc burn(samples: Samples, p: float): Samples =
var
burnIx = (samples.n.float * p).ceil.int
n = samples.n - burnIx
b0 = samples.b0[burnIx..^1]
b1 = samples.b1[burnIx..^1]
sd = samples.sd[burnIx..^1]
result = Samples(n:n, b0:b0, b1:b1, sd:sd)
var
burnin1 = burn(samples, 0.1)
burnin2 = burn(samples2, 0.1)
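nbText: md"""
Each chain now retains 90% of its samples:
"""
nbCode:
  echo burnin1.n, " post-burnin samples per chain"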
nbText: md"""
# Histograms
Now that we have our post burnin samples, let's see what our posterior probability
distributions look like for each model parameter.
"""
nbCode:
proc plotHistograms(samples: seq[Samples], prefix: string) =
var chain = newSeq[seq[int]](samples.len)
for i in 0 ..< samples.len:
chain[i] = repeat(i+1, samples[i].n )
var
df = toDf({
"chain": concat(chain),
"b0": concat(map(samples, proc(x: Samples): seq[float] = x.b0)),
"b1": concat(map(samples, proc(x: Samples): seq[float] = x.b1)),
"sd": concat(map(samples, proc(x: Samples): seq[float] = x.sd))
})
ggplot(df, aes(x="b0", fill="chain")) +
geom_histogram(position="identity", alpha=some(0.5)) +
ggsave(fmt"{prefix}b0.png")
ggplot(df, aes(x="b1", fill="chain")) +
geom_histogram(position="identity", alpha=some(0.5)) +
ggsave(fmt"{prefix}b1.png")
ggplot(df, aes(x="sd", fill="chain")) +
geom_histogram(position="identity", alpha=some(0.5)) +
ggsave(fmt"{prefix}sd.png")
plotHistograms(@[burnin1, burnin2], "images/hist-")
nbImage("images/hist-b0.png")
nbImage("images/hist-b1.png")
nbImage("images/hist-sd.png")
nbText: md"""
# Posterior Summarization
It's always great to visualize your results. But it's also useful to calculate some
summary statistics for the posterior. There are several different ways you could
do this.
## Combine Chains
Before we summarize the posterior, let's combine the samples into one.
Since we have determined that our MCMC is performing well, we can be confident
that each post burnin sample is a sample of the same distribution.
"""
nbCode:
proc concat(samples: seq[Samples]): Samples =
var
n = concat(map(samples, proc(x: Samples): int = x.n))
b0 = concat(map(samples, proc(x: Samples): seq[float] = x.b0))
b1 = concat(map(samples, proc(x: Samples): seq[float] = x.b1))
sd = concat(map(samples, proc(x: Samples): seq[float] = x.sd))
result = Samples(n:sum(n), b0:b0, b1:b1, sd:sd)
var burnin = concat(@[burnin1, burnin2])
nbText: md"""
## Posterior means
One way to summarize the estimates of the posterior distributions is to simply calculate
the mean. Let's see how close these values are to the true values of the parameters.
"""
nbCode:
import stats
var
meanB0 = mean(burnin.b0).round(3)
meanB1 = mean(burnin.b1).round(3)
meanSd = mean(burnin.sd).round(3)
echo "Mean b0: ", meanB0
echo "Mean b1: ", meanB1
echo "Mean sd: ", meanSd
nbText: md"""
## Credible Intervals
The means give us a point estimate for our parameter values but they tell us
nothing about the uncertainty of our estimates. We can get a sense for that by
looking at credible intervals. There are two widely used approaches for this,
equal tailed intervals, and highest density intervals. These will often match
each other closely when the target distribution is unimodal and symmetric.
We will calculate the 89% interval for each of these below. Why 89%? Why not?
Credible interval threshold values are completely arbitrary.
### Equal Tailed Interval
In this interval, the probability of being below the interval is equal to the
probability of being above the interval.
"""
nbCode:
import algorithm
proc quantile(samples: seq[float], interval: float): float =
let
s = sorted(samples, system.cmp[float])
k = float(s.len - 1) * interval
f = floor(k)
c = ceil(k)
if f == c:
result = s[int(k)]
else:
let
d0 = s[int(f)] * (c - k)
d1 = s[int(c)] * (k - f)
result = d0 + d1
  proc eti(samples: seq[float], interval: float): (float, float) =
    let
      p = (1 - interval) / 2
      q0 = quantile(samples, p)
      q1 = quantile(samples, 1 - p)
    result = (q0, q1)
var
(b0EtiMin, b0EtiMax) = eti(burnin.b0, 0.89)
(b1EtiMin, b1EtiMax) = eti(burnin.b1, 0.89)
(sdEtiMin, sdEtiMax) = eti(burnin.sd, 0.89)
echo "Eti b0: ", b0EtiMin.round(3), " - ", b0EtiMax.round(3)
echo "Eti b1: ", b1EtiMin.round(3), " - ", b1EtiMax.round(3)
echo "Eti sd: ", sdEtiMin.round(3), " - ", sdEtiMax.round(3)
nbText: md"""
### Highest Posterior Density Interval
In this interval, all values inside of the interval have a higher probability
density than values outside of the interval.
"""
nbCode:
proc hdi(samples: seq[float], credMass: float): (float, float) =
let
sortedSamples = sorted(samples, system.cmp[float])
ciIdxInc = int(floor(credMass * float(sortedSamples.len)))
nCIs = sortedSamples.len - ciIdxInc
var ciWidth = newSeq[float](nCIs)
for i in 0..<nCIs:
ciWidth[i] = sortedSamples[i + ciIdxInc] - sortedSamples[i]
let
minCiWidthIx = minIndex(ciWidth)
hdiMin = sortedSamples[minCiWidthIx]
hdiMax = sortedSamples[minCiWidthIx + ciIdxInc]
result = (hdiMin, hdiMax)
var
(b0HdiMin, b0HdiMax) = hdi(burnin.b0, 0.89)
(b1HdiMin, b1HdiMax) = hdi(burnin.b1, 0.89)
(sdHdiMin, sdHdiMax) = hdi(burnin.sd, 0.89)
echo "Hdi b0: ", b0HdiMin.round(3), " - ", b0HdiMax.round(3)
echo "Hdi b1: ", b1HdiMin.round(3), " - ", b1HdiMax.round(3)
echo "Hdi sd: ", sdHdiMin.round(3), " - ", sdHdiMax.round(3)
nbText: md"""
# Standardize Data
The $\beta_{0}$ (intercept) and $\beta_{1}$ (slope) parameters of a linear model
present a bit of a challenge for MCMC because believable values for them are
tightly correlated. This means that a lot of proposed values will be rejected and the
chain will not move efficiently. An easy way to get around this problem is
to standardize our data by rescaling them relative to their mean and standard deviation.
$$ \zeta_{x_{i}} = \frac{x_{i} - \bar{x}}{SD_{x}} $$
$$ \zeta_{y_{i}} = \frac{y_{i} - \bar{y}}{SD_{y}} $$
"""
nbCode:
var
xSt = newSeq[float](n)
ySt = newSeq[float](n)
xMean = mean(x)
xSd = standardDeviation(x)
yMean = mean(y)
ySd = standardDeviation(y)
for i in 0 ..< n:
xSt[i] = (x[i] - xMean) / xSd
ySt[i] = (y[i] - yMean) / ySd
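nbText: md"""
A quick check confirms that the standardized variables have a mean of (approximately) zero
and a standard deviation of (approximately) one:
"""
nbCode:
  echo mean(xSt).round(3), " ", standardDeviation(xSt).round(3)
  echo mean(ySt).round(3), " ", standardDeviation(ySt).round(3)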
nbText: md"""
We can see that both the $y$ and $x$ values are now centered around zero and have the same scale.
"""
nbCode:
var standardized = toDf(xSt, ySt)
ggplot(standardized, aes(x="xSt", y="ySt")) +
geom_point() +
ggsave("images/st-simulated-data.png")
nbImage("images/st-simulated-data.png")
nbText: md"""
We can now run the MCMC just as before with some minor changes. The standard deviation
$\tau$ for the standardized data is going to be much smaller. Let's make the prior
more informative. The standard deviation of the proposal distribution used before
would result in most proposals being rejected so we should use a smaller value
this time. Finally we should use some different starting values than before.
"""
nbCode:
var
st_priors = Priors(
b0: Normal(mu:0.0, sigma:1.0),
b1: Normal(mu:1.0, sigma:1.0),
sd: Gamma(k:0.0035, sigma:1.0))
st_mcmc = MCMC(x:xSt, y:ySt, nSamples:100000, priors:st_priors, propSd:0.001)
st_samples1 = st_mcmc.run(0.0, 1.0, 0.0034)
st_samples2 = st_mcmc.run(0.1, 1.01, 0.0036)
nbText: md"""
### Convert back to original scale
To interpret these estimates we need to convert back to the original scale.
$$ \beta_{0} = \zeta_{0} SD_{y} + \bar{y} - \zeta_{1} SD_{y} \bar{x} / SD_{x} $$
$$ \beta_{1} = \zeta_{1} SD_{y} / SD_{x} $$
$$ \tau = \zeta_{\tau} SD_{y} $$
"""
nbCode:
proc backTransform(samples: Samples, xMean, xSd, yMean, ySd: float): Samples =
var
b0 = newSeq[float](samples.n)
b1 = newSeq[float](samples.n)
sd = newSeq[float](samples.n)
for i in 0 ..< samples.n:
b0[i] = samples.b0[i] * ySd + yMean - samples.b1[i] * ySd * xMean / xSd
b1[i] = samples.b1[i] * ySd / xSd
sd[i] = samples.sd[i] * ySd
result = Samples(n:samples.n, b0:b0, b1:b1, sd:sd)
var
st_samples_trans1 = backTransform(st_samples1, xMean, xSd, yMean, ySd)
st_samples_trans2 = backTransform(st_samples2, xMean, xSd, yMean, ySd)
nbText: md"""
# Trace plots
Let's have a look at the trace plots.
"""
nbCode:
plotTraces(@[st_samples_trans1, st_samples_trans2], "images/trace-st-")
nbImage("images/trace-st-b0.png")
nbImage("images/trace-st-b1.png")
nbImage("images/trace-st-sd.png")
nbText: md"""
It looks like more of the proposals were accepted and we have a much better
sample from the posterior.
"""
nbText: md"""
# Burnin
Let's get a post burnin sample as we did before.
"""
nbCode:
var
st_burnin1 = burn(st_samples_trans1, 0.1)
st_burnin2 = burn(st_samples_trans2, 0.1)
nbText: md"""
# Histograms
Now we can have a look at the posterior distribution of our estimates.
"""
nbCode:
plotHistograms(@[st_burnin1, st_burnin2], "images/hist-st-")
nbImage("images/hist-st-b0.png")
nbImage("images/hist-st-b1.png")
nbImage("images/hist-st-sd.png")
nbText: md"""
These look pretty good but it's hard to know how they compare to the previous
MCMC estimates. Let's summarize the posterior to get a better idea.
"""
nbCode:
var st_burnin = concat(@[st_burnin1, st_burnin2])
var
st_meanB0 = mean(st_burnin.b0).round(3)
st_meanB1 = mean(st_burnin.b1).round(3)
st_meanSd = mean(st_burnin.sd).round(3)
echo "Mean b0: ", meanB0
echo "Mean b1: ", meanB1
echo "Mean sd: ", meanSd
var
(st_b0EtiMin, st_b0EtiMax) = eti(st_burnin.b0, 0.89)
(st_b1EtiMin, st_b1EtiMax) = eti(st_burnin.b1, 0.89)
(st_sdEtiMin, st_sdEtiMax) = eti(st_burnin.sd, 0.89)
echo "Eti b0: ", st_b0EtiMin.round(3), " - ", st_b0EtiMax.round(3)
echo "Eti b1: ", st_b1EtiMin.round(3), " - ", st_b1EtiMax.round(3)
echo "Eti sd: ", st_sdEtiMin.round(3), " - ", st_sdEtiMax.round(3)
var
(st_b0HdiMin, st_b0HdiMax) = hdi(st_burnin.b0, 0.89)
(st_b1HdiMin, st_b1HdiMax) = hdi(st_burnin.b1, 0.89)
(st_sdHdiMin, st_sdHdiMax) = hdi(st_burnin.sd, 0.89)
echo "Hdi b0: ", st_b0HdiMin.round(3), " - ", st_b0HdiMax.round(3)
echo "Hdi b1: ", st_b1HdiMin.round(3), " - ", st_b1HdiMax.round(3)
echo "Hdi sd: ", st_sdHdiMin.round(3), " - ", st_sdHdiMax.round(3)
nbText: md"""
In comparison to the previous MCMC results shown below, it looks like the standardized
analysis produced very similar estimates. But we can have greater confidence in them
because the standardized analysis gave us a better sample from the posterior.
"""
nbCode:
echo "Mean b0: ", meanB0
echo "Mean b1: ", meanB1
echo "Mean sd: ", meanSd
echo "Eti b0: ", b0EtiMin.round(3), " - ", b0EtiMax.round(3)
echo "Eti b1: ", b1EtiMin.round(3), " - ", b1EtiMax.round(3)
echo "Eti sd: ", sdEtiMin.round(3), " - ", sdEtiMax.round(3)
echo "Hdi b0: ", b0HdiMin.round(3), " - ", b0HdiMax.round(3)
echo "Hdi b1: ", b1HdiMin.round(3), " - ", b1HdiMax.round(3)
echo "Hdi sd: ", sdHdiMin.round(3), " - ", sdHdiMax.round(3)
nbText: md"""
# Final Note
We could have done all of this more simply and efficiently using least squares regression.
However, the Bayesian approach allows us to express uncertainty about our estimates easily
and intuitively, and it can be extended to much more complex
models for which no such simple solutions exist. We could also incorporate prior
knowledge or assumptions in a way that is not possible with frequentist approaches.
The exercise above provides a starting point for this and shows just how easy it is to do with Nim!
"""
nbSave