-
Notifications
You must be signed in to change notification settings - Fork 5
/
Homework_7_EP_AH (2).Rmd
146 lines (86 loc) · 4.96 KB
/
Homework_7_EP_AH (2).Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
title: "Homework_7"
author: "Andrea Hemmelmann - Eric Parra"
date: "9 de enero de 2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Homework 7 - Learning the mean diameter of a coastal boulder
## Problem
Prepare a R Markdown-based HTML file (groups of 2) containing documented R code
on the problem of learning the mean diameter of a coastal boulder from Gaussian
distributed measurements with a fixed variance. Document and comment on your
prior, likelihood, and posterior.
A group of geoscience students use a tape to measure the long axis of a coral
boulder stranded on a sandy beach. The students have collected n data points of
the axis diameter x. We know that the tape measure is only accurate to a certain
point, and assume, for the sake of simplicity, that all measurements come from
the same Gaussian distribution. We express the spread of measurements with the
distribution’s variance sigma𝜎.
**Given n measurements, what is the boulder’s most believable actual size?**
## Results
### Data
As the Gaussian distribution is defined by 2 parameters (mean and standar deviation)
we have to define these parameters. First, we define the true mean𝜇 [m], which is the actual diameter of the coral boulder’s long axis:
```{r}
mu <- 5.2 #true mean [m]
```
Next, we define the known constant variance 𝜎 sigma2[m2], which expresses the accuracy of the tape measure.
```{r}
sig <- 0.1
```
As we saw in classes, we then create "n" random data points from a normal distribution with
rnorm() (we know that all the data comes from the same Gaussian distribution). This command expects the standard deviation instead of the variance as an argument, so we need to take the square root of "sig":
```{r}
n <- 10 # Number of observations
dat <- rnorm(n, mu, sqrt(sig)) # Where "n" equal to number of observations; "mu" equal the true mean (5.2); and "sig" equal to variance (0.1)
print(dat)
```
We obtained 10 values between 4.543923. - 5.823496 which represent possible values of the boulder given the true mean and sigma. Next, we plot a probability density estimate of the data we created above by using the function "density" which computes kernel density estimates in order to have a look of the data.
```{r}
plot(density(dat), col = "blue", lwd = 2, main = "Random Gaussian data")
```
### Prior
In order to define our prior belief, we specify our prior on "mu", based on what we believe to be realistic for the maximum boulder diameter. In this case we assume, based in our previous experience or knowledgment, that the mean value might drop between 4 and 6.5 m.
```{r}
prior_mu <- seq(4, 6.5, 0.1)
```
The above creates 26 evenly spaced values between 4 and 6.5 m as potential candidates for (the mean ("mu"). We now assign weights to each of these values, which have to reflect our initial belief about each candidate value. However, we do not have aditional information we choose random weights using the "runif" command which generates random deviates.
```{r}
weights_mu<- runif(length(prior_mu)) # Set weights for prior values
print(weights_mu)
```
As was told in classes, because the prior is a proper probability distribution, we need to
make sure that all these weights add up to unity. The best way to guarantee
this is to re-normalise so that all probability masses add up to unity.
```{r}
p_mu <- weights_mu / sum(weights_mu)
plot(prior_mu, p_mu, type= "h", col="red")
sum(p_mu)
```
The above is our prior belief on the probability for each of 26 probable values of the mean.
## Likelihood
In order to calculate the likelihood, we have to create a probability density function (for the normal distribution) for each value of the objetct "dat" which are 10 random data points from a normal distribution by using the variable "dat" and the standar deviation and the mean. This have to be done for each value of "prior_mu".
We have used a sapply() function instead of a for loop. "The application of this function avoids the coding of cumbersome loops, reducing the chance of error". ((https://www.datacamp.com/)"
(sapply() function does the same jobs as lapply() function but returns a vector).
```{r}
likeli <- sapply(prior_mu,
function(x){
prod(dnorm(dat, x, sqrt(sig)))}
)
print(likeli)
plot(prior_mu, likeli, type = "h", col = "blue", xlab = "Boulder diameter (m)", ylab = "Likelihood")
```
## Posterior
The general way to calculate the posterior is to multiply the likelihood with the prior, as follow:
```{r}
posterior= (likeli * prior_mu) / sum(likeli * prior_mu)
plot(prior_mu, posterior, type = "h", col = "blue", xlab = "Boulder diameter (m)", ylab = "Posterior")
```
As we see in the plot, the most probable mean is 5.1 (the true mean)
```{r}
b_m <- prior_mu[which (posterior== max(posterior))]
print(b_m)
```