Homework_7_EP_AH (2).Rmd

---
title: "Homework_7"
author: "Andrea Hemmelmann - Eric Parra"
date: "9 de enero de 2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Homework 7  - Learning the mean diameter of a coastal boulder

## Problem

Prepare a R Markdown-based HTML file (groups of 2) containing documented R code
on the problem of learning the mean diameter of a coastal boulder from Gaussian 
distributed measurements with a fixed  variance. Document and comment on your 
prior, likelihood, and posterior.

A group of geoscience students use a tape to measure the long axis of a coral 
boulder stranded on a sandy beach. The students have collected n data points of 
the axis diameter x. We know that the tape measure is only accurate to a certain
point, and assume, for the sake of simplicity, that all measurements come from 
the same Gaussian distribution. We express the spread of measurements with the 
distribution’s variance sigma𝜎.

**Given n measurements, what is the boulder’s most believable actual size?**

## Results

### Data

As the Gaussian distribution is defined by 2 parameters (mean and standar deviation)
we have to define these parameters. First, we define the true mean𝜇 [m], which is the actual diameter of the coral boulder’s long axis:

```{r}

mu <-  5.2 #true mean [m]

```

Next, we define the known constant variance 𝜎 sigma2[m2], which expresses the accuracy of the tape measure.

```{r}

sig <-  0.1  

```

As we saw in classes, we then create "n" random data points from a normal distribution with
rnorm() (we know that all the data comes from the same Gaussian distribution). This command expects the standard deviation instead of the variance as an argument, so we need to take the square root of "sig":

```{r}

n <- 10 # Number of observations

dat <- rnorm(n, mu, sqrt(sig)) # Where "n" equal to number of observations; "mu" equal the true mean (5.2); and "sig" equal to variance (0.1)

print(dat)
```

We obtained 10 values between 4.543923. - 5.823496 which represent possible values of the boulder given the true mean and sigma. Next, we plot a probability density estimate of the data we created above by using the function "density" which computes kernel density estimates in  order to have a look of the data.

```{r}

plot(density(dat), col = "blue", lwd = 2, main = "Random Gaussian data")

```

### Prior 

In order to define our prior belief, we specify our prior on "mu", based on what we believe to be realistic for the maximum boulder diameter. In this case we assume, based in our previous experience or knowledgment, that the mean value might drop between 4 and 6.5 m.

```{r}

prior_mu <- seq(4, 6.5, 0.1) 

```

The above creates 26 evenly spaced values between 4  and 6.5 m as potential candidates for (the mean ("mu"). We now assign weights to each of these values, which have to reflect our initial belief about each candidate value. However, we do not have aditional information we choose random weights using the "runif" command which generates random deviates.

```{r}

weights_mu<- runif(length(prior_mu)) # Set weights for prior values
print(weights_mu)

```

As was told in classes, because the prior is a proper probability distribution, we need to
make sure that all these weights add up to unity. The best way to guarantee 
this is to re-normalise so that all probability masses add up to unity.

```{r}

p_mu <- weights_mu / sum(weights_mu)

plot(prior_mu, p_mu, type= "h", col="red")
   
sum(p_mu) 
  
```

The above is our prior belief on the probability for each of 26 probable values of the mean.

## Likelihood

In order to calculate the likelihood, we have to create a probability density function  (for the normal distribution) for each value of the objetct "dat" which are 10 random data points from a normal distribution by using the variable "dat" and the standar deviation and the mean. This have to be done for each value of "prior_mu". 

We have used a sapply() function instead of a for loop. "The application of this function avoids the coding of cumbersome loops, reducing the chance of error". ((https://www.datacamp.com/)" 

(sapply() function does the same jobs as lapply() function but returns a vector). 


```{r}

likeli <- sapply(prior_mu, 
                 function(x){
                   prod(dnorm(dat, x, sqrt(sig)))}
                 )
print(likeli)

plot(prior_mu, likeli, type = "h", col = "blue", xlab = "Boulder diameter (m)", ylab = "Likelihood")
```

## Posterior

The general way to calculate the posterior is to multiply the likelihood with the prior, as follow:

```{r}

posterior= (likeli * prior_mu) / sum(likeli * prior_mu)

plot(prior_mu, posterior, type = "h", col = "blue", xlab = "Boulder diameter (m)", ylab = "Posterior")

```

As we see in the plot, the most probable mean is 5.1 (the true mean)

```{r}
b_m <- prior_mu[which (posterior== max(posterior))]
print(b_m)

```