Added the least squares equation and the m equation relating the correlation r
fabarrios committed Oct 18, 2024
1 parent 0530924 commit b2827ff
Showing 2 changed files with 220 additions and 536 deletions.
21 changes: 17 additions & 4 deletions LinearModel/LinearModel.Rmd
@@ -10,7 +10,7 @@ description: "to prepare Class2020 presentations"
---


```{r setup, cache=FALSE, echo=FALSE}
library(knitr)
library(rmdformats)
@@ -39,10 +39,22 @@ library(emmeans)

The term "regression" was introduced by Francis Galton (Darwin's nephew) during the XIX century to describe a biological phenomenon. The heights of the descendants of tall ancestors have the tendency to "return", come back, to the normal average high in the population, known as the regression to the media. (Mr. Galton was an Eugenics supporter)

## The least-squares linear regression

The general equation of a straight line is $y = mx + b_0$; this is the "slope-intercept" form. The slope is the rate of change: it gives the change in $y$ for a unit change in $x$. Remember that the slope formula for a pair of points $(x_1, y_1)$ and $(x_2, y_2)$ is:
$$ m = \frac{(y_2 - y_1)}{(x_2 - x_1)} $$
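
A quick numeric check of the slope formula, using two made-up points:

```{r}
# Slope of the line through the (illustrative) points (1, 2) and (3, 8)
m <- (8 - 2) / (3 - 1)
m  # 3: y increases by 3 units for each unit increase in x
```
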
Writing the line $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ for a set of points $(x_i, y_i)$, and finding the minimum of the sum of the squared differences between the observed points and the fitted line, we can estimate the "best" slope $\hat{\beta_{1}}$ and the independent term $\hat{\beta_{0}}$; the resulting formulas are given after the properties listed below.

The basic properties we know about one-variable linear regression are:

- The correlation measures the strength of the linear relationship between $x$ and $y$ (see this shiny app for an excellent visual overview of correlations).
- The correlation ranges between $-1$ and $1$.
- The point of means $(\bar{x}, \bar{y})$ must fall on the line.
- The slope of the line is defined as the change in $y$ over the change in $x$: $m = \frac{\Delta y}{\Delta x}$.
- For regression, the slope uses the ratio of the sample standard deviations, so that $m = r\frac{s_y}{s_x}$, where $m$ is the slope, $r$ is the correlation, and $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$.

Minimizing the sum of the squared residuals gives the estimators:

$$\hat{\beta_{1}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i -\bar{y} )}{\sum_{i=1}^{n}(x_i - \bar{x})^2} $$

$$\hat{\beta_{0}} = \bar{y} - \hat{\beta_{1}} \bar x $$
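
As a quick check, here is a minimal sketch (with simulated data, so all names and values are illustrative) that computes $\hat{\beta_{1}}$ and $\hat{\beta_{0}}$ directly from these formulas, verifies the relation $m = r\frac{s_y}{s_x}$, and compares the results with `lm()`:

```{r}
# Simulated data, illustrative only
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(50, sd = 1)

# Least-squares estimates from the formulas above
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# The slope also equals r * (s_y / s_x)
m_from_r <- cor(x, y) * sd(y) / sd(x)

c(beta1_hat = beta1_hat, m_from_r = m_from_r)
coef(lm(y ~ x))  # should match beta0_hat and beta1_hat
```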

## Example of linear regression

Expand Down Expand Up @@ -77,6 +89,7 @@ for a set of $x_{jl}$ predictor variables (or independent variables) defined as
$$ x_{jl} \quad (l = 1, \dots, L) $$
with $L$ ($L < J$), a general linear model with an error term $\epsilon_j$ can be expressed as:
$$Y_j = x_{j1}\beta_1 + x_{j2}\beta_2 + x_{j3}\beta_3 + \dots + x_{jL}\beta_L + \epsilon_j$$

with the $\epsilon_j$ independent and identically distributed Normal variables with mean equal to zero:
$$\epsilon_j \sim N(0,\sigma^2) \quad \text{iid}$$
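
As a complement, here is a minimal sketch of the same model in matrix form, $Y = X\beta + \epsilon$, again with simulated data (all names and values are illustrative assumptions, not part of the original document):

```{r}
# Simulated data, illustrative only
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix X: a column of ones (intercept) plus the predictors
X <- model.matrix(~ x1 + x2)

# Least-squares solution of (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))
beta_hat
coef(lm(y ~ x1 + x2))  # lm() gives the same estimates
```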

@@ -104,7 +117,7 @@ There are models to regress several predictor variables to relate several random
$$y_i = E[y_i|x_i] + \epsilon_i$$
$$Y = \beta_0 + \beta_1 x_{1} + \beta_2 x_{2} + \dots + \beta_p x_{p}$$
Multiple linear regression coefficients, the betas, give the change in $E[Y|x]$ for an increase of one unit in the predictor $x_j$, holding the other factors in the model constant; each estimate is therefore adjusted for the effects of all the other predictors. As in the simple linear model, the intercept $\beta_0$ (beta zero) gives the value of $E[Y|x]$ when all the predictors are equal to zero. An example of a multiple linear model estimate is done with: `glucose ~ exercise + age + drinkany + BMI`.
In general in R we can write $Y = \beta_1 variable_1 + \beta_2 variable_2 + \beta_3 variable_3 + \beta_4 variable_4$ for a multiple linear model, in this case with four regressors.

```{r}
hers_nodi_multFit <- lm(glucose ~ exercise + age + drinkany + BMI,
735 changes: 203 additions & 532 deletions LinearModel/LinearModel.html

Large diffs are not rendered by default.
