-
Notifications
You must be signed in to change notification settings - Fork 16
/
BruceCampbell_ST503_GroupDiscussion9_FarawayCh10_Pblm_6.Rmd
95 lines (76 loc) · 2.83 KB
/
BruceCampbell_ST503_GroupDiscussion9_FarawayCh10_Pblm_6.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
---
title: "NCSU ST 503 Discussion 8"
subtitle: "Probem 10.6 Faraway, Julian J. Linear Models with R CRC Press."
author: "Bruce Campbell"
fontsize: 12pt
output: pdf_document
---
---
```{r setup, include=FALSE,echo=FALSE}
rm(list = ls())
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(dev = 'pdf')
knitr::opts_chunk$set(cache=TRUE)
knitr::opts_chunk$set(tidy=TRUE)
knitr::opts_chunk$set(prompt=FALSE)
knitr::opts_chunk$set(fig.height=5)
knitr::opts_chunk$set(fig.width=7)
knitr::opts_chunk$set(warning=FALSE)
knitr::opts_chunk$set(message=FALSE)
knitr::opts_knit$set(root.dir = ".")
library(latex2exp)
library(pander)
library(ggplot2)
library(GGally)
```
# 10.6 Model Selection with the hipcenter data.
Use the seatpos data with hipcenter as the response.
### (a) Fit a model with all eight predictors. Comment on the effect of leg length on the response.
```{r}
rm(list = ls())
data(seatpos, package="faraway")
df <-seatpos
numPredictors <- ( ncol(df)-1)
lm.fit <- lm(hipcenter ~ ., data=df)
summary(lm.fit)
```
We note that leg length is significant at a level of $\alpha=0.182$ and it has a negative association with the response.
(b) Compute a 95% prediction interval for the mean value of the predictors.
```{r}
x.model.matrix <- model.matrix(lm.fit)
x0.mean <- apply(x.model.matrix,2,mean)
pi<-predict(lm.fit,new=data.frame(t(x0.mean)),interval="prediction")
pi
pander(data.frame(pi.width=pi[3]-pi[2]))
```
(c) Use AIC to select a model. Now interpret the effect of leg length and compute the prediction interval. Compare the conclusions from the two models.
```{r}
library(leaps)
regsubsets.out <- regsubsets(hipcenter ~ .,data=seatpos,method = "exhaustive",nvmax=8)
rs <- summary(regsubsets.out)
rs$which
AIC <- 50*log(rs$rss/50) + (2:9)*2
plot(AIC ~ I(1:8), ylab="AIC", xlab="Number of Predictors")
plot(regsubsets.out, scale = "adjr2", main = "Adjusted R^2")
```
We see that $hipcenter \sim + age + ht + Leg$ is the model with the lowest AIC. We also plot the Adjusted $R^2$ of the models.
```{r}
lm.fit.subset <- lm(hipcenter ~ Age+Ht+Leg, data=df)
summary(lm.fit.subset)
```
Leg now has a p-value of $0.1099$
Let's calculate the condition number of the model matrix.
```{r}
x <- model.matrix(lm.fit.subset)[,-1]
e <- eigen(t(x) %*% x)
pander(data.frame(ev=t(e$val)),caption = "eigenvalues")
pander(data.frame(rcond =t(sqrt(e$val[1]/e$val))),caption="condition numbers")
```
```{r}
x.model.matrix <- model.matrix(lm.fit.subset)
x0.mean <- apply(x.model.matrix,2,mean)
pi<-predict(lm.fit.subset,new=data.frame(t(x0.mean)),interval="prediction")
pi
pander(data.frame(pi.width=pi[3]-pi[2]))
```
As expected our prediction interval has decresed in width. Ht is now significant at $0.07$ which is a dramatic change. We presume this is due to linear association among the predictors. We note that the predictions of the two models are similar.