---
title: "Answers to exercises \"Binary data: Logistic regression\" part 1"
output:
  github_document:
    toc: true
    toc_depth: 2
---
# Exercise 1
## 1a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/episode.txt"
lr1 <- read.table(adress,header=TRUE)
lr1$immune <- 1*(lr1$cd4<200)
epifit.1 <- glm(episode~immune,family=binomial,data = lr1)
summary(epifit.1)
```
## 1b
```{r}
epifit.1 <- glm(episode~log(followup),family=binomial,data = lr1)
summary(epifit.1)
```
## 1c
```{r}
epifit.2 <- glm(episode~offset(log(followup)),family=binomial,data = lr1)
summary(epifit.2)
```
The AIC of this model is 95.16.
The AIC of the model in which `log(followup)` is an explanatory variable (1b) is 94.22.
The difference is small, so the model with the offset is preferred since it has fewer parameters.
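As a rough check of this comparison, AIC can be split into its two parts, AIC = 2k - 2 logLik, with k the number of estimated parameters. The sketch below uses only the two AIC values quoted above and assumes k = 1 for the offset model (constant only) and k = 2 for the model with `log(followup)` as a covariate (constant and slope); the variable names are illustrative.

```{r}
# AIC values quoted in the text
aic <- c(offset = 95.16, covariate = 94.22)
# Assumed number of estimated parameters per model
k <- c(offset = 1, covariate = 2)

# Log-likelihoods implied by AIC = 2k - 2*logLik
loglik <- k - aic / 2
loglik

# AIC difference: the extra parameter buys less than one AIC point
aic_diff <- aic[["offset"]] - aic[["covariate"]]
aic_diff
```

The covariate model fits slightly better in log-likelihood terms, but the gain is roughly offset by the penalty for its extra parameter.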
## 1d
The model with only a constant and an offset is
$\log(\text{odds}) = a + \log(\text{follow-up})$, or equivalently $\log(\text{odds}/\text{follow-up}) = a$,
so here the odds per unit of follow-up time are modelled.
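A minimal sketch of this interpretation, using simulated data rather than the episode data. The data frame `sim`, the value `true_a` and all other names are hypothetical. It fits a constant-plus-offset model and checks that the fitted odds divided by follow-up time is the same constant, exp(a), for every observation.

```{r}
set.seed(1)
# Hypothetical data: follow-up times and a binary outcome
sim <- data.frame(followup = runif(200, 1, 10))
true_a <- -3
sim$episode <- rbinom(200, 1, plogis(true_a + log(sim$followup)))

# Constant plus offset: the coefficient of log(followup) is fixed at 1
fit_offset <- glm(episode ~ offset(log(followup)), family = binomial, data = sim)
a_hat <- unname(coef(fit_offset))

# Fitted log odds include the offset, so odds / followup equals exp(a_hat)
eta <- predict(fit_offset, type = "link")
odds_per_time <- unname(exp(eta) / sim$followup)
all.equal(odds_per_time, rep(exp(a_hat), nrow(sim)))
```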
# Exercise 2
## 2a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/lowbirth.dat"
lr2 <-read.table(adress,header=TRUE)
```
## 2b
```{r}
# Single-predictor logistic regressions for low birth weight, plus a constant-only model
fit.1 <- glm(low~age,family=binomial,data = lr2)
fit.2 <- glm(low~smoke,family=binomial,data = lr2)
fit.3 <- glm(low~ht,family=binomial,data = lr2)
fit.4 <- glm(low~1,family=binomial,data = lr2)
```
## 2c
The coefficient b is the log odds ratio associated with an increase of 1 in the independent variable.
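Exponentiating a coefficient turns the log odds ratio into an odds ratio. As a small illustration, the sketch below applies this to the rounded estimates reported in the table under 2 d+e; the vector name `b` is only for this example, and the results inherit the rounding of the inputs.

```{r}
# Rounded coefficient estimates taken from the table in 2 d+e
b <- c(age = -0.05, smoke = 0.70, ht = 1.2)

# Odds ratio per one-unit increase of each variable
or <- exp(b)
round(or, 2)
```

An odds ratio below 1 (age) means the odds of low birth weight decrease as the variable increases; above 1 (smoke, ht) means they increase.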
## 2 d+e
To calculate the log-likelihood values one can use `logLik(fit.1)`.

Model | AIC | estimate b | log-likelihood
-------|-------|-------------|----------------
age | 235.9 | -0.05 | -115.96
smoke | 233.8 | 0.70 | -114.90
ht | 234.7 | 1.2 | -115.32
1 | 236.7 | | -117.34
Likelihood ratio of each model compared to the model with only a constant, exp(l1 - l0):

comparison | LR
------------|-------
age vs 1 | 3.97
smoke vs 1 | 11.40
ht vs 1 | 7.47
Interpretation: for instance, the model with hypertension history and a constant makes the data 7.47 times as probable as a model with only a constant.
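The arithmetic behind these tables can be sketched from the rounded log-likelihood values alone. The names `loglik`, `k`, `aic` and `lr` are illustrative, and the parameter counts (2 for each one-predictor model, 1 for the constant-only model) are read off the model formulas. Because the inputs are rounded to two decimals, the results can differ slightly from the tabulated ones, which were presumably computed from unrounded values (for ht this gives about 7.54 rather than 7.47).

```{r}
# Rounded log-likelihoods from the table
loglik <- c(age = -115.96, smoke = -114.90, ht = -115.32, const = -117.34)
# Number of estimated parameters in each model
k <- c(age = 2, smoke = 2, ht = 2, const = 1)

# AIC = 2k - 2*logLik
aic <- 2 * k - 2 * loglik
round(aic, 1)

# Likelihood ratio of each one-predictor model against the constant-only model
lr <- exp(loglik[c("age", "smoke", "ht")] - loglik[["const"]])
round(lr, 2)
```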
## 2f
The model with smoke has the lowest AIC, although the model with ht is close.
# Exercise 3
## 3a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/pdd.csv"
lr3 <- read.table(adress,header=TRUE,sep=",")
```
## 3b
```{r}
with(lr3,tapply(pdd,list(nop),sum))
with(lr3,tapply(n,list(nop),sum))
```
## 3c
nop    | pdd=0 | pdd=1 | total
-------|-------|-------|------
nop=0  | 138   | 20    | 158
nop=1  | 105   | 45    | 150
total  | 243   | 65    | 308
## 3d
$OR=\frac{45 \cdot 138}{20 \cdot105}=2.957$
The pdd-odds in nop centers are about 3 times as large as in birds not from such a center. There seems to be some logic in that, since birds are brought to a nop center if something is wrong with their health.
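The same calculation can be sketched in R directly from the counts of the table in 3c. The object names `tab`, `odds` and `or` are only for this example.

```{r}
# 2x2 table of counts: rows are nop (0/1), columns are pdd (0/1)
tab <- matrix(c(138, 20,
                105, 45),
              nrow = 2, byrow = TRUE,
              dimnames = list(nop = c("0", "1"), pdd = c("0", "1")))

# Odds of pdd within each nop group
odds <- tab[, "1"] / tab[, "0"]

# Odds ratio: nop = 1 relative to nop = 0
or <- odds[["1"]] / odds[["0"]]
round(or, 3)
```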
## 3e
```{r}
pdd.fit <- glm(cbind(pdd,n-pdd)~nop,family=binomial,data=lr3)
summary(pdd.fit)
exp(coef(pdd.fit))
```
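With `nop` as the only predictor, the exponentiated coefficient should reproduce the cross-product odds ratio of 2.957 from 3d. A minimal sketch that checks this without downloading the data, using a hypothetical data frame `agg` that holds only the group totals from the table in 3c:

```{r}
# Group totals from the 2x2 table: pdd cases and number of birds per nop group
agg <- data.frame(nop = c(0, 1), pdd = c(20, 45), n = c(158, 150))

# Same model specification as above, fitted to the aggregated counts
fit_agg <- glm(cbind(pdd, n - pdd) ~ nop, family = binomial, data = agg)

# exp(intercept) is the pdd-odds for nop = 0; exp(nop) is the odds ratio
round(exp(coef(fit_agg)), 3)
```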
# Exercise 4
## 4a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/dalmatian.csv"
lr4 <- read.table(adress,header=TRUE,sep=",")
```
Alternatively, import the file with the RStudio menu.
## 4b
See the explanation in section 9 of the text.
## 4c
```{r}
dalfit.1 <- glm(deaf~fhs,family=binomial,data=lr4)
summary(dalfit.1)
```
## 4d
```{r}
dalfit.0 <- glm(deaf~1,family=binomial,data=lr4)
summary(dalfit.0)
```
model | AIC    | log-lik | lik. ratio (vs. model 1)
------|--------|---------|-------------------------
1     | 1147.0 | -572.5  |
fhs   | 1085.2 | -540.6  | exp(31.9) = 7.145e+13
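
The likelihood ratio can be recomputed from the two AIC values alone, since AIC = 2k - 2 logLik. The sketch below assumes k = 1 for the constant-only model and k = 2 for the model with `fhs`; the names are illustrative and the result is only as precise as the rounded AIC values.

```{r}
# AIC values from the table
aic <- c(const = 1147.0, fhs = 1085.2)
# Assumed number of estimated parameters per model
k <- c(const = 1, fhs = 2)

# Log-likelihoods implied by the AIC values
loglik <- k - aic / 2
loglik

# Difference in log-likelihood and the corresponding likelihood ratio
ll_diff <- loglik[["fhs"]] - loglik[["const"]]
lik_ratio <- exp(ll_diff)
ll_diff
lik_ratio
```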