Answer-logreg1.Rmd

---
title: "Answers to exercises \"Binary data: Logistic regression\" part 1"
output:
  github_document:
    toc: true
    toc_depth: 2
---

# Exercise 1

## 1a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/episode.txt"
lr1 <- read.table(adress,header=TRUE)
lr1$immune <- 1*(lr1$cd4<200)
epifit.1 <- glm(episode~immune,family=binomial,data = lr1)
summary(epifit.1)
```
## 1b
```{r}
epifit.1 <- glm(episode~log(followup),family=binomial,data = lr1)
summary(epifit.1)
```
## 1c
```{r}
epifit.2 <- glm(episode~offset(log(followup)),family=binomial,data = lr1)
summary(epifit.2)
```
The AIC of this model is 95.16 
The AIC of the model in which log(followup is an exposure is 94.22
So there is not much difference and then the model with the offset is preferred since it has fewer parameters.

## 1d

The model with only a comstant and an offset is: 

log(odds)=a+log(follow-up) or log(odds/follow-up)=a

so here the odds per unit time is modelled.

# Exercise 2

## 2a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/lowbirth.dat"
lr2 <-read.table(adress,header=TRUE)
```

## 2b 
```{r}
#
fit.1 <- glm(low~age,family=binomial,data = lr2)
fit.2 <- glm(low~smoke,family=binomial,data = lr2)
fit.3 <- glm(low~ht,family=binomial,data = lr2)
fit.4 <- glm(low~1,family=binomial,data = lr2)
```

## 2c
log odds ratio if the independent variable is increased by 1

## 2 d+e
calculate the loglikelhood values one can use logLik(fit.1)

Model  |  AIC  |  estimate b | log-likelihhod
-------|-------|-------------|----------------
age    | 235.9 |   -0.05     | -115.96
smoke  | 233.8 |    0.70     | -114.90
ht     | 234.7 |    1.2      | -115.32
1      | 236.7 |             | -117.34

likelihood ratio for the models compared to model with constant (exp(l1-l0)):

comparison  |  LR
------------|-------
age vs 1    |  3.97
smoke vs 1  | 11.40
ht vs 1     |  7.47

interpretation: for instance the model with hypertension historyand a constantant makes the data 7.54 times as probable as amodel with only a constant.

## 2f
The model with smoke has lowest AIC although the model with ht is close.

# Exercise 3

## 3a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/pdd.csv"
lr3 <- read.table(adress,header=TRUE,sep=",")
```

## 3b
```{r}
with(lr3,tapply(pdd,list(nop),sum))
with(lr3,tapply(n,list(nop),sum))
```

## 3c

Nop    | pdd=0  | pdd=1 | total
-------|--------|-------|------
nop=0  |  138   |  20   | 158
nop=1  |  105   |  45   | 150  
       |        |       |
total  |  243   |  65   | 308

## 3d
$OR=\frac{45 \cdot 138}{20 \cdot105}=2.957$
The pdd-odds in nop centers is about 3 times larger as in in birds not from such a center. There seems to be some logic in that since birds are brougth to an nop center if something is wrong with there health.

## 3e
```{r}
pdd.fit <- glm(cbind(pdd,n-pdd)~nop,family=binomial,data=lr3)
summary(pdd.fit)
exp(coef(pdd.fit))
```

# Exercise 4

## 4a
```{r}
adress <- "https://raw.github.com/janvdbroek/Generalized-Linear-Models/master/dalmatian.csv"
lr4 <- read.table(adress,header=TRUE,sep=",")
```
or with the rstudio menu.
## 4b
See the explanatipon in the text section 9

## 4c

```{r}
dalfit.1 <- glm(deaf~fhs,family=binomial,data=lr4)
summary(dalfit.1)
```
## 4d
```{r}
dalfit.0 <- glm(deaf~1,family=binomial,data=lr4)
summary(dalfit.0)
```
model | AIC  | log-lik |   lik. ratio
------|------|---------|------------------
  1   |1147.0| -572.5  |
fhs   |1085.2| -540.4  |exp(31.9)=7.145e+13