-
Notifications
You must be signed in to change notification settings - Fork 0
/
Common_statistical_tests_are_linear_models.qmd
189 lines (120 loc) · 4.54 KB
/
Common_statistical_tests_are_linear_models.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
---
title: "Common statistical tests are linear models"
format: html
---
## Common statistical tests are linear models
> https://lindeloev.github.io/tests-as-linear/
线性模型等价common statistic的一些笔记。
线性模型往往需要较多的样本数量才和common statistic的结果趋近。具体的样本数量在上述的链接中有极好的示例,此处不再赘述
### 对于correlation中常用的参数和非参数的Pearson和Spearman
对于correlation中常用的参数和非参数的Pearson和Spearman,其等价于如下的lm模型
```{r}
library(effectsize)
options(es.use_symbols = TRUE)
```
```{r}
# 生成测试数据
rnorm_fixed = function(N, mu = 0, sd = 1)
scale(rnorm(N)) * sd + mu
y = c(rnorm(15), exp(rnorm(15)), runif(20, min = -3, max = 0)) # Almost zero mean, not normal
x = rnorm_fixed(50, mu = 0, sd = 1) # Used in correlation where this is on x-axis
y2 = rnorm_fixed(50, mu = 0.5, sd = 1.5)
```
x, y 都是连续性数字变量
```{r}
a = cor.test(y, x, method = "pearson") # Built-in
b = lm(y ~ 1 + x) # Equivalent linear model: y = Beta0*1 + Beta1*x, P值R2虽一样但系数不一致
# 得到和a一样的系数
c = lm(scale(y) ~ 1 + scale(x)) # On scaled vars to recover r
summary(c) # 的系数既是R, F-statistic的P值既是P值。
```
```{r}
# Spearman correlation
a = cor.test(y, x, method = "spearman") # Built-in
b = lm(rank(y) ~ 1 + rank(x)) # Equivalent linear model
summary(b)
```
### t-test的线性等价:
#### one sample t-test
```{r}
# Built-in t-test
a = t.test(y)
# Equivalent linear model: intercept-only
b = lm(y ~ 1)
```
```{r}
## 以下三者是一致的,都是单样本wilcox检验
# Built-in
signed_rank = function(x) sign(x) * rank(abs(x))
a = wilcox.test(y)
# Equivalent linear model
b = lm(signed_rank(y) ~ 1) # See? Same model as above, just on signed ranks
# Bonus: of course also works for one-sample t-test
c = t.test(signed_rank(y))
a
summary(b)
c
```
#### two sample t-test
对于两样本ttest,其公式为:\[y_{i}=\beta_{0} + \beta_{1}xi \text{ } H_{0}:\beta_{1}=0 \]
```{r}
# Long format data with indicator
value = c(y, y2)
group = rep(c('y1', 'y2'), each = 50)
```
```{r}
# Built-in independent t-test on wide data
a = t.test(y, y2, var.equal = TRUE)
# Be explicit about the underlying linear model by hand-dummy-coding:
group_y2 = ifelse(group == 'y2', 1, 0) # 1 if group == y2, 0 otherwise
b = lm(value ~ 1 + group_y2) # Using our hand-made dummy regressor
# Note: We could also do the dummy-coding in the model
# specification itself. Same result as above.
# 甚至可以看到Intercept值和系数的值分别是样本的均值,一般线性模型的分类变量的处理方式。
tapply(value, group, mean)
c = lm(value ~ 1 + I(group == 'y2')) # summary(lm(value ~ group)) same result.
confint(c)
```
_Mann-Whitney U Test_
```{r}
# Wilcoxon / Mann-Whitney U
a = wilcox.test(y, y2)
# As linear model with our dummy-coded group_y2:
b = lm(rank(value) ~ 1 + group_y2) # compare to linear model above
```
在上述的示例中,t.test中假设方差相等,方差不相等的情况下we model one variance per group.
既是Welch ttest。
```{r}
# Built-in
a = t.test(y, y2, var.equal=FALSE)
# As linear model with per-group variances
b = nlme::gls(value ~ 1 + group_y2, weights = nlme::varIdent(form=~1|group),
method="ML")
summary(b) # 解读同线性模型
```
### ANOVAs
对于分组变量,在模型中纳入的是dummy变量,R中模型如何处理factor真是令人头痛呀。
好在很多函数中直接输入factor变量即可,虽然存在指定reference的问题。
```{r}
# Compare built-in and linear model
a = car::Anova(aov(value ~ group, D)) # Dedicated ANOVA function
b = lm(value ~ 1 + group_b + group_c, data = D) # As in-your-face linear model
#identical(lm(value ~ 1 + group, data = D), b)
```
```{r}
# Dedicated ANCOVA functions. The order of factors matter in pure-aov (type-I variance).
# Use type-II (default for car::Anova) or type-III (set type=3),
a = car::Anova(aov(value ~ group + age, D))
#a = aov(value ~ group + age, D) # Predictor order matters. Not nice!
# As dummy-coded linear model.
full = lm(value ~ 1 + group_b + group_c + age, D)
# Testing main effect of age using Likelihood-ratio test
null_age = lm(value ~ 1 + group_b + group_c, D) # Full without age. One-way ANOVA!
result_age = anova(null_age, full)
# Testing main effect of groupusing Likelihood-ratio test
null_group = lm(value ~ 1 + age, D) # Full without group. Pearson correlation!
result_group = anova(null_group, full)
```
### Proportion
```{r}
```