-
Notifications
You must be signed in to change notification settings - Fork 0
/
lasso_Cox.Rmd
executable file
·169 lines (119 loc) · 4.69 KB
/
lasso_Cox.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
---
title: "lasso_cox"
author: "liuc"
date: "11/2/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## lasso cox 回归
> https://glmnet.stanford.edu/articles/Coxnet.html
lasso cox 回归主要是用于对cox回归中的系数进行筛选,并在此基础之上进一步建模分析。在对于变量间的多重共线性,以及变量多于样本的情况下,可以发挥较好的作用。
The glmnet package provides procedures for fitting the entire lasso or elastic-net regularization path for Cox models. The glmpath package implements a L1 regularised Cox proportional hazards model. An L1 and L2 penalised Cox models are available in penalized.
复现文献:https://bmccancer.biomedcentral.com/articles/10.1186/s12885-021-07916-3
```{r, include=FALSE}
library(ggplot2)
library(glmnet)
library(survival)
```
```{r}
lung <- na.omit(lung)
X <- model.matrix(~sex+age+ph.ecog+ph.karno+pat.karno+meal.cal+wt.loss,
data=lung)[,-1]
# 注意NA值要删除
y = Surv(lung$time, lung$status)
data(CoxExample)
X <- CoxExample$x
y <- CoxExample$y
```
常见的用法似乎是通过cv进行变量筛选后,再进行常规的cox回归。包括单变量Cox和多变量Cox。
```{r}
# cross-validation
# type.measure only supports "deviance" (also default) which gives the partial-likelihood, and "C", which gives the Harrell C index.
fitcv <- cv.glmnet(X,
y,
family="cox",
alpha=1,
type.measure = 'C',
nfolds=10
)
plot(fitcv)
plot(fitcv$glmnet.fit, xvar="lambda")
# 下面两个哪个做为最佳的lambda都是可以的
best_lambda <- fitcv$lambda.min # find optimal lambda value that minimizes test MSE,或者此处的c-index
fitcv$lambda.1se
# extract the coefficients at certain values of λ
coef(fitcv, s = 'lambda.min')
coef(fitcv, s=fitcv$lambda.min)[coef(fitcv, s=fitcv$lambda.min)!=0]
broom::tidy(fitcv)
# lambda 为;estimate为;
```
对于上图的解读:第一幅图,the left vertical line in our plot shows us where the CV-error curve hits its minimum.
The right vertical line shows us the most regularized model with CV-error within 1 standard deviation of the
minimum。两条线所对应的数值可以分别选择作为选择变量的不同阈值。Note that unlike most error measures, a higher C index means better prediction performance.
第二幅图,为随着lamda变化变量系数的变化。
从上述结果可以看到共有10个变量被保留了下来。
*ggplot2 DIY 上面两幅图:*
```{r}
cv.fit <- fitcv
tidied_cv = broom::tidy(cv.fit)
ggplot(tidied_cv, aes(lambda, estimate)) +
geom_line(color = "red") +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .3) +
scale_x_sqrt() +
geom_vline(xintercept = cv.fit$lambda.min, lty=2) +
ylab('Partial Likelihood Deviance') +
# ylim(0, 200) +
ggthemes::theme_excel()
tidy_covariates = broom::tidy(cv.fit$glmnet.fit)
cols = colors()[!grepl("grey", colors()) & ! grepl("gray", colors())]
col_sam = sample(cols, length(unique(tidy_covariates$term)))
ggplot(tidy_covariates, aes(x=lambda, y=estimate, color=as.factor(term)))+
geom_line() +
guides(color='none') +
geom_vline(xintercept = cv.fit$lambda.min, lty=2) +
scale_x_sqrt()+
ggthemes::theme_excel()+
scale_color_manual(values=col_sam)+
ylab("Coefficients")
```
以下best_lambda似乎有问题。原来并非有问题,解释见下文。
```{r}
#find coefficients of best model
best_model <- glmnet(X, y,
family="cox",
alpha = 1, # alpha=1 is the lasso penalty, and alpha=0 the ridge penalty
lambda = best_lambda
)
coef(best_model)
best_model$beta
plot(best_model)
plot(best_model, xvar='lambda', label = TRUE)
```
meal.cal变量没有系数,是因为被lasso算法所drop了.
*interpret* 上述的Lasso-Cox回归,首先通过cv得到最佳的lamda参数;然后拟合模型,coef得到的系数中,NA为lasso删除掉的系数。
对于没有用best_lambda得到的lamda参数绘图,其结果如下,这种和上图best_model的鲜明对比是因为从sequence变成了单个数字
```{r}
fit <- glmnet(X, y,
alpha = 1,
family = 'cox'
)
plot(fit)
```
*Plotting survival curves: *
Class "coxnet" objects have a survfit method which allows the user to visualize the survival curves from the model.
```{r}
survival::survfit(best_model, s = best_lambda, x = X, y = y)
```
nomograph based on lasso-cox regression
具体的见*survival.Rmd*
文献中所用到的R包
```{r}
library(nricens)
library(stdca)
library(rmda) # for DCA
library(timeROC) # for ROC
```
```{r}
```