-
Notifications
You must be signed in to change notification settings - Fork 0
/
IC50_cal.qmd
651 lines (449 loc) · 20.2 KB
/
IC50_cal.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
---
title: "IC50 数据分析"
format: html
---
## IC50 数据分析
> https://rstudio-pubs-static.s3.amazonaws.com/378543_5b5bda32bf0541a485ccc49efbea680a.html
> https://towardsdatascience.com/drug-dose-response-data-analysis-5d7d336ad8e9
> https://www.graphpad.com/guides/prism/latest/curve-fitting/reg_models_built-in_to_prism.htm
> https://www.aatbio.com/tools/ic50-calculator 一个在线小工具
> https://rstats4ag.org/dose-response-curves.html 重要的基础知识内容,主要参考文件
> http://www.darrenkoppel.com/2020/09/04/dose-response-modelling-and-model-selection-in-r/
> https://demo.thunor.net/ 免费基于Python的dose-curve包
复现常见的药物IC50计算和绘图.
针对不同的实验设计,比如激动剂药物和拮抗剂药物,体外还是体内实验,严格来说体内实验是`dose-response curve`, 对于体外实验`concentration-response curve`是一个更为合适的称谓,不过一般宽泛的称为`dose-response`。
CDC官网对四参数最低的data points数目要求为a minimum of five data points is required for parameter estimation。不过 查到的为ELISA数据,此处也是仅供参考。
IC50 is the “half-maximal inhibitory concentration” for an entity (typically a drug) against a biological process or function (eg, enzyme activity, cell number, metabolic activity).
EC50 is, comparably, the “half-maximal effective concentration”. The effective is in there because not all entities will completely wipe out the biological process that you are assaying. 如果没有一个最大和最小值是不可能计算EC50的。
就只是response最下和最大值的中位值所对应的浓度。在拮抗剂可以让100%的细胞死亡的情况下,EC50=IC50。再比如说一个拮抗剂最多能杀死45%的细胞,那么IC50是不存在的,但是可以有EC50.
Kd
- E0: Response observed in absence of drug
- Einf: Maximum response caused by a drug
- EC50: Concentration of a drug at which 50% of its maximal response is induced. EC50 is normally measured in molar concentrations and is used as a measure of agonist drug potency — the lower the EC50 value, the lower the concentration of drug required to elicit a 50% maximal response and the greater the potency of the drug.
- IC50: Concentration of a drug required for 50% inhibition of cell growth. It is normally measured in molar concentrations and is used as a measure of antagonist drug potency — the lower the IC50 value, the more potent the drug is.
- Hill slope: 从定义可以看出拮抗剂的slope值一般是小于0的。而和1的对比可以大概看出曲线的坡度。
*four-parameter log-logistic model: * 四参数模型是较为常用的,除此之外还有三参数、五参数模型等。
在对数据进行绘图时,一般会对X轴的浓度数据取log,而Y轴的response一般进行简单的线性转换,主要是保证that all scatter around the curve is Gaussian。
*模型判断:*如何判断一个curve模型拟合的时候好呢?`drc`包似乎没有提供类似Prism R2的结果。而且在非线性模型中,R2不再是线性模型中x对
y的解释程度的那个数值,For nonlinear regression, the choice of a ‘null’ model is not as simple, and therefore the
R2 type measure will be different depending on the chosen reduced model.
R2在线性模型中实际上是两个模型的对比,所拟合的模型以及null model。
AIC是个不错的选择。
*模型选择:*除了依据模型评价指标进行选择外,依据数据特点进行选择。比如最为常用到的log-logistic模型,三参数、二参数是把四参数某个参数设置为默认值实现的,Prism中把Hill系数设为1,`drc`包三参数将最小值fit到0、两参数模型还将最大值设为1,两参数模型对于survival数据较为常用。
*symmetric log-logistic model: *
$$y=c+(d-c)/(1+exp(b(log(x)-log(ED50))))$$
*常见的拟合curve的曲线和函数:*
在生存分析的学习部分,曾见过多个分布,logistic分布、weibull分布、
在interpret曲线结果时,可以分别汇报四个参数的意义即可,比如对于b值,即是Hill系数,其是拐点值,则大约在拐点值时,细胞死亡速度达到高峰,
## 示例演示
```{r, include=FALSE}
library(tidyverse)
library(ggprism)
library(pzfx) # read GraphPad data
library(drc) # Dose response curve
library(sandwich)
library(lmtest)
# 解决drc的一个bug
# 在用sandwich包时的一个小错误
registerS3method("estfun", "drc", drc::estfun.drc)
registerS3method("bread", "drc", drc::bread.drc)
```
### 单曲线分析绘制
首先是和Prism软件数据的交互,由国人开发的这个包还是不错的。
```{r}
pzfx::pzfx_tables('~/OneDrive/kintor/Daily_Work/IC50_xiaodan/data.pzfx')
```
```{r}
dd <- pzfx::read_pzfx('~/OneDrive/kintor/Daily_Work/IC50_xiaodan/data.pzfx',
"22RV1-GT19686")
dd
```
此处读取的是经过处理的数据,数据为宽数据,不过需要计算重复的均值,作为函数的因变量。
```{r}
d_p <- dd %>% rename('drug_con' = 'Var.1','rep1' = '_1',
'rep2' = '_2', 'rep3' = '_3'
) %>%
mutate(id = row_number()) %>%
rowwise() %>%
mutate(inhib_mean = mean(c_across(c(rep1, rep2, rep3)))) %>%
ungroup() %>%
relocate(id, everything())
d_p
```
长数据类型。
```{r}
d_long <- d_p %>% pivot_longer(cols = -c(id, drug_con, inhib_mean),
names_to = 'rep',
values_to = 'response'
)
d_long
```
```{r}
ggplot(d_long, aes(x = drug_con, y = response)) +
geom_point()
plot(response ~ log(drug_con+0.1), data = d_long, main="Logarithmic Dose Scale")
```
`drc` package 用来拟合`dose-response curve`,`drm` (which fits the regression model) and `LL.4` (which defines the structure of the regression model).
`drc::drm`包有一个不错的地方在于,其对长宽数据都支持,不过*肯定用长数据*,所谓宽数据是需要计算均值的;而且后续绘图不友好,只是作为一个演示放在此笔记中。
```{r}
# fitted_curve <- drc::drm(
# formula = inhib_mean ~ drug_con,
# data = d_p,
# fct = LL.4(names = c("hill", "min_value", "max_value", "ec_50"))
# # fct = LL.3()
# )
fitted_curve <- drc::drm(
formula = response ~ drug_con,
data = d_long,
fct = LL.4(names = c("hill", "min_value", "max_value", "ec_50")) # 四参数
# fct = drc::LL.4(fixed = c(1,NA,NA,NA)) # 三参数
)
summary(fitted_curve)
```
*interpret: *As a consequence estimated b values often lie in the range 0.5 to 20 in absolute value, regardless of the assay or experiment generating the data.
可以看到,再手动计算均值后得到的结果和直接用长数据得到的结果是一致的。
```{r}
fitted_curve2 <- drc::drm(
formula = response ~ drug_con,
data = d_long,
# fct = LL.4(names = c("hill", "min_value", "max_value", "ec_50")) # 四参数
fct = drc::LL.4(fixed = c(1,NA,NA,NA)) # 三参数
)
summary(fitted_curve2)
# 此处所得到的结果和summary结果差异在于其计算方法的不同
lmtest::coeftest(fitted_curve2, vcov = sandwich)
# summary(multcomp::glht(ryegrass.LL.4)) # same result as summary
# 三参数
broom::tidy(fitted_curve2) %>%
bind_cols(tibble(rowname = c('E0', 'Einf', 'EC50'))) %>%
dplyr::relocate(rowname, everything()) %>%
bind_cols(confint(fitted_curve2))
```
*模型验证*
以下模型的lack-of-fit 的P值偏大,模型的整体表现似乎一般。
```{r}
# 和ANOVA模型的差异,P值是越大越好呀
modelFit(fitted_curve)
modelFit(fitted_curve2)
# the correlation of the adjuste of the curve plotted and the data
cor(d_long$response, fitted(fitted_curve2))
```
对于连续变量结局模型的评价,lack-of-fit的P值大于0.05表明模型还不错,不过Prism有提供R2值,目前`drc`包还没有找到计算R2的方法。for lack-of-fit test (against a one-way ANOVA model): the larger (the p-value) the better.
_但是只利用lack-of-fit_还是不够的,The distribution of the residuals and the hopefully normal distribution of residuals are informative to make sure that the prerequisites for doing a regression in the first place are not violated. The assumptions we must consider include:
Correct regression model
Variance Homogeneity
Normally distributed measurement errors
Mutually independent measurement error ε
```{r}
#Graphical analysis of residuals
op <- par(mfrow = c(1, 2), mar=c(3.2,3.2,2,.5), mgp=c(2,.7,0)) #put two graphs together
plot(residuals(fitted_curve2) ~ fitted(fitted_curve2), main="Residuals vs Fitted")
abline(h=0)
qqnorm(residuals(fitted_curve2))
qqline(residuals(fitted_curve2))
```
*interpret: *Model esiduals diagnostic plots: Left: distribution of residuals against the fitted value (variance homogeneity); Right: normal q-q plot illustrating whether the residuals are normally distributed (normally distributed measurement errors).
The residual plot indicates that the model was able to catch the variation in data. The distribution of residuals should look like a shotgun shot on a wall. Apparently, there are some systematic violations in the residual plot, At small fitted values the residuals are close to zero than at high fitted values the variation is much more pronounced. This is called a funnel type pattern. We also see it in the graphs with the fit and all the observations (Figure 11.4). The normal Q-Q plot looks all right, no serious deviation from the assumption of normal distribution of residuals. Later we will look at this problem, which is a statistical problem not a biological one.
*计算ED值:*
考虑到各种模型间的不同,用此函数得到ED值更为靠谱一些。*weibull等模型的ED50值需通过此函数得到!*
Delta refers to the asymptotics-based method to calculate confident intervals.
```{r}
ED(fitted_curve2, c(5, 10, 50, 80), interval = "delta")
```
*计算IC50值:*
此处就有一个较为微妙的地方,Prism计算得到的值实验室一般直接采用,即EC50值,而对于不同的模型、不同的实验设计,似乎
IC50的真正计算需要进一步的探讨。比如Weibull其EC50值就不是一个模型参数。
注意Hill系数的值在抑制剂中,应该为一个负值。
```{r}
coefs <- setNames(
fitted_curve$coefficients,
c("hill", "min_value", "max_value", "ec_50")
)
ic_50 <- with(
as.list(coefs),
exp(
log(ec_50) + (1 / hill) * log(max_value / (max_value - 2 * min_value))
)
)
ic_50
```
*选择合适的model*
我们所使用的数据是和control组进行对比所得到的比值,是具有浓度为0的数据点,应该在计算IC50时带入,不过并没有必要对此要设置最大最小值的默认值,自己去计算E0,Einf是完全可行的。依据不同的实验设计选择不同的参数。
*因为数据为比例,有理由考虑模型最大设置为100等*
```{r, warning=FALSE}
# 演示一个模型选择的过程
# 依据IC值的大小可以明显的看到4参数是个好模型。
drc::mselect(fitted_curve, fctList = list(W1.3(fixed=c(NA, 100, NA)),
W1.4(),
W2.4(),
W2.3(fixed=c(NA, 100, NA)),
LL.3(),
LL.4(fixed = c(1,NA,NA,NA)),
LL.4(fixed = c(NA,0,100,NA))
),
linreg=TRUE
)
```
*interpret: *For Akaike’s information criterion and the residual standard error: the smaller the better and for lack-of-fit test (against a one-way ANOVA model): the larger (the p-value) the better. Note that the residual standard error is only available for continuous dose-response data. Log likelihood values cannot be used for comparison unless the models are nested.
The rule of AIC is that the smaller the better and differences of *more than 10* indicates “significantly” better model fits.
```{r}
AIC(fitted_curve, fitted_curve2)
```
*考虑到上述模型中4参数模型其log-of-fit*也小于0.05,再看待一下Weibull模型。
```{r}
fit_weibull <- drc::drm(
formula = response ~ drug_con,
data = d_long,
fct = W1.4()
)
modelFit(fit_weibull)
```
```{r}
# ?plot.drc
plot(fit_weibull)
```
*drc包还有提供综合几个模型结果的函数*
就个人而言,期望其的使用,在于多个模型的AIC差别不大的情况下,虽则其会给予比重。
```{r}
drc::maED(fitted_curve,
fctList = list(LL.3(), LL.4(fixed = c(1,NA,NA,NA))),
c(10, 50, 80),
interval="kang"
)
```
#### 利用ggplot2绘制漂亮的图片:
下面这两段摘抄的代码效果一般,不考虑。
```{r}
# new dose levels as support for the line
newdata <- expand.grid(conc = exp(seq(log(0.01), log(100), length = 100)))
# newdata <- expand(conc = )
# predictions and confidence intervals
pm <- predict(fitted_curve, newdata = newdata, interval = "confidence")
# new data with predictions
newdata$p <- pm[, 1]
newdata$pmin <- pm[, 2]
newdata$pmax <- pm[, 3]
# plot curve
library(ggplot2)
# need to shift conc == 0 a bit up, otherwise there are problems with coord_trans
d_long$conc0 <- d_long$drug_con
# d_p$conc0 <- log(d_p$drug_con)
d_long$conc0[d_long$conc0 == 0] <- 0.001
# plotting the curve
ggplot(d_long, aes(x = conc0, y = response)) +
geom_point(size =3) +
geom_ribbon(data = newdata, aes(x = conc, y = p, ymin = pmin, ymax = pmax), alpha = 0.2) +
geom_line(data = newdata, aes(x = conc, y = p)) +
scale_x_log10() +
xlab("[Agonist], M") +
ylab("Response") +
scale_colour_manual(
values = c("#9FA3FE", "#00167B", "#9FA3FE")
) +
ggnewscale::new_scale_colour() +
scale_colour_prism(
palette = "winter_bright",
) +
scale_shape_prism() +
theme_prism(palette = "winter_bright", base_size = 16) +
theme(
axis.title.y = element_blank(),
legend.title = element_blank(),
legend.position = c(0.05, 0.95),
legend.justification = c(0.05, 0.95)
)
```
```{r}
# Note the logged x-axis
plot(fitted_curve,
log='x',
xlab = 'Drug concentration (microM)',
ylab= 'Inhibition',
type = 'all'
)
```
此为另一篇推文计算的IC50和上述有所不同。
```{r}
E0 <- fitted_curve$coefficients[2]
Einf <- fitted_curve$coefficients[3]
EC50 <- fitted_curve$coefficients[4]
H <- -fitted_curve$coefficients[1]
IC50 <- exp(log((((Einf - E0) / (50 - E0) ) - 1) * EC50**(-H))/ (-H))
efficacy_metrics <- data.frame(c(E0, Einf, EC50, H, IC50))
rownames(efficacy_metrics) <- c('E0', 'Einf', 'EC50', 'H', 'IC50')
colnames(efficacy_metrics) <- 'Value'
efficacy_metrics
```
```{r}
# 不是很合适
dose_resp <- y ~ min + ((max - min) / (1 + exp(hill_coefficient * (ec50 - x))))
d_p %>%
mutate(drug_con_log = log10(drug_con)) %>%
pivot_longer(
cols = -c(drug_con, drug_con_log, id, inhib_mean),
names_pattern = "(.{3})([0-9])",
names_to = c("treatment", "rep"),
values_to = "response"
) %>%
ggplot(
aes(x = drug_con_log, y = response)
) +
geom_smooth(
#aes(colour = treatment),
method = "nls", formula = dose_resp, se = FALSE,
method.args = list(start = list(
min = E0, max = Einf,
ec50 = EC50, hill_coefficient = H
))
) +
scale_colour_manual(
labels = c("Inhibitor"),
values = c("#00167B", "#9FA3FE")
) +
ggnewscale::new_scale_colour() +
geom_point(size = 3) +
scale_colour_prism(
palette = "winter_bright",
labels = c(
"Inhibitor"
)
) +
scale_shape_prism(
labels = c(
"Inhibitor"
)
) +
theme_prism(palette = "winter_bright", base_size = 16) +
theme(
axis.title.y = element_blank(),
legend.title = element_blank(),
legend.position = c(0.05, 0.95),
legend.justification = c(0.05, 0.95)
) +
labs(x = "[Agonist], M")
```
*绘制基于`ggplot2`的漂亮图片:*
```{r}
#plot_2
p1 <- ggplot(data = d_long,
aes(x = drug_con, y = response)) +
geom_point(col = "yellowgreen", size = 4) +
geom_smooth(method = drm,col = "skyblue", method.args = list(fct = L.4()), se = F) +
scale_x_log10() +
labs(title= "GT19637",
x = "Dose (mM)", y = "Inhibition") +
# ggprism::theme_prism() +
theme(legend.position = "",
panel.background = element_blank(),
panel.border = element_rect(colour = "black",fill = NA, size=0.5),
panel.grid = element_blank(),
axis.text.x = element_text(size= 12, color = "darkblue",family = "sans",hjust = 0.5),
axis.text.y = element_text(size= 12, color = "darkblue",family = "sans",vjust = 0.5,hjust = 0.5),
axis.title = element_text(size=12, color = "darkred",family = "sans"),
axis.ticks = element_line(size= 0.5),
axis.ticks.length = unit(3, "pt")
)
p1
```
绘制基于`ggplot2`的漂亮图片:
```{r}
data_1 <- aggregate(d_long["response"], list(conc = d_long$drug_con), FUN = mean)
data_1$sd <- aggregate(d_long["response"], list(conc = d_long$drug_con), FUN = sd)$response
ic50_value <- round(efficacy_metrics["EC50", ], 3)
p2 <- ggplot(
data = data_1,
aes(x = conc, y = response)
) +
geom_point(col = "red", size = 3, shape = 21, fill = "yellowgreen", position = "identity") +
geom_errorbar(aes(ymin = response - sd, ymax = response + sd), width = .05, position = "identity") +
geom_smooth(method = drm, col = "skyblue", method.args = list(fct = L.4()), se = F) +
scale_x_log10() +
ggplot2::annotate(
geom = "text", x = 0.2, y = 0.2, size = 8,
label = glue::glue("IC50 = {ic50_value}")
) +
labs(
title = "GT19637",
x = "Log Drug Concentration(nM)", y = "Relative Cell viability"
) +
theme(
legend.position = "",
panel.background = element_blank(),
# panel.border = element_rect(colour = "black", fill = NA, size = 0.5),
panel.grid = element_blank(),
axis.text.x = element_text(size = 12, color = "darkblue", family = "sans", hjust = 0.5),
axis.text.y = element_text(size = 12, color = "darkblue", family = "sans", vjust = 0.5, hjust = 0.5),
axis.title = element_text(size = 16, color = "darkred", family = "sans"),
axis.ticks = element_line(size = 1),
axis.line = element_line(size = 1),
axis.ticks.length = unit(3, "pt"),
plot.title = element_text(size = 20, colour = "black", hjust = 0.5, vjust = -1, face = "bold")
)
p2
```
### 多曲线对比
除计算单个化合物的IC50值外,两条或多条曲线间的对比也是经常要用到的功能,现记录一下。
两条曲线的对比,通常是满足一定条件的,即是可比的,比如同样实验设计的不同化合物之类的。
首先绘制两条或多条不同的曲线。
```{r}
plot(fitted_curve,
log='x',
xlab = 'Drug concentration (microM)',
ylab= 'Inhibition',
type = 'all',
broken = TRUE
)
plot(fitted_curve2, add = TRUE, col="orange")
```
```{r}
pp1 <- ggplot(data = d_long,
aes(x = drug_con, y = response)) +
geom_point(col = "yellowgreen", size = 4) +
geom_smooth(method = drm,col = "skyblue", method.args = list(fct = L.4()), se = F) +
geom_smooth(method = drm,col = "red4",
method.args = list(fct = L.4(fixed = c(1,NA,NA,NA))),
se = F) +
scale_x_log10() +
labs(title= "GT19637",
x = "Dose (mM)", y = "Inhibition") +
ggprism::theme_prism()
pp1
```
*多曲线绘制示例:*
输入数据中加上对应的分组信息即可。
当然需要选择特定的对比。
```{r}
fit_two <- drc::drm(formula = DryMatter ~ Dose,
curveid = Herbicide,
data = S.alba,
fct = LL.4()
)
```
*在多条曲线进行拟合时有什么是需要注意到嚒*
示例中考虑了设置一样的最大最小值。
```{r}
summary(fit_two)
```
```{r}
modelFit(fit_two)
```
```{r}
plot(fit_two, broken = T, col = c('red', 'blue'), )
```
`EDcomp` Comparison of relative potencies between dose-response curves
estimate值是 relative potency
```{r}
EDcomp(fit_two, percVec = c(50, 50))
```
```{r}
EDcomp(fit_two, percVec = c(50, 50), interval="delta", reverse=TRUE)
```
```{r}
par(mfrow=c(1,1), mar=c(3.2,3.2,.5,.5), mgp=c(2,.7,0))
drc::relpot(fit_two, interval = "delta", bty="l")
```
## 拟合log-logistic模型
log-logsitic 是指数据经过log后具有logistic分布的数据。
```{r}
library(glogis)
```