nlme.Rmd

---
title: "非线性混合效应模型"
author: "liuc"
date: "1/11/2022"
output: word_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## 非线性混合效应模型

> http://sia.webpopix.org/nlme.html


非线性混合模型（nonlinear mixed effect model）是非线性模型的扩展。非线性模型我们是熟悉的，包括常见的多项式回归和GAM 等模型。非线性混合模型是指一个或多个coefficients出现非线性的情况。在处理重复测量数据（Longitudinal data）时，亦可扩展到此方法 的应用。在进行学习的过程中所收集到的资料多是以PK数据为例进行讲述的。


广义混合效应模型是广义线性模型的扩展。They also inherit from GLMs the idea of extending linear mixed models to non-normal data.


```{r, include=FALSE}
# 首先是找到好用的R包

library(tidyverse)
library(nlme)
library(lme4)
library(saemix) # 提供一种nlme的算法
library(ggPMX)
library(minpack.lm)

# for nonlinear mixed effects models
library(nlmixr2)
library(rxode2)
```


### nlme包的线性混合模型

```{r}
# 线性混合模型，visual0作为协变量
data(armd, package="nlmeU")

lme1 <- lme(
  visual ~ visual0 + time + treat.f,
  random = ~ 1 | subject,
  data = armd)

summary(lme1)
anova(lme1)

lme2 <- lme(
  visual ~ visual0 + time + treat.f + treat.f:time,
  random = ~ 1 | subject,
  data = armd)


# 可以用anova()比较两个嵌套模型， 为了比较随机效应相同、固定效应不同的模型， 需要两个模型都使用最大似然估计
lme1.ml <- update(lme1, method="ML")
lme2.ml <- update(lme2, method="ML")
anova(lme1.ml, lme2.ml)
```


### 开始非线性混合模型

> http://sia.webpopix.org/nlme.html

首先似乎其在药代动力学中应用较多，遂采用一个药代动力学的数据集。本例子使用`saemix`这一nlmme算法的实现形式。


```{r}
# Find all rows where "DV"==NA and replace the 0 in "MDV" with 1
# Find all rows where "DV"==NA and replace the 0 in "AMT" with the "DOSE" of that row
data(Theoph, package ='datasets')


# The 12 subjects received 320 mg of theophylline at time 0.
theophylline <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L), DOSE = c(4.02, 4.02, 4.02, 4.02, 4.02, 4.02, 4.02, 
4.02, 4.02, 4.02, 4.02, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 
4.4, 4.4, 4.4, 4.53, 4.53, 4.53, 4.53, 4.53, 4.53, 4.53, 4.53, 
4.53, 4.53, 4.53, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 
4.4, 4.4, 5.86, 5.86, 5.86, 5.86, 5.86, 5.86, 5.86, 5.86, 5.86, 
5.86, 5.86, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4.95, 4.95, 4.95, 
4.95, 4.95, 4.95, 4.95, 4.95, 4.95, 4.95, 4.95, 4.53, 4.53, 4.53, 
4.53, 4.53, 4.53, 4.53, 4.53, 4.53, 4.53, 4.53, 3.1, 3.1, 3.1, 
3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 5.5, 5.5, 5.5, 5.5, 5.5, 
5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 4.92, 4.92, 4.92, 4.92, 4.92, 4.92, 
4.92, 4.92, 4.92, 4.92, 4.92, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 
5.3, 5.3, 5.3, 5.3), TIME = c(0, 0.25, 0.57, 1.12, 2.02, 3.82, 
5.1, 7.03, 9.05, 12.12, 24.37, 0, 0.27, 0.52, 1, 1.92, 3.5, 5.02, 
7.03, 9, 12, 24.3, 0, 0.27, 0.58, 1.02, 2.02, 3.62, 5.08, 7.07, 
9, 12.15, 24.17, 0, 0.35, 0.6, 1.07, 2.13, 3.5, 5.02, 7.02, 9.02, 
11.98, 24.65, 0, 0.3, 0.52, 1, 2.02, 3.5, 5.02, 7.02, 9.1, 12, 
24.35, 0, 0.27, 0.58, 1.15, 2.03, 3.57, 5, 7, 9.22, 12.1, 23.85, 
0, 0.25, 0.5, 1.02, 2.02, 3.48, 5, 6.98, 9, 12.05, 24.22, 0, 
0.25, 0.52, 0.98, 2.02, 3.53, 5.05, 7.15, 9.07, 12.1, 24.12, 
0, 0.3, 0.63, 1.05, 2.02, 3.53, 5.02, 7.17, 8.8, 11.6, 24.43, 
0, 0.37, 0.77, 1.02, 2.05, 3.55, 5.05, 7.08, 9.38, 12.1, 23.7, 
0, 0.25, 0.5, 0.98, 1.98, 3.6, 5.02, 7.03, 9.03, 12.12, 24.08, 
0, 0.25, 0.5, 1, 2, 3.52, 5.07, 7.07, 9.03, 12.05, 24.15), DV = c(NA, 
2.84, 6.57, 10.5, 9.66, 8.58, 8.36, 7.47, 6.89, 5.94, 3.28, NA, 
1.72, 7.91, 8.31, 8.33, 6.85, 6.08, 5.4, 4.55, 3.01, 0.9, NA, 
4.4, 6.9, 8.2, 7.8, 7.5, 6.2, 5.3, 4.9, 3.7, 1.05, NA, 1.89, 
4.6, 8.6, 8.38, 7.54, 6.88, 5.78, 5.33, 4.19, 1.15, NA, 2.02, 
5.63, 11.4, 9.33, 8.74, 7.56, 7.09, 5.9, 4.37, 1.57, NA, 1.29, 
3.08, 6.44, 6.32, 5.53, 4.94, 4.02, 3.46, 2.78, 0.92, NA, 0.85, 
2.35, 5.02, 6.58, 7.09, 6.66, 5.25, 4.39, 3.53, 1.15, NA, 3.05, 
3.05, 7.31, 7.56, 6.59, 5.88, 4.73, 4.57, 3, 1.25, NA, 7.37, 
9.03, 7.14, 6.33, 5.66, 5.67, 4.24, 4.11, 3.16, 1.12, NA, 2.89, 
5.22, 6.41, 7.83, 10.21, 9.18, 8.02, 7.14, 5.68, 2.42, NA, 4.86, 
7.24, 8, 6.81, 5.87, 5.22, 4.45, 3.62, 2.69, 0.86, NA, 1.25, 
3.96, 7.82, 9.72, 9.75, 8.57, 6.59, 6.11, 4.57, 1.17), WEIGHT = c(79.6, 
79.6, 79.6, 79.6, 79.6, 79.6, 79.6, 79.6, 79.6, 79.6, 79.6, 72.4, 
72.4, 72.4, 72.4, 72.4, 72.4, 72.4, 72.4, 72.4, 72.4, 72.4, 70.5, 
70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 72.7, 
72.7, 72.7, 72.7, 72.7, 72.7, 72.7, 72.7, 72.7, 72.7, 72.7, 54.6, 
54.6, 54.6, 54.6, 54.6, 54.6, 54.6, 54.6, 54.6, 54.6, 54.6, 80, 
80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 64.6, 64.6, 64.6, 64.6, 
64.6, 64.6, 64.6, 64.6, 64.6, 64.6, 64.6, 70.5, 70.5, 70.5, 70.5, 
70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 86.4, 86.4, 86.4, 86.4, 
86.4, 86.4, 86.4, 86.4, 86.4, 86.4, 86.4, 58.2, 58.2, 58.2, 58.2, 
58.2, 58.2, 58.2, 58.2, 58.2, 58.2, 58.2, 65, 65, 65, 65, 65, 
65, 65, 65, 65, 65, 65, 60.5, 60.5, 60.5, 60.5, 60.5, 60.5, 60.5, 
60.5, 60.5, 60.5, 60.5), MDV = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), AMT = c(4.02, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 4.53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4.4, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 5.86, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 4.95, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4.53, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 3.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
5.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4.92, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 5.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("ID", 
"DOSE", "TIME", "DV", "WEIGHT", "MDV", "AMT"), row.names = c(NA, 
-132L), class = "data.frame")
```


#### use saemix for nlme

> https://jchiquet.github.io/MAP566/docs/mixed-models/map566-lecture-nonlinear-mixed-model.html

`saemix`的语法和`nlme`, `lme4`包的语法差别较大，其更像是结构方程模型的语法。

_concentration_ is the response variable y, _time_ is the explanatory variable (or predictor) t and _id_ is the grouping variable

此包在当前目录下所生成的newdir目下有每个样本随时间变化曲线图和所有样本随时间变化的曲线。

```{r}
# 创建data
saemix_data <- saemixData(name.data       = Theoph,
                          name.group      = "Subject",
                          name.predictors = "Time", # predictor
                          name.response   = "conc" # response
                          )

# structural model
# 先吸收后排出的过程
model1_nlme <- function(psi,id,x) {
  D   <- 320
  t   <- x[,1]
  ka  <- psi[id,1]
  V   <- psi[id,2]
  ke  <- psi[id,3]
  fpred <- D*ka/(V*(ka-ke))*(exp(-ke*t)-exp(-ka*t))
  fpred
}

# structural model和initial值的选择和设定应该是PK模型的重点
# psi0: initial estimates of the fixed parameters 
saemix_model <- saemixModel(model = model1_nlme,
                            psi0  = c(ka=1,V=20,ke=0.5),
                            modeltype = "structural",
                            error.model = 'constant'
                            )


saemix_options <- list(map=TRUE, fim=TRUE, ll.is=FALSE, 
                       displayProgress=FALSE, save=FALSE, seed=632545
                       )

# fit model
saemix_fit1 <- saemix(saemix_model, saemix_data, saemix_options)

saemix_fit1@results
```

*interpret the results: *


`psi`函数所返回的是，每一个样本的参数：
structural model里的参数，

```{r}
# The individual parameters estimates are also available
psi <- psi(saemix_fit1)
psi

```


```{r}

# These individual parameter estimates can be used for computing and plotting individual predictions
saemix_fit <- saemix.predict(saemix_fit1)
saemix.plot.fits(saemix_fit1)

```


模型诊断：

the observations versus individual predictions:

结果最好能在一条对角线。

```{r}
saemix.plot.obsvspred(saemix_fit1,level=1)
```

the residuals versus time, and versus individual predictions，

残差的分布最好在置信区间内。

```{r}
saemix.plot.scatterresiduals(saemix_fit1, level=1)
```


#### use lme4 for nlme

首先查看数据的focus变量（感兴趣的一半是因变量）随不同时期采样的变化。
the concentration first increases during the absorption phase adn then decreases during the elimination phase.
每一个个体都呈现PK模型的数据特点，而混合模型可以考虑inter individual variability，既是population approach

```{r}
theme_set(theme_bw())

pl <- ggplot(data=theophylline, aes(x=TIME, y=DV)) + 
  geom_point(color="#993399", size=2) +
  xlab("time (h)") + ylab("concentration (mg/l)")

pl +geom_line(color="#993399", aes(group=ID))
```

Fitting a nonlinear model to a single subject,
对个体拟合一个非线性模型。

```{r}
theo1 <- theophylline[theophylline$ID==1 ,c("TIME","DV")]

head(theo1) # 一个患者在一个dose下不同时间点的数据
```

```{r}
pk.model1 <- function(psi, t) {
  D <- 320
  ka <- psi[1]
  V <- psi[2]
  ke <- psi[3]
  f <- D * ka / V / (ka - ke) * (exp(-ke * t) - exp(-ka * t))
  return(f)
}

pkm1 <- nlme::nls(DV ~ pk.model1(psi, TIME), start = list(psi = c(ka = 1, V = 40, ke = 0.1)), 
            data = theo1
            )
coef(pkm1)
```


`pkm.all`模型因为没有考虑样本间的random effect，虽则是拟合了12个样本的数据。就像直接用lm拟合lme的数据一样。进一步的 我们将采用`saemix`这一R包来拟合一个考虑到观察间不独立，通过subject的random effect条件性得到的nlme模型，也就是我们所 目标构建的模型。

```{r}
pkm.all <- nlme::nls(concentration ~ pk.model1(psi, time), 
                     start=list(psi=c(ka=1, V=40, ke=0.1)), data=theo.data)
coef(pkm.all)
```


Fitting a unique nonlinear model to several subjects：

```{r}
pl1 <- ggplot(data=theo1, aes(x=time,y=concentration)) + geom_point( color="#993399", size=3) + 
  xlab("time (h)") + ylab("concentration (mg/l)") + ylim(c(0,11))
pl1 + geom_line(color="#993399")
```


```{r}
new.df$pred.all <- predict(pkm.all, newdata=new.df)
pl +  geom_line(data=new.df, aes(x=time,y=pred.all), colour="#339900", size=1)
```

*lme4*包也可以拟合非线性混合效应模型。

> https://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf

In an NLMM all of the fixed-effects parameters and all of the random effects are with respect to the nonlinear model parameters, which are lKe, lKa and lCl in this case. In the lme4 package, the default fixed-effects expression is 0 + lKe + lKa + lCl, indicating that the intercept term is suppressed and that there is a single fixed effect for each
nonlinear model parameter. Random effects must also be specified with respect to the nonlinear model parameters.
In the lme4a version terms look like (O + lKe|Subject).

```{r}
#  It is helpful to use the deriv() function to symbolically differentiate the model function.
# ?SSfol
example(SSfol)
```


```{r}
Th.start <- c( lKe = -2.5 , lKa = 0.5 , lCl = -3)

nm1 <- nlmer ( conc ~ SSfol ( Dose , Time , lKe , lKa , lCl ) ~
                 0+ lKe + lKa + lCl +(0+ lKe | Subject )+(0+ lKa | Subject ) +
                 (0+ lCl | Subject ), nAGQ =0 , Theoph ,
               start = Th.start , verbose = TRUE )
```


Reducing the model
The variance of the random effect for lK3 is essentially zero. Eliminate it.

```{r}
 nm2 <- nlmer ( conc ~ SSfol ( Dose , Time , lKe , lKa , lCl ) ~
                  0+ lKe + lKa + lCl +(0+ lKa | Subject ) +
                  (0+ lCl | Subject ), Theoph , nAGQ =0 ,
                start = Th.start , verbose = TRUE )
```

Allowing within-subject correlation of random effects

```{r}
nm3 <- nlmer ( conc ~ SSfol ( Dose , Time , lKe , lKa , lCl ) ~ 
                 0+ lKe + lKa + lCl +(0+ lKa + lCl | Subject ),
               Theoph , start = Th.start , verbose = TRUE )
```


```{r}
anova ( nm2 , nm3 )
```