---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# MSToolkit
<!-- badges: start -->
[![Codecov test coverage](https://codecov.io/gh/MikeKSmith/MSToolkit/branch/master/graph/badge.svg)](https://app.codecov.io/gh/MikeKSmith/MSToolkit?branch=master)
[![R-CMD-check](https://github.com/MikeKSmith/MSToolkit/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/MikeKSmith/MSToolkit/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
## Overview
The MSToolkit R package helps users simulate clinical trials, evaluate trial designs and analysis methodology, and quantify operating characteristics through the application of dose- and study-level decision criteria.
- Can simulate parallel group, crossover and longitudinal designs.
- Flexibility in defining data generation processes using parametric models.
- Data generation model parameters can vary across simulation replicates; parameters can vary between subjects.
- The data generation function specifies the linear predictor for the mean response given inputs (dose, time, covariates); inverse link functions allow generation of continuous, binary and count data, and user-written inverse link functions allow data generation from a variety of other distributions (see the sketch after this list).
- Model parameters and covariates for each subject / replicate can be generated from multivariate normal distributions or sampled from external data files (e.g. existing study data).
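To illustrate the inverse-link idea mentioned above, here is a minimal plain-R sketch (not MSToolkit API; the `invLogit` function and the coefficients are hypothetical) that turns a linear predictor in dose into binary responses via an inverse logit:
```{r inverse-link-sketch, eval=FALSE}
# Hypothetical illustration, not MSToolkit API: an inverse logit link
# mapping a linear predictor in dose onto binary (0/1) responses
invLogit <- function(x) 1 / (1 + exp(-x))
doses <- c(0, 5, 25, 50, 100)
eta <- -1 + 0.03 * doses   # linear predictor for the mean response
rbinom(length(doses), size = 1, prob = invLogit(eta))
```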
## Installation
Currently, the best way to install MSToolkit is to install the development version directly from [GitHub](https://github.com/MikeKSmith/MSToolkit).
```{r installation, eval=FALSE}
# install.packages("devtools")
devtools::install_github("MikeKSmith/MSToolkit")
```
## Usage
Most users will start with `generateData()`, which generates data and controls dropout and missing data by specifying the design, allocation to treatments, generation of parameters, and the functional specification. The function has a large number of arguments that control and pass information to the lower-level functions.
`generateData()` iteratively builds and combines the data components for each replicate; the replicate datasets are stored as individual CSV files ("replicate000x.csv") in the "ReplicateData" subdirectory of the working directory.
```{r usage-generateData, eval=FALSE}
library(MSToolkit)
generateData(
  replicateN = 500,                    # number of simulation replicates
  subjects = 400,                      # subjects per replicate
  treatDoses = c(0, 5, 25, 50, 100),   # treatment arms (doses)
  conCovNames = c("wt", "age"),        # continuous covariates
  conCovMean = c(83, 55),
  conCovVCov = c(14, 10)^2,
  conCovDigits = 1,
  conCovCrit = "18 <= age <= 65",      # inclusion criterion for covariates
  genParNames = "E0,ED50,EMAX",        # model parameters to generate
  genParMean = c(2, 50, 10),
  genParVCov = diag(c(.5, 30, 10)),
  genParBtwNames = "E0,ED50,EMAX",     # between-subject variability terms
  genParBtwMean = c(0, 0, 0),
  genParBtwVCov = diag(3),
  respEqn = "E0 + ((DOSE * EMAX)/(DOSE + ED50))",  # Emax dose-response model
  respVCov = 5,                        # residual (response) variance
  interimSubj = ".3,.7"                # interim cuts at 30% and 70% of subjects
)
# This example generates 500 CSV files with 400 subjects each to store the
# replicate datasets in the "ReplicateData" directory.
```
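To sanity-check the output, one replicate can be read back into R. A minimal sketch, assuming the default CSV data method and the "replicate000x.csv" naming described above:
```{r usage-readReplicate, eval=FALSE}
# Read the first replicate dataset back in and inspect it (assumes the
# default CSV data method and the file naming described above)
rep1 <- read.csv(file.path("ReplicateData", "replicate0001.csv"))
head(rep1)
```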
Once data has been generated, the next step is to analyze the replicate data using `analyzeData()`. This function wraps together functions for performing a user-specified analysis on each replicate dataset and for producing micro- and macro-level summaries of the analysis results. The summary files are named "MicroSummary.csv" and "MacroSummary.csv" respectively, and are stored in the current R working directory.
In addition, two directories, "MicroEvaluation" and "MacroEvaluation", are created under the current R working directory for the outputs of the `analysisCode` ("Micro000x.csv") and `macroCode` ("Macro000x.csv") functions respectively.
```{r usage-analyzeData, eval=FALSE}
# Standard analysis code: fit an Emax model and summarise by dose
emaxCode <- function(data) {
  library(DoseResponse)
  with(data, {
    uniDoses <- sort(unique(DOSE))
    eFit <- emaxalt(RESP, DOSE)
    outDf <- data.frame(
      DOSE = uniDoses,
      MEAN = eFit$dm[as.character(uniDoses)],
      SE = eFit$dsd[as.character(uniDoses)]
    )
    outDf$LOWER <- outDf$MEAN - 2 * outDf$SE
    outDf$UPPER <- outDf$MEAN + 2 * outDf$SE
    outDf$N <- table(DOSE)[as.character(uniDoses)]
    outDf
  })
}

# Macro evaluation code
macrocode <- function(data) {
  # making up a t-test
  mu0 <- data$MEAN[data$DOSE == 0 & data$INTERIM == 0]
  mu100 <- data$MEAN[data$DOSE == 100 & data$INTERIM == 0]
  n0 <- data$N[data$DOSE == 0 & data$INTERIM == 0]
  n100 <- data$N[data$DOSE == 100 & data$INTERIM == 0]
  sd0 <- data$SE[data$DOSE == 0 & data$INTERIM == 0]
  sd100 <- data$SE[data$DOSE == 100 & data$INTERIM == 0]
  sddiff <- if (n0 == n100) {
    sqrt((sd0^2 + sd100^2) / (n0 + n100))
  } else {
    sqrt((1/n0 + 1/n100) * ((n0 - 1) * sd0^2 + (n100 - 1) * sd100^2) /
           (n0 + n100 - 2))
  }
  tstat <- (mu100 - mu0) / sddiff
  success <- abs(tstat) > qt(0.975, n0 + n100 - 2)
  data.frame(SUCCESS = success, TSTAT = tstat)
}

# Interim analysis code: drop doses whose confidence interval spans zero,
# and stop if all non-placebo doses have been dropped
interimCode <- function(data) {
  dropdose <- with(data, DOSE[sign(UPPER) != sign(LOWER) & DOSE != 0])
  outList <- list()
  if (length(dropdose) > 0) outList$DROP <- dropdose
  outList$STOP <- length(dropdose) == nrow(data) - 1
  outList
}

# Run the analysis on replicates 1 to 5
analyzeData(1:5, analysisCode = emaxCode, macroCode = macrocode,
            interimCode = interimCode)
```
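Once `analyzeData()` has run, the macro-level results can be summarised across replicates to estimate operating characteristics. A minimal sketch, assuming "MacroSummary.csv" collects the `SUCCESS` column returned by `macrocode()` above:
```{r usage-macroSummary, eval=FALSE}
# Estimate the probability of trial success across replicates (assumes
# MacroSummary.csv collects the SUCCESS flag returned by macrocode())
macroSummary <- read.csv("MacroSummary.csv")
mean(macroSummary$SUCCESS)
```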
Furthermore, `analyzeData()` can use the _parallel_ package to split the analysis across the cores of the compute resource it is running on. Specifying `grid = TRUE` splits the job across (number of cores - 1) or `getOption("max.clusters")`, whichever is the minimum; the default is `FALSE`.
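For example, the earlier call can be parallelised simply by adding the `grid` argument:
```{r usage-parallel, eval=FALSE}
# Same analysis as above, split across available cores
analyzeData(1:5, analysisCode = emaxCode, macroCode = macrocode,
            interimCode = interimCode, grid = TRUE)
```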
## Storing the simulation results
By default, MSToolkit writes the replicate, micro-evaluation and macro-evaluation datasets out as CSV files, one for each simulation replicate, into the respective directories described above. The benefit of using CSV is that the simulated data and analysis results for each replicate are easy to view and to share with others. However, this means there will be three times as many CSV files as simulation replicates, which may be undesirable.
There is also the option to write the replicate (simulated) data as .RData files, or to keep the replicate data in memory. Micro- and macro-evaluation datasets are always written as CSV files.
```{r store-simulation-data, eval=FALSE}
# Retrieve the current MSToolkit data storage method
getEctdDataMethod()
# Change the MSToolkit data storage method
setEctdDataMethod(method = "CSV")
# or setEctdDataMethod(method = "RData")
# or setEctdDataMethod(method = "Internal")
```
## SAS
Before using SAS as an analysis engine or using an external compute resource environment, it is important to set up preferences for the location of R and of the SAS executable (if required). A file ECTD.ini is provided in the top directory of the library, which should contain the following rows:
```{r sas, eval=FALSE}
# Instructions to set up environment variables
# This option allows the setting of the R execution binary on a Linux grid
Sys.setenv("RLSF_LINUX"=".")
# This option allows the setting of the SAS execution path on a Unix machine
Sys.setenv("SASPATH_UNIX"=".")
# This option allows the setting of the SAS execution path on a Windows machine
Sys.setenv("SASPATH_WIN"="C:\\Program Files\\SAS Institute\\SAS\\V8\\sas.exe")
```