-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
146 lines (104 loc) · 6.06 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
output: github_document
---
[![Build Status](https://travis-ci.org/jaredhuling/personalized2part.svg?branch=master)](https://travis-ci.org/jaredhuling/personalized2part)
[![version](http://www.r-pkg.org/badges/version/personalized2part)](https://cran.r-project.org/package=personalized2part)
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# personalized2part
The `personalized2part` package implements the methodology of Huling, Smith, and Chen (2020), which allows for subgroup identification for semi-continuous outcomes by estimating individualized treatment rules. It uses a two part modeling (or hurdle modeling) framework to handle semi-continuous data by modeling separately the positive part of the outcome and an indicator of whether each outcome is positive, but still results in a single treatment rule. High dimensional data is handled with a cooperative lasso penalty, which encourages the coefficients in the two models to have the same sign.
## Installation
You can install the development version from [GitHub](https://github.com/jaredhuling/personalized2part) with:
``` r
# install.packages("devtools")
devtools::install_github("jaredhuling/personalized2part")
```
## Example
This is a basic example which shows you how to solve a common problem:
```{r example, message=FALSE,warning=FALSE}
library(personalized2part)
```
Simulate semicontinuous data with a heterogeneous treatment effect:
```{r sim}
set.seed(1)
dat <- sim_semicontinuous_data(n.obs = 300, n.vars = 50)
x <- dat$x
y <- dat$y
trt <- dat$trt
```
Use the built-in function `create.propensity.function` from the [personalized](https://cran.r-project.org/package=personalized) package to construct a function that fits a propensity score model using 10-fold cross fitting:
```{r propens}
propens_func <- create.propensity.function(crossfit = TRUE,
nfolds.crossfit = 10,
cv.glmnet.args = list(type.measure = "auc"))
```
Use the built-in function `create.augmentation.function` from the [personalized](https://cran.r-project.org/package=personalized) package to construct outcome augmentation functions for the zero part model using 10-fold cross fitting:
```{r augbin}
aug_func_binary <- create.augmentation.function(family = "binomial",
crossfit = TRUE,
nfolds.crossfit = 10,
cv.glmnet.args = list(type.measure = "auc"))
```
Use the built-in function `HDtweedie_kfold_aug` from the personalized2part package to construct outcome augmentation functions for the positive part model using 10-fold cross fitting using a penalized gamma regression model:
```{r aug_gamma}
aug_func_positive <- function(x, y, trt)
{
HDtweedie_kfold_aug(x, y, trt, K = 10, p = 2,
interactions = TRUE)
}
```
```{r fit_subgroup, warning=FALSE}
fitted_2part_subgrp_model <- fit_subgroup_2part(x, y, trt,
propensity.func = propens_func,
propensity.func.positive = propens_func,
augment.func.zero = aug_func_binary,
augment.func.positive = aug_func_positive)
## the model print display takes the same form as fitted models from
## the personalized package
fitted_2part_subgrp_model
```
We can plot the coefficient curves for the two models as the following:
```{r, fig.height=4}
par(mfrow = c(1,2))
plot(fitted_2part_subgrp_model$model$hd2part.fit, "zero")
plot(fitted_2part_subgrp_model$model$hd2part.fit, "positive")
```
Now evaluate value function on test set based on the estimated individualized treatment rule and compare with average outcome (a value function larger than the average outcome means that the ITR results in better outcomes than standard practice).
The predict function returns predicted values for the heterogeneous treatment effect in terms of the risk ratio $E[Y|X,T=1] / E[Y|X,T=-1]$, so greater than 1 means the treatment is beneficial (if larger outcomes are preferred).
```{r test}
dat.test <- sim_semicontinuous_data(n.obs = 10000, n.vars = 50)
x.test <- dat.test$x
y.test <- dat.test$y
predicted_hte <- predict(fitted_2part_subgrp_model, x.test)
## estmated test set value function:
personalized2part:::computeValue(y.test, predicted_hte, dat.test$trt,
pi.x = dat.test$pi.x, cutoff = 1)
## average outcome in the test set:
mean(dat.test$y)
# We can see that the estimated treatment rule results in better outcomes for the test set
```
Now let's compare with simply using a squared error loss under the framework of Chen, et al (2017) to estimate an ITR:
```{r compare}
fsm_log <- fit.subgroup(x, log(y+0.1), trt, propensity.func = propens_func,
augment.func = create.augmentation.function(family="gaussian"),
loss = "sq_loss_lasso")
fsm <- fit.subgroup(x, y, trt, propensity.func = propens_func,
augment.func = create.augmentation.function(family="gaussian"),
loss = "sq_loss_lasso")
pred_hte_sqloss_log <- predict(fsm_log, x.test)
pred_hte_sqloss <- predict(fsm, x.test)
## the value function is smaller than for the 2 part model
personalized2part:::computeValue(y.test, pred_hte_sqloss, dat.test$trt,
pi.x = dat.test$pi.x, cutoff = 0)
personalized2part:::computeValue(y.test, pred_hte_sqloss_log, dat.test$trt,
pi.x = dat.test$pi.x, cutoff = 0)
```
## Reference
[1] Huling J, Smith M, Chen G (2020). A two-part framework for estimating individualized treatment rules from semi-continuous outcomes. Journal of the American Statistical Association, in press.
[2] Chen S, Tian L, Cai T, Yu M (2017). A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics, 73(4):1199-209.