README.Rmd

---
output: github_document
---

[![Build Status](https://travis-ci.org/jaredhuling/personalized2part.svg?branch=master)](https://travis-ci.org/jaredhuling/personalized2part)
[![version](http://www.r-pkg.org/badges/version/personalized2part)](https://cran.r-project.org/package=personalized2part)

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# personalized2part

The `personalized2part` package implements the methodology of Huling, Smith, and Chen (2020), which allows for subgroup identification for semi-continuous outcomes by estimating individualized treatment rules. It uses a two part modeling (or hurdle modeling) framework to handle semi-continuous data by modeling separately the positive part of the outcome and an indicator of whether each outcome is positive, but still results in a single treatment rule. High dimensional data is handled with a cooperative lasso penalty, which encourages the coefficients in the two models to have the same sign. 

## Installation

You can install the  development version from [GitHub](https://github.com/jaredhuling/personalized2part) with:

``` r
# install.packages("devtools")
devtools::install_github("jaredhuling/personalized2part")
```
## Example

This is a basic example which shows you how to solve a common problem:

```{r example, message=FALSE,warning=FALSE}
library(personalized2part)
```

Simulate semicontinuous data with a heterogeneous treatment effect:
```{r sim}
set.seed(1)
dat <- sim_semicontinuous_data(n.obs = 300, n.vars = 50)

x   <- dat$x
y   <- dat$y
trt <- dat$trt
```

Use the built-in function `create.propensity.function` from the [personalized](https://cran.r-project.org/package=personalized) package to construct a function that fits a propensity score model using 10-fold cross fitting:

```{r propens}
propens_func <- create.propensity.function(crossfit = TRUE, 
                                           nfolds.crossfit = 10, 
                                           cv.glmnet.args = list(type.measure = "auc"))
```

Use the built-in function `create.augmentation.function` from the [personalized](https://cran.r-project.org/package=personalized) package to construct outcome augmentation functions for the zero part model using 10-fold cross fitting:

```{r augbin}
aug_func_binary <- create.augmentation.function(family = "binomial",
                                                crossfit = TRUE, 
                                                nfolds.crossfit = 10, 
                                                cv.glmnet.args = list(type.measure = "auc"))
```

Use the built-in function `HDtweedie_kfold_aug` from the personalized2part package to construct outcome augmentation functions for the positive part model using 10-fold cross fitting using a penalized gamma regression model:
```{r aug_gamma}

aug_func_positive <- function(x, y, trt)
{
        HDtweedie_kfold_aug(x, y, trt, K = 10, p = 2,
                            interactions = TRUE)
}

```


```{r fit_subgroup, warning=FALSE}
fitted_2part_subgrp_model <- fit_subgroup_2part(x, y, trt, 
                                                propensity.func = propens_func,
                                                propensity.func.positive = propens_func, 
                                                augment.func.zero = aug_func_binary,
                                                augment.func.positive = aug_func_positive)

## the model print display takes the same form as fitted models from
## the personalized package
fitted_2part_subgrp_model
```

We can plot the coefficient curves for the two models as the following:

```{r, fig.height=4}
par(mfrow = c(1,2))
plot(fitted_2part_subgrp_model$model$hd2part.fit, "zero")
plot(fitted_2part_subgrp_model$model$hd2part.fit, "positive")
```

Now evaluate value function on test set based on the estimated individualized treatment rule and compare with average outcome (a value function larger than the average outcome means that the ITR results in better outcomes than standard practice).
The predict function returns predicted values for the heterogeneous treatment effect in terms of the risk ratio $E[Y|X,T=1] / E[Y|X,T=-1]$, so greater than 1 means the treatment is beneficial (if larger outcomes are preferred).
```{r test}

dat.test <- sim_semicontinuous_data(n.obs = 10000, n.vars = 50)

x.test   <- dat.test$x
y.test   <- dat.test$y

predicted_hte <- predict(fitted_2part_subgrp_model, x.test)

## estmated test set value function:
personalized2part:::computeValue(y.test, predicted_hte, dat.test$trt,
                                 pi.x = dat.test$pi.x, cutoff = 1)

## average outcome in the test set:
mean(dat.test$y)

# We can see that the estimated treatment rule results in better outcomes for the test set
```

Now let's compare with simply using a squared error loss under the framework of Chen, et al (2017) to estimate an ITR:
```{r compare}


fsm_log <- fit.subgroup(x, log(y+0.1), trt, propensity.func = propens_func, 
                    augment.func = create.augmentation.function(family="gaussian"),
                    loss = "sq_loss_lasso")

fsm <- fit.subgroup(x, y, trt, propensity.func = propens_func, 
                    augment.func = create.augmentation.function(family="gaussian"),
                    loss = "sq_loss_lasso")

pred_hte_sqloss_log <- predict(fsm_log, x.test)
pred_hte_sqloss <- predict(fsm, x.test)

## the value function is smaller than for the 2 part model
personalized2part:::computeValue(y.test, pred_hte_sqloss, dat.test$trt,
                                 pi.x = dat.test$pi.x, cutoff = 0)

personalized2part:::computeValue(y.test, pred_hte_sqloss_log, dat.test$trt,
                                 pi.x = dat.test$pi.x, cutoff = 0)

```

## Reference
[1] Huling J, Smith M, Chen G (2020). A two-part framework for estimating individualized treatment rules from semi-continuous outcomes. Journal of the American Statistical Association, in press.

[2] Chen S, Tian L, Cai T, Yu M (2017). A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics, 73(4):1199-209.