
89 add vignette #90

Open · wants to merge 13 commits into main
Conversation

MikeJSeo (Collaborator)

No description provided.

@MikeJSeo MikeJSeo added the documentation Improvements or additions to documentation label Mar 15, 2024
@MikeJSeo MikeJSeo added this to the v0.1 milestone Mar 15, 2024
@MikeJSeo MikeJSeo requested a review from gravesti March 15, 2024 08:04
@MikeJSeo MikeJSeo linked an issue Mar 15, 2024 that may be closed by this pull request
github-actions bot (Contributor) commented Mar 15, 2024

Code Coverage Summary

Filename               Stmts    Miss  Cover    Missing
-------------------  -------  ------  -------  ----------------------------------------------------
R/bootstrap.R             23      23  0.00%    54-88
R/bucher.R                50       4  92.00%   49, 97, 101, 105
R/data_generation.R       20      20  0.00%    36-73
R/maic_anchored.R        106     106  0.00%    52-202
R/maic_unanchored.R       94      94  0.00%    51-190
R/matching.R             238     145  39.08%   59, 64-66, 71-74, 81-88, 109, 135, 166, 199, 241-501
R/plot_km.R              439     439  0.00%    46-736
R/plot_km2.R             107     107  0.00%    50-209
R/process_data.R         101      82  18.81%   39-135, 183, 216-280
R/reporting.R             19      19  0.00%    14-37
R/survival-helper.R       30      30  0.00%    26-82
R/time-helper.R            9       9  0.00%    22-54
R/zzz.R                    1       1  0.00%    2
TOTAL                   1237    1079  12.77%

Diff against main

Filename               Stmts    Miss  Cover
-------------------  -------  ------  --------
R/maic_anchored.R        -34     +74  -77.14%
R/maic_unanchored.R      -32     -32  +100.00%
R/matching.R               0      +3  -1.26%
R/process_data.R           0     +38  -37.62%
R/reporting.R              0     +19  -100.00%
R/survival-helper.R        0     +27  -90.00%
R/time-helper.R            0      +5  -55.56%
TOTAL                    -66    +134  -14.70%

Results for commit: eaa3270

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

github-actions bot (Contributor) commented Mar 15, 2024

Unit Tests Summary

 1 files   2 suites   0s ⏱️
 8 tests  8 ✅ 0 💤 0 ❌
21 runs  21 ✅ 0 💤 0 ❌

Results for commit eaa3270.

♻️ This comment has been updated with latest results.

bibliography: references.bib
csl: biomedicine.csl
vignette: >
%\VignetteIndexEntry{Using maicplus}
@gravesti (Contributor) commented Apr 4, 2024

Suggested change
%\VignetteIndexEntry{Using maicplus}
%\VignetteIndexEntry{Calculating Weights}

@gravesti (Contributor) left a comment

Thanks for writing all these @MikeJSeo
I just noted a few tiny things so far

@@ -0,0 +1,200 @@
---
title: "Calculating weights"

Suggested change
title: "Calculating weights"
title: "Calculating Weights"

bibliography: references.bib
csl: biomedicine.csl
vignette: >
%\VignetteIndexEntry{Using maicplus}

Suggested change
%\VignetteIndexEntry{Using maicplus}
%\VignetteIndexEntry{Introduction}

@@ -0,0 +1,321 @@
---
title: "Kaplan Meier Plots"

Suggested change
title: "Kaplan Meier Plots"
title: "Kaplan-Meier Plots"

bibliography: references.bib
csl: biomedicine.csl
vignette: >
%\VignetteIndexEntry{Using maicplus}

Suggested change
%\VignetteIndexEntry{Using maicplus}
%\VignetteIndexEntry{Kaplan-Meier Plots}

@ckalyvas (Collaborator) left a comment

I have provided some suggested edits and comments for your consideration. If you have any questions, let's discuss in our meeting tomorrow. I want to look into the text describing the MAIC methodology a bit more carefully, hopefully over the weekend. :)


In this example scenario, age, sex, Eastern Cooperative Oncology Group (ECOG) performance status, smoking status, and number of previous treatments have been identified as imbalanced prognostic variables/effect modifiers.

This example reads in and combines data from three standard simulated data sets (adsl, adrs and adtte) which are saved as '.csv' files.
Collaborator comment:

This paragraph should be before the previous one.


## Preprocessing aggregate data

There are two ways of specifying aggregate data. One approach is to read in aggregate data using an excel spreadsheet. In the spreadsheet, possible variable types include mean, median, or standard deviation for continuous variables and count or proportion for binary variables. The naming should be followed by these suffixes accordingly: _COUNT, _MEAN, _MEDIAN, _SD, _PROP. Then, `process_agd` will convert the count into proportions.
Collaborator comment:

The passage could benefit from a few tweaks for clarity and coherence. Please see suggested text below:
"There are two methods for specifying aggregate data. The first method involves importing aggregate data via an Excel spreadsheet. Within the spreadsheet, variable types such as mean, median, or standard deviation are possible for continuous variables, while count or proportion are applicable for binary variables. Each variable should be suffixed accordingly: _COUNT, _MEAN, _MEDIAN, _SD, or _PROP. Subsequently, the process_agd function will convert counts into proportions.

The second method entails defining a data frame of aggregate data in R. When using this approach, the _COUNT suffix should be omitted, and only proportions are permissible for binary variables. Other suffixes remain consistent with the first method.

Any potential missing values in binary variables should be addressed by adjusting the denominator to account for missing counts, i.e., the proportion equals the count divided by (N - missing)."
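To make the suffix convention and the count-to-proportion adjustment concrete, here is a minimal base-R sketch; the study values and column names are invented for illustration and do not reflect the package's required layout:

```r
# Hypothetical aggregate-data record using the suffix convention described above:
# continuous variables carry _MEAN/_MEDIAN/_SD, binary variables _COUNT or _PROP.
agd <- data.frame(
  STUDY          = "Comparator_Study",
  N              = 300,   # reported sample size
  AGE_MEAN       = 51.0,
  AGE_SD         = 3.25,
  SEX_MALE_COUNT = 147,
  ECOG0_PROP     = 0.40
)

# Converting a count to a proportion while adjusting the denominator for
# missing values, i.e. proportion = count / (N - missing):
n_missing_sex <- 10   # assumed number of patients with missing sex
agd$SEX_MALE_PROP <- agd$SEX_MALE_COUNT / (agd$N - n_missing_sex)
```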


## Centering IPD

In the statistical theory section, we briefly explained why it is useful to transform IPD by subtracting the aggregate data means when performing optimization. The function `center_ipd` centers the IPD using the aggregate data means.
Collaborator comment:

Is this related to the covariates?
If not, may I suggest that the package center the covariates by default (this improves sampling efficiency)? Hence, the treatment coefficients correspond to the covariates at their centered values (printed in the output). The user should be able to convert one to the other by adding/subtracting the interaction coefficients multiplied by the centering values.
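For what it's worth, the centering step itself is just a subtraction of the aggregate-data means from the IPD covariates; a minimal base-R sketch (this is not the `center_ipd()` interface, and all column names are made up):

```r
# Toy IPD with two covariates (hypothetical names)
ipd <- data.frame(
  AGE   = c(45, 60, 52, 70),
  ECOG0 = c(1, 0, 1, 1)   # binary covariate coded 0/1
)

# Aggregate-data means from the comparator population (assumed values)
agd_means <- c(AGE = 51.0, ECOG0 = 0.40)

# Centered covariates: each IPD column minus the corresponding aggregate mean
ipd_centered <- sweep(ipd[, names(agd_means)], 2, agd_means, FUN = "-")
names(ipd_centered) <- paste0(names(agd_means), "_CENTERED")

head(cbind(ipd, ipd_centered))
```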

head(ipd_centered)
```

### How to handle standard deviation aggregate summary
Collaborator comment:

Suggested text to improve clarity and coherence:

Handling Standard Deviation in Aggregate Summary

As outlined in NICE DSU TSD 18 Appendix D (Phillippo et al., 2016), balancing on both mean and standard deviation for continuous variables may be necessary in certain scenarios. When a standard deviation is provided in the comparator population, preprocessing involves calculating $E(X^2)$ in the target population using the variance formula $Var(X)=E(X^{2})-E(X)^{2}$. This calculated $E(X^2)$ in the target population is then aligned with $X^{2}$ computed in the internal IPD.
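A small numeric sketch of that preprocessing step, using the variance identity above (all values are invented for illustration):

```r
# Comparator (target) population summaries for a continuous variable, assumed values
age_mean_agd <- 51.0
age_sd_agd   <- 3.25

# E(X^2) in the target population via Var(X) = E(X^2) - E(X)^2
age_sq_mean_agd <- age_sd_agd^2 + age_mean_agd^2

# In the IPD, the quantity to balance against this target is simply X^2
ipd <- data.frame(AGE = c(45, 60, 52, 70))   # toy IPD
ipd$AGE_SQUARED <- ipd$AGE^2

# The weighting step then matches the weighted mean of AGE_SQUARED in the IPD
# to age_sq_mean_agd, in addition to matching the mean of AGE itself.
```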


As described in NICE DSU TSD 18 Appendix D [@phillippo2016b], balancing on both mean and standard deviation for continuous variables may be considered in some cases. If a standard deviation is provided in the comparator population, preprocessing is done so that in the target population, $E(X^2)$ is calculated using the variance formula $Var(X)=E(X^{2})-E(X)^{2}$. This $E(X^2)$ in the target population is then matched with the $X^{2}$ calculated in the internal IPD.

### How to handle median aggregate summary
Collaborator comment:

Handling Median in Aggregate Summary

When a median is provided, the IPD is preprocessed to categorize the variable into a binary form. Values in the IPD exceeding the comparator population median are assigned a value of 1, while values lower than the median are assigned 0. The comparator population median is replaced by 0.5 to adjust to the binary categorization in the IPD data. Subsequently, the newly formed IPD binary variable is aligned to ensure a proportion of 0.5.
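A minimal sketch of that dichotomization; the median value and column names are purely illustrative:

```r
# Toy IPD and an assumed comparator-population median for AGE
ipd <- data.frame(AGE = c(45, 60, 52, 70))
age_median_agd <- 50

# Dichotomize the IPD variable at the comparator median:
# 1 if the value exceeds the median, 0 otherwise
ipd$AGE_GT_MEDIAN <- as.integer(ipd$AGE > age_median_agd)

# The aggregate target to match against is 0.5, i.e. the weighted proportion
# of the IPD above the comparator median should equal one half.
target_prop <- 0.5
```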


# Calculating weights

We use the centered IPD and use the function `estimate_weights` to calculate the weights. Before running this function, we need to specify the column that are centered, i.e. covariates that will be used in the optimization.
Collaborator comment:

We utilize the centered IPD and employ the estimate_weights function to compute the weights. Prior to executing this function, it's essential to specify the centered columns, i.e., the covariates to be utilized in the optimization process.


Following the calculation of weights, it is necessary to determine whether the optimization procedure has worked correctly and whether the weights derived are sensible.

The approximate effective sample size is calculated as: $$ ESS = \frac{({ \sum_{i=1}^n\hat{\omega}_i })^2}{ \sum_{i=1}^n \hat{\omega}^2_i} $$ A small ESS, relative to the original sample size, is an indication that the weights are highly variable and that the estimate may be unstable. This often occurs if there is very limited overlap in the distribution of the matching variables between the populations being compared.
Collaborator comment:

Is this how ESS is calculated in the Signorovitch article? If so, we should mention that the ESS, as defined in Signorovitch et al., is derived from the estimates using linear combinations of the observations. Effective sample size cannot be easily calculated when utilizing weighted survival estimates because survival estimates are not a linear function of the observations.
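For reference, the quoted ESS formula is straightforward to compute directly from the weights; a base-R sketch with stand-in weights (the real weights would come from the weighting step):

```r
set.seed(1)
w <- rexp(500)   # stand-in weights for the 500 intervention-arm patients

# ESS = (sum of weights)^2 / sum of squared weights
ess <- sum(w)^2 / sum(w^2)

# Percentage reduction in ESS relative to the original sample size
ess_reduction_pct <- 100 * (1 - ess / length(w))

round(c(ESS = ess, reduction_pct = ess_reduction_pct), 2)
```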



In this example, the ESS reduction is 66.73% of the total number of patients in the intervention arm (500 patients in total). As this is a considerable reduction, estimates using this weighted data may be unreliable.
Collaborator comment:

Can we put a sentence explaining what this 66.73% means in simple words, i.e., ESS reflects the fraction of the original sample contributing to the adjusted outcome, and large reductions in ESS may indicate poor overlap between the IPD and AgD studies? (Also potential advancement for our package https://doi.org/10.1002/jrsm.1466)

```


# Introduction
Collaborator comment:

Suggested text to improve clarity and flow:

Health technology assessments and appraisals necessitate dependable estimations of relative treatment effects to guide reimbursement determinations. In instances where direct comparative evidence is lacking, yet both treatments under scrutiny have been separately evaluated against a shared comparator (e.g., placebo or standard care), a conventional indirect comparison can be conducted utilizing published aggregate data from each study.

This document outlines the procedures for conducting a matching-adjusted indirect comparison (MAIC) analysis using the maicplus package in R. MAIC is suitable when individual patient data from one trial and aggregate data from another are accessible. The analysis focuses on endpoints such as time-to-event (e.g., overall survival) or binary outcomes (e.g., objective tumor response).

The methodologies detailed herein are based on the original work by Signorovitch et al. (2010) and further elucidated in the National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) Technical Support Document (TSD) 18 (Signorovitch et al., 2010; Phillippo et al., 2016a).

When the trials lack a common comparator treatment to link them, the comparison is termed an unanchored MAIC. Without a common comparator, it becomes challenging to directly compare the outcomes of interest between different treatments or interventions. Conversely, if a common comparator is available, the comparison is termed an anchored MAIC. An anchored MAIC offers certain advantages over an unanchored MAIC, as it can provide more reliable and interpretable results by reducing the uncertainty associated with indirect comparisons.

MAIC methods aim to adjust for between-study differences in patient demographics or disease characteristics at baseline. In scenarios where a common treatment comparator is absent, MAIC assumes that observed differences in absolute outcomes between trials are solely attributable to imbalances in prognostic variables and effect modifiers. This assumption requires that all imbalanced prognostic variables and effect modifiers between the studies are known, which is often challenging to fulfill (Phillippo et al., 2016a).

Various approaches exist for identifying prognostic variables and effect modifiers for use in MAIC analyses. These include clinical consultation with experts, review of published literature, examination of previous regulatory submissions, and data-driven methods such as regression modeling and subgroup analysis to uncover interactions between baseline characteristics and treatment effects.

```


# Introduction
Collaborator comment:

I would suggest using the word "participant" instead of the words "patient" and "subject".

@ckalyvas (Collaborator) left a comment

I proposed a few changes in the methodology part. Feel free to accept or disregard as you see fit.


# Introduction

To plot Kaplan Meier curves, we need to first obtain pseudo comparator IPD by digitizing Kaplan-Meier curves from the comparator study. For more information on how to digitize Kaplan-Meier curves, refer to Guyot et al. and Liu et al. [@guyot2012; @Liu2021].
Collaborator comment:

Suggested text to connect it with the MAIC application:
"After conducting an MAIC, the results can be effectively illustrated through visual representations of the weighted and non-weighted data. This can be achieved by plotting Kaplan-Meier (KM) curves and comprehensively depicting the time-to-event data. To generate these curves, it is crucial to obtain pseudo-IPD from the comparator study through the digitization of KM curves from the comparator study. For guidance on this process, refer to the works of Guyot et al. and Liu et al. [@guyot2012; @liu2021]."


We will briefly go over the statistical theory behind MAIC. For more detailed information, refer to Signorovitch et al. 2010. [@signorovitch2010]

Let us define $t_i$ to be the treatment patient $i$ received. We assume $t_i=0$ if the patient received intervention (IPD) and $t_i=1$ if the patient received comparator treatment. The causal effect of treatment $T=0$ vs $T=1$ on the
Collaborator comment:

The matching is accomplished by re-weighting patients in the study with the IPD by their odds, or likelihood, of having been enrolled in the study with the AgD. We usually refer to this probability with the term “likelihood” because we seek the parameter values that maximize the probability of observing the data. The approach is very similar to propensity score weighting, with the difference that IPD are not available for one study, so the usual maximum likelihood approach cannot be used to estimate the parameters of the propensity score model. Instead, a method of moments must be used (a statistical technique that estimates population parameters by matching sample moments). After the matching is complete and weights have been added to the IPD, it is possible to estimate the weighted outcomes and compare the results across studies.

The matching approach can be described as follows: assuming that each trial has one arm, each patient can be characterized by the random triple $(X, T, Y)$, where $X$ represents the baseline characteristics (e.g., age and weight), $T$ represents the treatment of interest (e.g., $T = 0$ for the IPD study and $T = 1$ for the study with AgD), and $Y$ is the outcome of interest (e.g., OS).

Each patient is characterized by a random triple $(x_i, t_i, y_i)$, $i = 1, \dots, N$, but this is fully observed only when IPD are available, i.e., when $t_i = 0$. When $t_i = 1$, only the mean baseline characteristics $\bar{x}_{(t_i=1)}$ and the mean outcome $\bar{y}_{(t_i=1)}$, i.e., the averages of the $x_i$ and $y_i$ over the patients with $t_i = 1$, are observed.

Given the observed data, the causal effect of treatment T = 0 versus T = 1 on the mean of Y can be estimated as follows:

\[
0=\frac{\sum_{i=1}^{n}x_{i}\exp(x_i^{T}\hat{\beta})}{\sum_{i=1}^{n}\exp(x_i^{T}\hat{\beta})}-\bar{x}_{agg}
\]

If the $x_i$ contains all confounders and the logistic regression for $w_i$ is correctly specified, we obtain a consistent estimate of the causal effect of intervention vs comparator treatment. The above equation is equivalent to
Collaborator comment:

It is possible to use this estimator since a logistic regression model for the odds of receiving T = 1 vs T = 0 would, by definition, provide the correct weights for balancing the trial populations. If the $x_i$ contains all the confounders and the logistic model for $w_i$ is correctly specified, then $\hat{\theta}$ in the next equation provides a consistent estimate of the causal effect of treatment T = 0 vs T = 1 on the mean of Y among patients actually receiving treatment T = 1.

\[
\hat{\theta}=\frac{\sum_{i=1}^{n}y_{i}\exp(x_i^{T}\hat{\beta})}{\sum_{i=1}^{n}\exp(x_i^{T}\hat{\beta})}-\bar{y}_{agg}
\]

If the $x_i$ contains all confounders and the logistic regression for $w_i$ is correctly specified, we obtain a consistent estimate of the causal effect of intervention vs comparator treatment. The above equation is equivalent to

\[
0=\sum_{i=1}^{n}(x_{i}-\bar{x}_{agg})\exp(x_{i}^{T}\hat{\beta})
\]
Collaborator comment:

If this formula refers to the transformation of the IPD by subtracting the aggregate data means, I suggest placing it after the text below.
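To make the estimating equation concrete: because $0=\sum_{i=1}^{n}(x_{i}-\bar{x}_{agg})\exp(x_{i}^{T}\hat{\beta})$ is (up to a constant factor) the gradient of $Q(\beta)=\sum_{i=1}^{n}\exp\{(x_{i}-\bar{x}_{agg})^{T}\beta\}$, the weights can be obtained by minimizing $Q$ over the centered covariates. A minimal base-R sketch with simulated data follows; it mirrors the method-of-moments idea but is not the `estimate_weights()` implementation itself:

```r
set.seed(123)

# Simulated IPD covariates and assumed aggregate-data means
ipd <- data.frame(AGE = rnorm(500, 55, 8), ECOG0 = rbinom(500, 1, 0.6))
agd_means <- c(AGE = 51, ECOG0 = 0.40)

# Centered covariate matrix (x_i - x_bar_agg)
X <- sweep(as.matrix(ipd[, names(agd_means)]), 2, agd_means, "-")

# Objective Q(beta) = sum_i exp(x_i' beta) over centered x_i;
# its gradient is the estimating equation above
q_fun  <- function(beta) sum(exp(X %*% beta))
q_grad <- function(beta) colSums(X * as.vector(exp(X %*% beta)))

fit <- optim(par = rep(0, ncol(X)), fn = q_fun, gr = q_grad, method = "BFGS")
w   <- as.vector(exp(X %*% fit$par))   # MAIC weights (up to proportionality)

# Check: the weighted IPD covariate means now reproduce the aggregate means
round(colSums(w * as.matrix(ipd[, names(agd_means)])) / sum(w), 3)
```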

github-actions bot (Contributor) commented Apr 12, 2024

Unit Test Performance Difference

| Test Suite | Status | Time on main | ±Time | ±Tests | ±Skipped | ±Failures | ±Errors |
|---|---|---|---|---|---|---|---|
| maic_anchored | 💀 | 1.18 | -1.18 | -10 | 0 | 0 | 0 |

Additional test case details

| Test Suite | Status | Time on main | ±Time | Test Case |
|---|---|---|---|---|
| maic_anchored | 💀 | 0.31 | -0.31 | maic_unanchored_works_for_TTE_using_bootstrap_SE |
| maic_anchored | 💀 | 0.87 | -0.87 | maic_unanchored_works_for_TTE_using_robust_SE |

Results for commit 44469a4

♻️ This comment has been updated with latest results.

@MikeJSeo MikeJSeo removed a link to an issue Jul 12, 2024
@MikeJSeo MikeJSeo linked an issue Jul 12, 2024 that may be closed by this pull request
Labels
documentation Improvements or additions to documentation
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

add vignette
3 participants