Demo data and code for "Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding".


anishazaveri/austen_plots

austen_plots

Installation

The code has been tested on Python 3.8.12 with the packages specified in requirements.txt.

The easiest way to install is through PyPI:

pip install austen-plots

Introduction

This repository contains demo data and code for
Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding
Victor Veitch and Anisha Zaveri

If a common cause affects both a treatment and an outcome, it can induce a spurious correlation between them. For example, the wealth of a patient influences both their health outcomes and whether they take an expensive drug. The presence of the common cause (wealth) induces a spurious positive association between the drug and the outcome. Austen plots are a simple visual method for determining whether some unobserved common cause could explain away the association between a specified treatment and outcome. The included software produces Austen plots from the outputs of the standard data modeling used in causal inference pipelines.
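
The wealth example can be made concrete with a small simulation. This is only an illustrative sketch using the Python standard library, not part of the package: the effect sizes, the sample size, and the two-band wealth stratification are all assumptions chosen for the demonstration. The drug is given no effect at all, yet the naive comparison shows a large positive association, which disappears once patients are compared within the same wealth band.

```python
import random
import statistics

random.seed(0)


def simulate_patient():
    # Wealth is the common cause: it raises both the chance of taking the
    # drug and the health outcome. The drug itself has NO effect here.
    wealth = random.gauss(0.0, 1.0)
    takes_drug = random.random() < (0.8 if wealth > 0 else 0.2)
    health = 2.0 * wealth + random.gauss(0.0, 1.0)
    return wealth, takes_drug, health


def mean_difference(rows):
    # Difference in mean health between treated and untreated patients.
    treated = [h for _, t, h in rows if t]
    control = [h for _, t, h in rows if not t]
    return statistics.mean(treated) - statistics.mean(control)


patients = [simulate_patient() for _ in range(20000)]

# Naive comparison: ignores wealth, so it picks up the spurious association.
naive = mean_difference(patients)

# Adjusted comparison: compare only within each wealth band, then average.
rich = [p for p in patients if p[0] > 0]
poor = [p for p in patients if p[0] <= 0]
adjusted = (mean_difference(rich) + mean_difference(poor)) / 2

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

In real data the common cause may be unobserved, so the adjusted comparison is not available. That is the situation Austen plots are meant to help reason about.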

Instructions

See the Colab demo here: https://github.com/vveitch/causality-tutorials/blob/main/Sensitivity_Analysis.ipynb

Alternatively, see the demo notebook austen_plots_demo.ipynb.

Without Bootstrapping

Use the files under example_data/ as a reference.

  1. Fit your data using any model and generate predictions for g, the propensity score, and Q, the conditional expected outcome.
  2. Generate a .csv file with the following columns: 'g', 'Q', 't', 'y'. These correspond to the propensity score, the conditional expected outcome, the treatment and the outcome. For reference look at input_df.csv provided under example_data/.
  3. (Optional, but recommended) Repeat step 1 with key covariates dropped before model fitting. For each such fit, generate a .csv file as in step 2, and save all of these files under a single directory (called covariates in the example). If you name one of these files 'treatment.csv', the code assumes it holds predictions generated from data without the treatment, so it is not plotted on the graph. However, the Rsqhat value for 'treatment' is still provided in the output coordinates file.
  4. Decide on a meaningful amount of bias to test for, based on domain knowledge about your dataset. For the example dataset, we fix this at 2.
  5. Run the following code (values correspond to the example dataset).
from austen_plots.AustenPlot import AustenPlot
import os

input_df_path = './example_data/input_df.csv'
bias = 2.0

# if you have no covariate controls skip specifying covariate_dir_path
# covariate_dir_path = None
covariate_dir_path = './example_data/covariates/'

ap = AustenPlot(input_df_path, covariate_dir_path)
p, plot_coords, variable_coords = ap.fit(bias=bias)
# or, to calculate an Austen plot using the ATT instead
# (this overwrites the outputs of the call above)
p, plot_coords, variable_coords = ap.fit(bias=bias, do_att=True)

# save outputs, creating the output directory if it does not exist yet
output_dir = './example_data/output/'
os.makedirs(output_dir, exist_ok=True)

p.save(os.path.join(output_dir, 'austen_plot.png'), dpi=500, verbose=False)
plot_coords.to_csv(os.path.join(output_dir, 'plot_coords.csv'), index=False)
variable_coords.to_csv(os.path.join(output_dir, 'variable_coords.csv'), index=False)
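
Steps 1 and 2 above can be sketched with the standard library alone. This is a hedged illustration, not the authors' pipeline: the toy data, the single binary covariate x, and the stratified-mean "models" are all assumptions, and it takes Q to be the prediction at each row's observed treatment. In practice you would fit g and Q with whatever model suits your data (ideally with cross-fitting rather than the in-sample fit shown here) and compare the result against example_data/input_df.csv.

```python
import csv
import random
import tempfile
from collections import defaultdict
from pathlib import Path

random.seed(0)

# Hypothetical toy data: binary covariate x, binary treatment t, outcome y.
rows = []
for _ in range(2000):
    x = random.randint(0, 1)
    t = int(random.random() < (0.7 if x else 0.3))
    y = 1.0 * t + 2.0 * x + random.gauss(0.0, 1.0)
    rows.append((x, t, y))

# Step 1: fit the simplest possible "models" by stratifying on x.
t_by_x = defaultdict(list)   # treatments seen in each covariate stratum
y_by_xt = defaultdict(list)  # outcomes seen in each (covariate, treatment) cell
for x, t, y in rows:
    t_by_x[x].append(t)
    y_by_xt[(x, t)].append(y)

# g: estimated propensity score P(t = 1 | x).
g_hat = {x: sum(ts) / len(ts) for x, ts in t_by_x.items()}
# Q: estimated conditional expected outcome E[y | t, x].
q_hat = {cell: sum(ys) / len(ys) for cell, ys in y_by_xt.items()}

# Step 2: write one row per observation with the columns g, Q, t, y.
out_path = Path(tempfile.mkdtemp()) / "input_df.csv"
with out_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["g", "Q", "t", "y"])
    for x, t, y in rows:
        writer.writerow([g_hat[x], q_hat[(x, t)], t, y])

print(f"wrote {len(rows)} rows to {out_path}")
```

For step 3, the same procedure would be repeated with a covariate left out of the fit, writing each result to its own .csv file in the covariates directory.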

With Bootstrapping

You can optionally generate plots with bootstrap confidence intervals. To do so, complete steps 1-4 under 'Without Bootstrapping', then do the following:

  1. Create a directory for bootstrapped inputs. In the example this is called bootstrap.
  2. Within bootstrap create subdirectories for each bootstrap iteration.
  3. Within each bootstrapped subdirectory, save the .csv file and, optionally, the covariate files, as described in steps 1-3 of 'Without Bootstrapping', using 'g', 'Q', 't', 'y' values obtained from a bootstrapped dataset. These files should have the same names as those in the parent folder. Recommendation: if you generate these values using cross-validation on a model, ensure that replicate rows produced by the bootstrapping procedure fall within the same fold.
  4. (Optional) Choose a value for the confidence interval cutoff (default: 0.95).
  5. Run the following code (values correspond to the example dataset).
from austen_plots.AustenPlot import AustenPlot
import os

input_df_path = './example_data/input_df.csv'
bias = 2.0
ci_cutoff = 0.9
bootstrap_dir_path = './example_data/bootstrap/'

# if you have no covariate controls skip specifying covariate_dir_path
# covariate_dir_path = None
covariate_dir_path = './example_data/covariates/'

ap = AustenPlot(input_df_path, covariate_dir_path, bootstrap_dir_path)
p, plot_coords, variable_coords = ap.fit(bias=bias, do_bootstrap=True, ci_cutoff=ci_cutoff)
# or, to calculate an Austen plot using the ATT instead
# (this overwrites the outputs of the call above)
p, plot_coords, variable_coords = ap.fit(bias=bias, do_bootstrap=True, ci_cutoff=ci_cutoff, do_att=True)

# save outputs as shown above
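
The directory layout and the fold recommendation from the steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the dataset size, the number of bootstrap iterations and folds, the numeric subdirectory names, and the helper fold_of are all hypothetical, and the model refit itself is left as a comment. The key idea is that the fold is derived from the original row index, so every replicate of a row drawn by the bootstrap ends up in the same fold.

```python
import random
import tempfile
from pathlib import Path

random.seed(0)

n_rows = 100   # size of the original dataset (hypothetical)
n_boot = 5     # number of bootstrap iterations (hypothetical)
n_folds = 4    # number of cross-validation folds (hypothetical)

# A temporary location stands in for './example_data/bootstrap/'.
bootstrap_dir = Path(tempfile.mkdtemp()) / "bootstrap"


def fold_of(original_index):
    # The fold depends only on the ORIGINAL row index, so all replicates
    # of the same row share a fold.
    return original_index % n_folds


for i in range(n_boot):
    # Draw row indices with replacement to form one bootstrapped dataset.
    sample = [random.randrange(n_rows) for _ in range(n_rows)]
    folds = [fold_of(idx) for idx in sample]

    # One subdirectory per bootstrap iteration.
    iteration_dir = bootstrap_dir / str(i)
    iteration_dir.mkdir(parents=True)

    # Here you would refit g and Q on the resampled rows (using `folds`
    # for cross-validation) and save the predictions under the same file
    # names as in the parent folder, e.g. iteration_dir / "input_df.csv".

print(sorted(d.name for d in bootstrap_dir.iterdir()))
```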

Bug Reports

Please report bugs to Anisha Zaveri
