PROP-RL: Pipeline for Learning Robust Policies in RL

This repository contains the source code to replicate the pipeline described in "Optimizing Loop Diuretic Treatment in Hospitalized Patients: A Case Study in Practical Application of Offline Reinforcement Learning to Healthcare".

Usage

  • Refer to requirements.txt for the necessary conda packages.
  • data: This directory contains sample input data.
  • pipeline: This directory contains the offline RL pipeline for tabular data.
    • 1_run_kmeans.py: Takes the embedded EHR features (data/embedded.p) and uses ensemble k-means clustering to discretize the data (see the sketch after this list).
    • 2_run_environment.py: Creates the transition and reward matrices from the training set.
    • 3_run_train_policy.py: Trains all policies.
    • 4_run_evaluation.py: Evaluates the learned policies on the validation and test sets.
    • pipeline_example.ipynb: Pseudocode for the full pipeline.

Sample Input

An example usage of the pipeline is provided in pipeline/pipeline_example.ipynb with sample input data. The inputs are formatted as follows:

  • data/embedded.p: Embedding of the raw feature values of the trajectories.
    • hosp_id: Unique ID of the trajectory/hospitalization
    • window_id: ID of each window
  • data/ens.csv: Trajectory data with state, action, reward, and next state.
    • hosp_id, window_id: Same as above
    • death: Encoding of the final outcome
    • For the last window in each trajectory, we assume that every action (binary actions in our example) leads to a terminal absorbing state defined by the final outcome (death); the last window therefore appears twice in the trajectory data (see the sketch below).