An immune dysfunction score for stratification of patients with acute infection based on whole blood gene expression
This repository contains all of the code associated with the paper by Cano-Gamez et al. describing the SepstratifieR algorithm. The paper is available at: https://doi.org/10.1101/2022.03.17.22272427
All code was written in R and is provided in R Markdown (Rmd) format.
The repository is divided into the following sections:
This section contains the code used to pre-process publicly available microarray and RNA-seq data from the studies used throughout the paper, including:
a) The SHIP-TREND study
b) The Dutch arm of the 500FG study
c) The DILGOM study
d) The MARS consortium
Please refer to our paper for further details on how this data was used.
This section also contains the code used to pre-process and explore the following data sets generated within the GAinS study:
a) A microarray cohort
b) An RNA-seq cohort
c) A qRT-PCR cohort
This section contains the code used to train and evaluate all machine learning models used to derive SRS and SRSq. The following steps are detailed:
a) Definition of a robust SRS gene signature by multimodal data integration with canonical correlation analysis (CCA)
b) Integration of multimodal data from three technologies and multiple studies into a sepsis reference set using mutual nearest neighbours (mNN)
c) Training and evaluation of random forest classifiers for SRS and random forest prediction models for SRSq
d) Validation of SRS predictions via differential expression analysis and association tests with clinical outcomes
e) Mediation analysis to assess the impact of SRS on mortality via different clinical variables
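To make the idea behind mutual nearest neighbours concrete, the toy sketch below finds pairs of samples from two batches that lie in each other's k nearest neighbours. It is only an illustration of the pairing step, written in Python with made-up data; it is not the R code in this repository, and it does not perform the batch correction that an mNN-based integration method builds on top of these pairs. The function names `nearest_indices` and `mutual_nearest_pairs` are hypothetical.

```python
import math


def nearest_indices(point, candidates, k):
    """Return the indices of the k candidates closest to point (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return set(order[:k])


def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Return (i, j) pairs where batch_a[i] and batch_b[j] are in each other's k nearest neighbours."""
    a_to_b = [nearest_indices(p, batch_b, k) for p in batch_a]
    b_to_a = [nearest_indices(q, batch_a, k) for q in batch_b]
    return sorted((i, j)
                  for i, neighbours in enumerate(a_to_b)
                  for j in neighbours
                  if i in b_to_a[j])


# Made-up two-gene expression profiles; batch_b is offset to mimic a batch effect.
batch_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
batch_b = [(1.0, 1.0), (6.0, 6.0), (20.0, 20.0)]

pairs = mutual_nearest_pairs(batch_a, batch_b, k=1)
print(pairs)
```

Samples without a mutual partner (here the third sample in each batch) are left unpaired, which is what makes the approach tolerant of populations present in only one batch.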
This section also describes the following analyses:
a) Characterisation of the temporal dynamics of SRSq and their association with patient clinical trajectories
b) Comparison of two different approaches for patient stratification: random forest prediction and kNN-based lazy learning
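For readers unfamiliar with the term, "lazy learning" means that no model is fitted in advance: each new sample is labelled at prediction time by looking up its nearest neighbours in a labelled reference set. A minimal Python sketch of kNN majority voting is given below. The data, the group labels and the function name `knn_predict` are invented for illustration, and this is not the repository's R implementation.

```python
import math
from collections import Counter


def knn_predict(sample, reference, labels, k=3):
    """Label a sample by majority vote among its k nearest reference samples."""
    order = sorted(range(len(reference)),
                   key=lambda i: math.dist(sample, reference[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up reference set: two-gene profiles with hypothetical group labels.
reference = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
             (3.0, 3.1), (3.2, 2.9), (2.9, 3.3)]
labels = ["group_A", "group_A", "group_A",
          "group_B", "group_B", "group_B"]

prediction = knn_predict((2.8, 3.0), reference, labels, k=3)
print(prediction)
```

By contrast, a random forest is trained once on the reference set and then applied to new samples without needing the reference data at prediction time.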
This section contains the code used to apply SepstratifieR to a number of independent, external data sets generated by other investigators. These include:
a) A cohort of sepsis patients in Australia published by Parnell et al.
b) The MARS study (Scicluna et al.)
c) A cohort of paediatric sepsis and septic shock patients published by Wong et al.
d) A cohort of influenza patients profiled within the MOSAIC study
e) A cohort of COVID-19 patients profiled within the COMBAT study
This section also contains the code used to integrate RNA-seq and mass cytometry (CyTOF) data in the COMBAT study, to test whether SRS groups are detectable at the protein level.