tascCODA_reproducibility

Reproductibility repository for "tascCODA: Bayesian tree-aggregated analysis of compositional amplicon and single-cell data" (Ostner et al., 2021)).

This repository contains the data generation and analysis scripts used to produce the results in the manuscript, as well as all figures generated in the process.

Setup

To reproduce the results, first install the packages listed in the requirements.txt file.

Then, download the raw and intermediate data from zenodo and unpack it in the same directory as this repository:

<parent_directory>
    |
    |_ tascCODA_reproducibility
    |
    |_ tascCODA_data

Instructions

The repository is separated into three parts.

1. Synthetic benchmarks

The benchmarks directory contains all simulations and analyses related to the synthetic data benchmarks. Functions used for data generation and execution in all three benchmarks are contained in the scripts directory.

For all three simulation studies, the respective subdirectory contains:

One jupyter notebook for data generation
One python script for running the benchmark
One jupyter notebook for analysis of the produced results.

The plots generated for each benchmark are stored in the subdirectory plots. The model comparison benchmark additionally contains a jupyter notebook that evaluates the generated data with all models except scCODA and tascCODA.

NOTE: The python scripts to run the benchmarks were executed on the HPC cluster of Helmholtz Zentrum München and will not work elsewhere! Running the benchmarks is also very resource intensive.

2. Real data applications

The applications directory contains the scripts and plots used for the real data applications on single-cell (smillie_UC) and microbial (labus_IBS) sequencing data.

scRNA-seq data

For the analysis of Ulcerative Colitis, three jupyter notebooks are available:

1_preprocessing_Smillie performs all preprocessing steps to prepare the data for analysis
2_Smillie_treeAgg_run applies tascCODA, scCDOA and Dirichlet regression to the different subsets of the data and generates all relating plots
3_Smillie_oos_prediction performs the out-of-sample prediction analysis

The plots generated are stored in the subdirectory plots_smillie.

16S sequencing data

The analysis of Irritable Bowel Syndrome is entirely performed in the jupyter notebook tasccoda_analysis_Labus. ibs_util_functions contains some functions used by this notebook.

The plots generated are stored in the subdirectory IBS_plots.

3. Other material

This directory contains scripts, plots and other material for producing figures 1 and S1 from the paper.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
applications		applications
benchmarks		benchmarks
other		other
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tascCODA_reproducibility

Setup

Instructions

1. Synthetic benchmarks

2. Real data applications

3. Other material

About

Releases

Packages

Languages

License

bio-datascience/tascCODA_reproducibility

Folders and files

Latest commit

History

Repository files navigation

tascCODA_reproducibility

Setup

Instructions

1. Synthetic benchmarks

2. Real data applications

3. Other material

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages