MOBSTER: Source Data

Giulio Caravagna 24/5/2020



This is the material released with the paper:

  • Subclonal reconstruction of tumors using machine learning and population genetics. Giulio Caravagna, Timon Heide, Marc Williams, Luis Zapata, Daniel Nichol, Ketevan Chkhaidze, William Cross, George D. Cresswell, Benjamin Werner, Ahmet Acar, Chris P. Barnes, Guido Sanguinetti, Trevor A. Graham, Andrea Sottoriva. Nature Genetics 52, 898–907 (2020).

The following R packages are required to run the analyses.

  • MOBSTER, to cluster the tumour site frequency spectrum with Beta and Pareto distributions;
  • BMix and VIBER, to model read counts data with Binomial mixtures;
  • CNAqc, to integrate mutation and copy number data from bulk sequencing;
  • TEMULATOR, to simulate non-spatial tumour growth dynamics;
  • CHESS, to simulate spatial tumour growth dynamics.

The vignettes are rendered in HTML. To visualise them correctly it is best to open the HTML files locally with your browser. As an alternative, you can use a preview website.

Note that vignettes 6 and 7 use DT tables and cannot be rendered by the preview website.

  • MOBSTER (version 0.1.1) installable sources are available in folder R_package.
install.packages("./R_package/mobster_0.1.1.tar.gz", repos = NULL, type = "source", dependencies = TRUE)

1. Example subclonal dynamics. Simulated example of tumour subclonal evolution with snapshots of tumour dynamics at different timepoints, and MOBSTER analysis (hosted at MOBSTER website).

2. Simulated single-sample data analysis. n = 150 cases with 0 or 1 subclone, with simulated WGS at median coverage 120x. Mutation calls are simulated without copy number data; the coverage is Poisson-distributed.

3. Simulated multi-sample data analysis. n = 15 cases of spatially growing tumours (2D) with 0, 1 or 2 subclones, with simulated WGS at median coverage 120x. Mutation calls are simulated without copy number data; the coverage is Poisson-distributed.

4. Single-sample cross-sectional lung cases. n = 2 lung cancer cases with 0 subclones, with WGS at median coverage ~100x. Mutation calls and copy number data are available from the COALA. Code in this vignette can also be used to re-analyse the breast and AML case samples that we discuss in the paper (see the papers for data availability).

5. Multi-region cross-sectional colorectal carcinomas. n = 2 colorectal cancer cases with multiple biopsies each, with WGS at median coverage ~100x. These are new data first released with this paper. Code in this vignette can also be used to replicate the results that we discuss in the paper; a further vignette is available to replicate Supplementary Figures.

6. PCAWG analysis. Summary statistics for n = 2566 cases of different cancers (pan-cancer). This cohort has WGS single-samples with coverage ~45x. Mutation and copy number calls that we used have been generated by the PCAWG consortium.

7. GBM analysis. Summary statistics for n = 71 longitudinal GBM biopsies. This cohort has WGS primary/relapse samples with coverage ~100x. Mutation and copy number calls that we used have been generated by the original authors.
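
For the simulated datasets of vignettes 2 and 3, the sequencing model (Poisson-distributed coverage, no copy number data) can be illustrated with a minimal base-R sketch. This is not the code used in the paper, which relies on TEMULATOR and CHESS. Every parameter value, the truncated-Pareto tail, the diploid-tumour assumption and the column names `DP`, `NV` and `VAF` are assumptions made here for illustration only.

```r
set.seed(1)

# Hypothetical parameters, chosen only for illustration
n_clonal   <- 500    # mutations in the clonal cluster
n_tail     <- 1000   # mutations in the neutral tail
coverage   <- 120    # mean sequencing depth of the simulated WGS
purity     <- 1      # assumption: pure, diploid tumour (no copy number changes)
tail_min   <- 0.05   # lower bound (scale) of the Pareto tail
tail_shape <- 1      # Pareto shape parameter
f_max      <- purity / 2  # clonal heterozygous mutations sit at this frequency

# True frequencies of clonal mutations
f_clonal <- rep(f_max, n_clonal)

# True frequencies of tail mutations: Pareto Type I truncated to
# [tail_min, f_max], sampled by inverse transform of the Pareto CDF
u      <- runif(n_tail, 0, 1 - (tail_min / f_max)^tail_shape)
f_tail <- tail_min / (1 - u)^(1 / tail_shape)

# Sequencing: Poisson depth per mutation, then Binomial variant reads
freq  <- c(f_clonal, f_tail)
depth <- rpois(length(freq), lambda = coverage)
alt   <- rbinom(length(freq), size = depth, prob = freq)

sim <- data.frame(DP = depth, NV = alt, VAF = alt / depth)
sim <- sim[sim$NV > 0, ]  # drop mutations with no supporting variant reads
```

Plotting `hist(sim$VAF, breaks = 50)` should show a clonal peak around 0.5 and a tail that decays towards it, which is the kind of site frequency spectrum that a Beta plus Pareto mixture is meant to describe.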


Contacts: Giulio Caravagna, PhD. Institute of Cancer Research, London, UK.