Giulio Caravagna 24/5/2020
This is the material released with the paper:
- Subclonal reconstruction of tumors using machine learning and population genetics. Giulio Caravagna, Timon Heide, Marc Williams, Luis Zapata, Daniel Nichol, Ketevan Chkhaidze, William Cross, George D. Cresswell, Benjamin Werner, Ahmet Acar, Chris P. Barnes, Guido Sanguinetti, Trevor A. Graham, Andrea Sottoriva. Nature Genetics 52, 898–907 (2020).
The following R packages are required to run the analyses.
- MOBSTER, to cluster the tumour site frequency spectrum with Beta and Pareto distributions;
- BMix and VIBER, to model read counts data with Binomial mixtures;
- CNAqc, to integrate mutation and copy number data from bulk sequencing;
- TEMULATOR, to simulate non-spatial tumour growth dynamics;
- CHESS, to simulate spatial tumour growth dynamics.
The vignettes are rendered in HTML. To visualise them correctly it is best to open the HTML files locally with your browser. As an alternative, you can use a preview website.
Note that vignettes 6 and 7 use DT tables and cannot be rendered by the preview website.
- MOBSTER (version 0.1.1) installable sources are available in folder R_package.
install.packages("./R_package/mobster_0.1.1.tar.gz", repos = NULL, type = "source", dependencies = TRUE)
1. Example subclonal dynamics. Simulated example of tumour subclonal evolution with snapshots of tumour dynamics at different timepoints, and MOBSTER analysis (hosted at MOBSTER website).
2. Simulated single-sample data analysis. n = 150 cases with 0 or 1 subclone, with simulated WGS at median coverage 120x. Mutation calls are simulated without copy number data; the coverage is Poisson-distributed.
3. Simulated multi-sample data analysis. n = 15 cases of spatially growing tumours (2D) with 0, 1 or 2 subclones, with simulated WGS at median coverage 120x. Mutation calls are simulated without copy number data; the coverage is Poisson-distributed.
4. Single-sample cross-sectional lung cases. n = 2 lung cancer cases with 0 subclones, with WGS at median coverage ~100x. Mutation calls and copy number data are available from the COALA. Code in this vignette can also be used to re-analyse the breast and AML case samples that we discuss in the paper (see the papers for data availability).
5. Multi-region cross-sectional colorectal carcinomas. 2 colorectal cancer cases with multiple biopsies each, with WGS at median coverage ~100x. These are new data first released with this paper. Code in this vignette can be used also to replicate the results that we discuss in the paper; a further vignette is available to replicate Supplementary Figures.
6. PCAWG analysis. Summary statistics for n = 2566 cases of different cancers (pan-cancer). This cohort has WGS single-samples with coverage ~45x. Mutation and copy number calls that we used have been generated by the PCAWG consortium.
7. GBM analysis. Summary statistics for n = 71 longitudinal GBM biopsies. This cohort has WGS primary/relapse samples with coverage ~100x. Mutation and copy number calls that we used have been generated by the original authors.
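
Vignettes 2 and 3 describe simulated WGS data in which coverage is Poisson-distributed and no copy number data are simulated. As a rough illustration of what that read-count model looks like, the base R sketch below draws depths from a Poisson distribution and variant reads from a Binomial. It is not the simulation code used in the paper (tumour growth is simulated with TEMULATOR and CHESS); the number of mutations, the single clonal cluster, the purity value and the column names are all assumptions made for this example.

```r
# Illustrative sketch only: Poisson-distributed coverage with Binomial
# read counts. All parameter values and column names are assumptions.
set.seed(1)

n_mut      <- 1000  # number of simulated mutations (assumed)
mean_depth <- 120   # mean coverage, chosen to match the ~120x stated above
purity     <- 1     # assumed pure tumour sample

# Assume one clonal cluster of heterozygous mutations in diploid regions,
# so the expected variant allele frequency is purity / 2.
expected_vaf <- rep(purity / 2, n_mut)

# Sequencing depth per mutation: Poisson-distributed coverage.
# pmax guards against a zero depth (practically impossible at 120x).
depth <- pmax(rpois(n_mut, lambda = mean_depth), 1)

# Reads carrying the variant: Binomial trials over the depth.
alt_reads <- rbinom(n_mut, size = depth, prob = expected_vaf)

sim <- data.frame(DP = depth, NV = alt_reads, VAF = alt_reads / depth)
summary(sim$VAF)
```

The observed VAF values scatter around 0.5 with a spread that shrinks as coverage grows, which is the sampling noise that read-count models have to account for.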
Contacts: Giulio Caravagna, PhD. Institute of Cancer Research, London, UK.