SIFA for tumor subclone inference

SIFA (tumor Subclone Identification by Feature Allocation) is a Bayesian method for identifying tumor subclones from whole-genome sequencing (WGS) data. This page walks through the basic steps of using SIFA.

SIFA currently requires at least two samples, since a unique tree cannot be identified from a single sample.

Reference

Zeng, L., Warren, J.L. and Zhao, H. (2017) Phylogeny-based tumor subclone identification using Bayesian feature allocation model.

Software dependencies

SIFA is written in R and C++. Please install the following R packages before running SIFA:

- data manipulation: tidyr, reshape, dplyr
- segment calling: copynumber
- Bayesian analysis: coda
- integrating C++ functionality: Rcpp, RcppArmadillo
- visualization: ggplot2, igraph
- others: gtools
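
A quick way to see which of the packages above are still missing is sketched below. The package names come from the list above; the helper name `find_missing` and the install commands in the comments are assumptions and not part of SIFA. In particular, `copynumber` has been distributed through Bioconductor rather than CRAN, so check its availability for your R version.

```r
# Packages named in the dependency list above
sifa_pkgs <- c("tidyr", "reshape", "dplyr", "copynumber", "coda",
               "Rcpp", "RcppArmadillo", "ggplot2", "igraph", "gtools")

# Hypothetical helper (not part of SIFA): return the packages that cannot be loaded
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing(sifa_pkgs)
print(missing_pkgs)

# Install whatever is reported as missing, for example:
# install.packages(setdiff(missing_pkgs, "copynumber"))
# BiocManager::install("copynumber")  # assumes BiocManager is already installed
```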

Prepare software input

SIFA takes an .Rdata file as input.

The .Rdata file should contain a list obs_data with the following fields (obs_data$segments is optional):

- obs_data$D for the total reads matrix
- obs_data$X for the mutant reads matrix
- obs_data$loc for the mutation location matrix
- obs_data$segments for the loci segmentation matrix

When obs_data$segments is not provided, SIFA uses the information in obs_data$loc and obs_data$D to call the genome segments.

Input format:

obs_data$D and obs_data$X should take the following format:

| loci | sample 1 | sample 2 | sample 3 | ... |
|---|---|---|---|---|
| locus 1 | 12 | 11 | 33 | ... |
| locus 2 | 5 | 8 | 7 | ... |
| ... | ... | ... | ... | ... |
| locus J | 22 | 10 | 17 | ... |

obs_data$loc should take the following format:

| chromosome | position | gene |
|---|---|---|
| 1 | 7660469 | CAMTA1 |
| 3 | 88482840 | |
| 13 | 102703724 | FGF14 |
| ... | ... | ... |
| 23 | 153383479 | |

For loci in non-coding regions, the gene column can be left blank.

obs_data$segments should take the following format:

| segments | start | end |
|---|---|---|
| segment 1 | 1 | 5 |
| segment 2 | 6 | 25 |
| segment 3 | 26 | 40 |
| ... | ... | ... |
| segment S | 155 | J |

Each row of the matrix represents one segment, with the two entries marking the starting and ending locus of the segment.
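
As an illustration of the formats above, the following sketch builds a toy obs_data list, runs a few sanity checks, and saves it to an .Rdata file. All values are simulated. The object types (numeric matrices for D, X and segments, a data frame for loc), the row and column names, the checks, and the file name `my_input.Rdata` are assumptions chosen to mirror the tables above, not requirements confirmed by the SIFA source.

```r
set.seed(1)
J <- 40        # number of loci (toy value)
n_samp <- 3    # number of samples; SIFA needs at least two

# Total reads D and mutant reads X: one row per locus, one column per sample
D <- matrix(rpois(J * n_samp, lambda = 30) + 1L, nrow = J, ncol = n_samp,
            dimnames = list(paste("locus", 1:J), paste("sample", 1:n_samp)))
X <- matrix(rbinom(J * n_samp, size = as.vector(D), prob = 0.3),
            nrow = J, ncol = n_samp, dimnames = dimnames(D))

# Mutation locations; the gene column is left blank for non-coding loci
# (assumption: a data frame is used here so positions stay numeric)
loc <- data.frame(
  chromosome = sample(1:23, J, replace = TRUE),
  position   = sample.int(1e8, J),
  gene       = "",
  stringsAsFactors = FALSE
)
loc <- loc[order(loc$chromosome, loc$position), ]
rownames(loc) <- NULL

# Segments: each row gives the first and last locus index of one segment
# (toy boundaries only; they ignore chromosome breaks)
segments <- matrix(c(1, 5,
                     6, 25,
                     26, J), ncol = 2, byrow = TRUE,
                   dimnames = list(paste("segment", 1:3), c("start", "end")))

# Sanity checks suggested by the tables above (assumed, not enforced by SIFA)
stopifnot(
  all(dim(D) == dim(X)), all(X <= D),
  nrow(loc) == J,
  segments[1, "start"] == 1, segments[nrow(segments), "end"] == J,
  all(segments[-1, "start"] == segments[-nrow(segments), "end"] + 1)
)

obs_data <- list(D = D, X = X, loc = loc, segments = segments)
out_file <- file.path(tempdir(), "my_input.Rdata")  # hypothetical file name
save(obs_data, file = out_file)
```

Dropping the `segments` element from the list corresponds to the case described above where segments are called from obs_data$loc and obs_data$D.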

Using SIFA

To use SIFA, clone this repository and set the R working directory to SIFA_package. Make sure all dependencies are correctly installed, then open the source file SIFA_app.R and execute the commands line by line, following the instructions below.

In the MODEL INPUT section of the code, load the .Rdata where your inputs are saved, specify random seed myseed, and specify the folder foldername to store output files (a new folder will be created if it does not exist). For example:

 #############################################
 ########## MODEL INPUT ######################
 #############################################
 load("example.Rdata")
 myseed = 1                # set random seed
 foldername = "temp_out"   # set output foldername
 dir.create(foldername)  # folder where outputs are saved

Next, specify the Bayesian sampling parameters in specify_pars.R. The default values work well for most parameters. Parameters you may want to change include:

 #### maximum total copy number
 Params$max_CN=4
 #### maximum number of mutant copies
 Params$max_mut=2
 
 #### MCMC sampling parameters
 MCMC_par$burnin=4000  # burn-in sample size
 MCMC_par$Nsamp=4000   # number of samples for inference
 MCMC_par$Ntune=2000   # number of samples used for adaptive parameter tuning
 Nclone=c(3:7)         # candidate subclone numbers K

Run the remaining sections one by one:
- sampler.R performs the sampling.
- Model_select.R performs model selection. The model selection plot is saved in selection.pdf.
- Fit_visual(foldername,X,D) visualizes the results:
  - The visualization lists the three most frequent trees in the posterior samples (when at least three tree structures exist) and displays the corresponding parameter estimates.
- get_point_estimate():
  - computes parameter point estimates from a given posterior sample .Rdata file
  - identifies up to three of the most frequent trees in the posterior samples and calculates point estimates for each tree

During sampling, the posterior samples for each candidate K are stored in a separate .Rdata file.
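
To see what the sampler wrote, one option is to load each output file into a scratch environment and list the objects it contains. The sketch below does that; the helper name `list_rdata_objects` is hypothetical, and nothing is assumed about how SIFA names its files or the objects inside them.

```r
# Hypothetical helper (not part of SIFA): list the objects stored in
# every .Rdata file found in a folder
list_rdata_objects <- function(folder) {
  files <- list.files(folder, pattern = "\\.[Rr][Dd]ata$", full.names = TRUE)
  contents <- lapply(files, function(f) {
    env <- new.env()
    load(f, envir = env)   # load into a scratch environment, not the workspace
    sort(ls(env))
  })
  names(contents) <- basename(files)
  contents
}

# Usage with the output folder set in the MODEL INPUT section:
# list_rdata_objects("temp_out")
```

Loading into a separate environment keeps the sampler output from overwriting objects such as obs_data in the current session.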

Results Examples

Estimated phylogenetic tree:

Estimated subclone mutated copy numbers:

Estimated subclone total copy numbers:

Estimated subclone fractions across samples:

Contact

Please feel free to contact li.zeng@yale.edu if you have any questions.
