Associated paper: Feng, et al. (In press). "Estimation of cell lineage trees by maximum-likelihood phylogenetics". Annals of Applied Statistics. Preprint available on BioRxiv.
You will need to have PHYLIP mix installed so that you can call mix
on the command line.
http://evolution.genetics.washington.edu/phylip/
We use pip
to install things into a python virtual environment.
FIRST: if you want tree visualizations from ete3, you will need to do something special since we are using pip (rather than conda).
Install PyQt5 (sudo apt-get
on linux, brew install
on mac. For mac, see https://stackoverflow.com/questions/39821177/python-pyqt-on-macos-sierra).
Then run pip install PyQt5
in your virtual environment.
AFTERWARDS: Install the libraries into your virtual environment through the requirements.txt
file.
We use nestly + SCons to run simulations/analyses.
You should install scons outside the virtual environment, for a python 2.* or a 3.5+ environment.
Then activate the virtual environment and then run scons ___
.
You need to create a file constant_paths.py
in the gestalt
folder that provides the paths to the different executables. (mix and bhv distance calculators)
For example:
MIX_PATH = "~/my_path/mix" # download from phylip website
BHV_PATH = "~/my_path/gtp.jar" # download from http://comet.lehman.cuny.edu/owen/code.html
RSPR_PATH = "~/my_path/rspr" # download from https://github.com/cwhidden/rspr
generate_data.py
: make data
restrict_observed_barcodes.py
: restrict to observing the first K alleles
get_parsimony_topologies.py
or get_collapsed_oracle.py
: create tree topologies to fit to
tune_topology.py
: to select the best topology given a set of possible topologies
--obs-file
: Pickle file with observations generated byread_gestalt_data.py
orgenerate_data.py
--topology-file
: Pickle file with candidate topologies fromget_parsimony_topologies.py
--out-model-file
: Name of output file.--log-file
: Name of log file--branch-pen-params
: Candidate penalty parameters for penalizing differences between branch lengths. We will tune over these using a variant of cross-validation.--target-lam-pen-params
: Candidate penalty parameters for penalizing differences between target cut rates. We will tune over these using a variant of cross-validation.--num-penalty-tune-iters
: Number of iterations we should spend on tuning the penalty parameters.--num-penalty-tune-splits
: Number of data splits to create (typically 3 - 5)--max-fit-splits
: Maximum number of data splits to use for fitting and tuning the penalty parameters (typically same value as--num-penalty-tune-splits
, but you can use a smaller number to speed up the computation)--num-chad-tune-iters
: Number of iterations to spend on tuning the tree topology with subtree prune and regraft (SPR) moves.--num-chad-stop
: If we don't change the tree topology for this many steps, then stop tuning the topology.--max-chad-tune-search
: Maximum number of SPR moves to consider at each iteration--max-extra-steps
: Maximum number of hidden cuts we should consider when approximating the likelihood. (A large number means more computation time.)--max-sum-states
: Maximum number of ancestral states to sum over when computing the likelihood. (A large number means more computation time.)--max-iters
: Maximum number of iterations for tuning branch lengths and mutation parameters.--num-inits
: Number of initializations to try when minimizing penalized log likelihood with respect to the branch lengths and mutation parameters
convert_to_newick.py
: output the fitted tree in newick format
fit_chronos.py
: uses Sanderson 2002, the chronos
function in R package ape
The full pipelines are specified in the sconscript files.
To run it, we use scons
. For example: scons simulation_topol_consist/
The sconscript files provides the full pipeline for generating data and analyzing it (see simulation_topol_consist/sconscript
as an example) as well as processing data from GESTALT and analyzing it (see analyze_gestalt/sconscript
as an example).
To run all the tests:
python3 -m unittest
To run specific test module tests/test_me.py:
python3 -m unittest tests.<test_me>
The fitted trees for the two adult fish in the paper is available at gestalt/analyze_gestalt/_output/ADR*_PMLE_tree.json
.