Overview of the MERLIN (Mitochondrial EvolutionaRy Lineage INference) algorithm
paper: https://academic.oup.com/bioinformatics/article/40/Supplement_1/i218/7700844
- python3 (>=3.6)
- numpy
- pandas
- gurobipy (requires a Gurobi license; an academic license can be obtained with an educational account)
- networkx
- (optional for generating simulation instances and benchmarking) snakemake (>=5.2.0)
The input to MERLIN consists of two CSV files: one containing the total read counts and one containing the variant read counts. Both matrices should have mutations as rows and cells as columns.
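As a rough illustration of this layout, the sketch below builds two small matrices with pandas and checks that they are mutually consistent. The mutation and cell labels, the helper name check_inputs, and the use of the first column and header row for labels are assumptions made for illustration, not part of MERLIN.

```python
import pandas as pd


def check_inputs(total: pd.DataFrame, variant: pd.DataFrame) -> None:
    """Raise ValueError if the two read-count matrices are inconsistent.

    Hypothetical helper for illustration; it is not part of MERLIN.
    """
    if not total.index.equals(variant.index):
        raise ValueError("mutation labels (rows) differ between matrices")
    if not total.columns.equals(variant.columns):
        raise ValueError("cell labels (columns) differ between matrices")
    if (variant.to_numpy() < 0).any():
        raise ValueError("negative variant read count")
    if (variant.to_numpy() > total.to_numpy()).any():
        raise ValueError("variant reads exceed total reads")


# Toy data: 3 mutations (rows) x 4 cells (columns); all labels are made up.
mutations = ["mut0", "mut1", "mut2"]
cells = ["cell0", "cell1", "cell2", "cell3"]
total = pd.DataFrame(
    [[50, 48, 52, 47], [51, 49, 50, 53], [46, 55, 50, 49]],
    index=mutations, columns=cells,
)
variant = pd.DataFrame(
    [[10, 0, 12, 9], [0, 20, 0, 1], [5, 5, 0, 0]],
    index=mutations, columns=cells,
)

check_inputs(total, variant)  # passes silently for consistent matrices

# Writing the matrices out would then be, for example:
# total.to_csv("total_matrix.csv"); variant.to_csv("variant_matrix.csv")
```

A check like this catches the most common mistakes (transposed matrices, mismatched labels, swapped files) before a long solver run.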
It is important that the format matches the example input files total_matrix.csv and variant_matrix.csv given in data/example, which can be generated by the following command.
$ mkdir -p data/example/
$ python src/simulation.py -n 50 -m 5 -g 5 -c 50 -o data/example/
usage: simulation.py -m n_mutation -n n_cells -g n_clones -c coverage [-t threshold] -o O
optional arguments:
-h, --help show this help message and exit
-m number of mutations
-n number of cells
-g number of clones
-c, --coverage expected sequencing coverage for simulated data
-t, --threshold minimum variant allele frequency (default 0.05)
-o, --out output directory
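For reference, the variant allele frequency (VAF) of a mutation in a cell is its variant read count divided by its total read count. The sketch below computes a VAF matrix with NumPy and compares it against the default threshold of 0.05. How simulation.py itself applies the threshold is not described here, so the comparison is only illustrative, and the function name allele_frequency is hypothetical.

```python
import numpy as np


def allele_frequency(variant, total):
    """Return variant / total elementwise, with 0 where total is 0.

    Hypothetical helper for illustration; it is not part of MERLIN.
    """
    variant = np.asarray(variant, dtype=float)
    total = np.asarray(total, dtype=float)
    vaf = np.zeros_like(total)
    # Divide only where there is coverage; other entries stay at 0.
    np.divide(variant, total, out=vaf, where=total > 0)
    return vaf


# Toy counts: 2 mutations (rows) x 2 cells (columns).
variant = np.array([[10, 0], [2, 25]])
total = np.array([[50, 0], [50, 50]])

vaf = allele_frequency(variant, total)
passes = vaf >= 0.05  # entries at or above the default threshold
```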
The simulation writes the following files to the output directory:
- variant_matrix.txt / total_matrix.txt: input to MERLIN
- tree.txt: ground-truth clone tree
- cell_tree.txt: ground-truth cell lineage tree
- cell_to_clone_mapping.txt
- mutation_to_clone_mapping.txt
usage: merlin.py [-h] [-t T] [-v V] -o O
optional arguments:
-h, --help show this help message and exit
-t, --total csv file with total read count matrix
-v, --variant csv file with variant read count matrix
-o, --out output prefix
An example of usage is as follows.
$ python src/merlin.py -t data/example/total_matrix.csv -v data/example/variant_matrix.csv -o data/example/
MERLIN produces the following output files:
- The inferred clone tree $S$, as {output_prefix}_clone_tree_edge_list.txt
- The $U$-matrix from the factorization $F=UB$, as {output_prefix}_Umatrix.csv
- The binarized $U$-matrix, as {output_prefix}_Amatrix.csv
- The ancestral graph $G$ inferred from the frequency matrix $F$, as {output_prefix}_ancestry_edge_list.txt
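To make the factorization $F=UB$ concrete, the toy example below multiplies a small mixture matrix $U$ by a binary clone matrix $B$ in NumPy. The orientation (rows of $U$ as cells, rows of $B$ as clones, columns of $B$ as mutations), the linear three-clone tree, and the binarization rule (entries greater than zero) are all assumptions made for illustration and may differ from MERLIN's actual conventions.

```python
import numpy as np

# Assumed clone matrix B: clone i carries mutation j iff B[i, j] == 1.
# Linear tree: clone 0 -> clone 1 -> clone 2, each gaining one mutation.
B = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
])

# Assumed mixture matrix U: row c holds the clone proportions of cell c.
U = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
])

# Frequency of each mutation in each cell under this model.
F = U @ B

# One possible binarization of U: which clones are present in each cell.
A = (U > 0).astype(int)
```

In this toy setting each row of $F$ is non-increasing from left to right, reflecting that a mutation inherited by descendant clones is at least as frequent as the mutations those descendants acquire later.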
Example output for the example input above can be found in data/example.
We recommend using the pipeline described in MQuad to select informative mitochondrial variants.
Note that MERLIN has a reasonable run time (< 3 hours) for