torchtree is a program designed for developing and inferring phylogenetic models. Implemented in Python, it leverages PyTorch for automatic differentiation. The suite of inference algorithms encompasses variational inference, Hamiltonian Monte Carlo, maximum a posteriori, and Markov chain Monte Carlo.
For a comprehensive assessment of torchtree's performance and use cases, please see our evaluation repository, torchtree-experiments, where torchtree was rigorously tested on various datasets and benchmarked for accuracy and speed.
Use an Anaconda environment (Optional)
conda env create -f environment.yml
conda activate torchtree
To install the latest stable version you can run
pip install torchtree
To build torchtree from source you can run
git clone https://github.com/4ment/torchtree
pip install torchtree/
Check install
torchtree --help
For detailed information on how to use torchtree and its features, please refer to the official documentation and API reference.
torchtree requires a JSON file containing models and algorithms. A configuration file can be generated using torchtree-cli, a command line-based tool. This two-step process allows the user to adjust values in the configuration file, such as hyperparameters.
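As a rough illustration of the adjustment step, the sketch below changes a value in a JSON structure using only the Python standard library. The layout of the configuration and the key names (`iterations`, `lr`) are hypothetical placeholders; inspect the file generated by torchtree-cli to find the actual keys.

```python
import json


def set_key(node, key, value):
    """Recursively set every occurrence of `key` in a nested JSON structure.

    Returns the number of entries that were changed.
    """
    changed = 0
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_key(v, key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key(item, key, value)
    return changed


# Hypothetical stand-in for a generated configuration; the real file written
# by torchtree-cli will have a different layout and different key names.
config = json.loads(
    '[{"id": "advi", "iterations": 1000, "optimizer": {"lr": 0.1}}]'
)
n_changed = set_key(config, "lr", 0.01)
print(n_changed, json.dumps(config))
```

For a real file, the same function could be applied to the result of `json.load` on the configuration file, and the modified structure written back with `json.dump` before running torchtree.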
torchtree-cli implements several subcommands, each corresponding to a different type of inference algorithm.
A list of available subcommands can be obtained by running torchtree-cli --help.
The following subcommands are available:
advi: Automatic differentiation variational inference
hmc: Hamiltonian Monte Carlo
map: Maximum a posteriori
mcmc: Markov chain Monte Carlo
Each subcommand/algorithm requires a different set of arguments which can be obtained by running torchtree-cli <subcommand> --help.
torchtree-cli requires an alignment file in FASTA format and a tree file in either Newick or NEXUS format.
While torchtree uses the DendroPy library to parse and manipulate phylogenetic trees, it is recommended to use a Newick file due to the numerous variations of the NEXUS format.
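Before generating a configuration, it can be useful to confirm that the FASTA file really is an alignment, that is, that all sequences have the same length. The following is a minimal standard-library sketch of such a check; it is not part of torchtree, and the function names are made up for this example.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict mapping names to sequences."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks[name].append(line)  # sequences may span several lines
    return {key: "".join(parts) for key, parts in chunks.items()}


def check_alignment(sequences):
    """Return the common sequence length, or raise if lengths differ."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one sequence length, got {sorted(lengths)}")
    return lengths.pop()


# Tiny in-memory example; sequence A is wrapped over two lines.
example = """>A
ACGT
AC
>B
ACGTAC
""".splitlines()
seqs = read_fasta(example)
print(len(seqs), check_alignment(seqs))
```

To check a file on disk, pass an open file handle to `read_fasta`, for example `with open("data/fluA.fa") as handle: seqs = read_fasta(handle)`.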
Let's explore a few examples of how to use these programs using an influenza A virus dataset containing 69 DNA sequences. The alignment and tree files are located in the data directory.
Some examples of models using variational inference:
W4 refers to a site model with 4 rate categories drawn from a discretized Weibull distribution. This is similar to the more commonly used discretized Gamma distribution site model.
torchtree-cli advi -i data/fluA.fa -t data/fluA.tree -m GTR -C 4 > fluA.json
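To make the discretized Weibull site model described above more concrete, the sketch below computes rate categories from the Weibull quantile function, which has a closed form. It assumes equally probable categories represented by their medians and rescaled to a mean of 1, which is one common discretization scheme; whether torchtree uses exactly this scheme is an assumption, and `weibull_rates` is a name invented for this example.

```python
import math


def weibull_rates(shape, categories=4):
    """Discretize a Weibull(shape, scale=1) distribution into equally
    probable rate categories, each represented by its median, and rescale
    the rates so that their mean is 1."""
    # The median of category i is the quantile at probability (2i + 1) / (2K).
    probs = [(2 * i + 1) / (2 * categories) for i in range(categories)]
    # Weibull quantile function with scale 1: (-ln(1 - p)) ** (1 / shape).
    rates = [(-math.log(1.0 - p)) ** (1.0 / shape) for p in probs]
    mean = sum(rates) / categories
    return [rate / mean for rate in rates]


rates = weibull_rates(shape=1.0, categories=4)
print(rates)
```

Smaller shape values spread the rates further apart, which corresponds to stronger rate variation among sites.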
torchtree-cli advi -i data/fluA.fa -t data/fluA.tree -m JC69 --clock strict --coalescent constant > fluA.json
This will generate sample.csv and sample.trees files containing parameter and tree samples drawn from the variational distribution:
torchtree fluA.json
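Once the run has finished, the parameter samples can be summarized with a few lines of Python. The sketch below assumes that sample.csv has a header row followed by one sample per row; the delimiter and the column names are not specified here, so the delimiter is passed explicitly and the column names in the example (`param_a`, `param_b`) are hypothetical.

```python
import csv
import io
import statistics


def summarize_samples(handle, delimiter):
    """Return {column: (mean, standard deviation)} for each numeric column."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    columns = {}
    for row in reader:
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric entries
            columns.setdefault(name, []).append(number)
    # The standard deviation needs at least two values per column.
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in columns.items()
        if len(values) > 1
    }


# In-memory stand-in for a samples file; real column names will differ.
example = io.StringIO(
    "sample\tparam_a\tparam_b\n0\t1.0\t10.0\n1\t2.0\t12.0\n2\t3.0\t14.0\n"
)
summary = summarize_samples(example, delimiter="\t")
for name, (mean, sd) in summary.items():
    print(name, mean, sd)
```

For the real output, open sample.csv and pass the file handle together with the delimiter actually used in the file.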
torchtree can be easily extended without modifying the code base thanks to its modular implementation. Some examples of plug-ins:
A GitHub template is available to assist in the development of a plug-in, and it is highly recommended to use it. This template provides a structured starting point, ensuring consistency and compatibility with torchtree while streamlining the development process.
If you use torchtree, please consider citing:
@misc{fourment2024torchtree,
title={torchtree: flexible phylogenetic model development and inference using {PyTorch}},
author={Mathieu Fourment and Matthew Macaulay and Christiaan J Swanepoel and Xiang Ji and Marc A Suchard and Frederick A Matsen IV},
year={2024},
eprint={2406.18044},
archivePrefix={arXiv},
primaryClass={q-bio.PE},
url={https://arxiv.org/abs/2406.18044}
}
Distributed under the GPLv3 License. See LICENSE for more information.
torchtree makes use of the following libraries and tools, which are under their own respective licenses: