torchtree

torchtree is a program designed for developing and inferring phylogenetic models. Implemented in Python, it leverages PyTorch for automatic differentiation. The suite of inference algorithms encompasses variational inference, Hamiltonian Monte Carlo, maximum a posteriori, and Markov chain Monte Carlo.

For a comprehensive assessment of torchtree's performance and use cases, please see our evaluation repository, torchtree-experiments, where torchtree was rigorously tested on various datasets and benchmarked for accuracy and speed.

Getting Started

Dependencies

Installation

Use an Anaconda environment (Optional)

conda env create -f environment.yml
conda activate torchtree

To install the latest stable version, run:

pip install torchtree

To build torchtree from source, run:

git clone https://github.com/4ment/torchtree
pip install torchtree/

Check the installation

torchtree --help

Documentation

For detailed information on how to use torchtree and its features, please refer to the official documentation and API reference.

Quick start

torchtree requires a JSON file containing models and algorithms. A configuration file can be generated using torchtree-cli, a command line-based tool. This two-step process allows the user to adjust values in the configuration file, such as hyperparameters.
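Because the configuration is plain JSON, the adjustment step can also be scripted. The sketch below is not part of torchtree: find_values is a hypothetical helper that walks any nested JSON structure and reports where a given key occurs, which helps locate a setting before editing it. The key name "lr" in the usage comment is only a placeholder; inspect the generated file for the actual names.

```python
import json
from typing import Any, Iterator, Tuple


def find_values(node: Any, key: str, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}/{name}"
            if name == key:
                yield child, value
            # Keep descending: the same key may also appear deeper down.
            yield from find_values(value, key, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_values(value, key, f"{path}/{index}")


def load_config(path: str) -> Any:
    """Read a JSON configuration file into Python objects."""
    with open(path) as handle:
        return json.load(handle)


# Usage (file name and key are placeholders):
# config = load_config("fluA.json")
# for location, value in find_values(config, "lr"):
#     print(location, value)
```

After changing a value in the loaded structure, json.dump(config, handle, indent=2) writes it back to disk.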

torchtree-cli implements several subcommands, each corresponding to a different type of inference algorithm. A list of available subcommands can be obtained by running torchtree-cli --help.

The following subcommands are available:

  • advi: Automatic differentiation variational inference
  • hmc: Hamiltonian Monte Carlo
  • map: Maximum a posteriori
  • mcmc: Markov chain Monte Carlo

Each subcommand/algorithm requires a different set of arguments which can be obtained by running torchtree-cli <subcommand> --help.

torchtree-cli requires an alignment file in FASTA format and a tree file in either Newick or NEXUS format. While torchtree uses the DendroPy library to parse and manipulate phylogenetic trees, it is recommended to use a Newick file due to the numerous variations of the NEXUS format.

Let's explore a few examples of how to use these programs using an influenza A virus dataset containing 69 DNA sequences. The alignment and tree files are located in the data directory.
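Before generating a configuration, it can be worth confirming that the FASTA file really is an alignment, that is, that every sequence has the same length. The sketch below is not part of torchtree; read_fasta and check_alignment are hypothetical helper names, and only the Python standard library is used.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a {name: sequence} dictionary."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]  # keep the identifier, drop the description
            if name in parts:
                raise ValueError(f"duplicate sequence name: {name}")
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


def check_alignment(sequences: Dict[str, str]) -> Tuple[int, int]:
    """Return (number of sequences, alignment length) or raise ValueError."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one common length, found {sorted(lengths)}")
    return len(sequences), lengths.pop()


# Usage:
# with open("data/fluA.fa") as handle:
#     print(check_alignment(read_fasta(handle)))
```

If data/fluA.fa matches the description above, the first number reported should be 69.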

1 - Generating a configuration file

Some examples of models using variational inference:

Unrooted tree with GTR+W4 model

W4 refers to a site model with 4 rate categories derived from a discretized Weibull distribution. This is similar to the more commonly used discretized Gamma distribution site model.

torchtree-cli advi -i data/fluA.fa -t data/fluA.tree -m GTR -C 4 > fluA.json

Time tree with strict clock and constant coalescent model

torchtree-cli advi -i data/fluA.fa -t data/fluA.tree -m JC69 --clock strict --coalescent constant > fluA.json
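When several configurations are needed, for example to compare models, the two commands above can be assembled from a script. This is a minimal sketch that uses only the flags shown in this section (-i, -t, -m, -C, --clock, --coalescent); advi_command and write_config are hypothetical helper names, not part of torchtree.

```python
import subprocess
from typing import List, Optional


def advi_command(
    alignment: str,
    tree: str,
    model: str,
    categories: Optional[int] = None,
    clock: Optional[str] = None,
    coalescent: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a `torchtree-cli advi` call."""
    cmd = ["torchtree-cli", "advi", "-i", alignment, "-t", tree, "-m", model]
    if categories is not None:
        cmd += ["-C", str(categories)]
    if clock is not None:
        cmd += ["--clock", clock]
    if coalescent is not None:
        cmd += ["--coalescent", coalescent]
    return cmd


def write_config(cmd: List[str], out_path: str) -> None:
    """Run the command and redirect its standard output to out_path."""
    with open(out_path, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)


# Usage (requires torchtree to be installed):
# write_config(
#     advi_command("data/fluA.fa", "data/fluA.tree", "GTR", categories=4),
#     "fluA.json",
# )
```

Passing the arguments as a list avoids shell quoting issues, and check=True raises an error if torchtree-cli exits with a non-zero status.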

2 - Running torchtree

Running torchtree on the configuration file generates sample.csv and sample.trees, which contain parameter and tree samples drawn from the variational distribution:

torchtree fluA.json
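The parameter samples can then be summarized with a few lines of Python. The sketch below makes assumptions about the layout of sample.csv: a header row naming each column and a tab delimiter. Neither is confirmed here, so check the file and change the delimiter argument if needed. summarize_samples is a hypothetical helper, not a torchtree function.

```python
import csv
import statistics
from typing import Dict, Iterable, List, Tuple


def summarize_samples(
    lines: Iterable[str], delimiter: str = "\t"
) -> Dict[str, Tuple[float, float]]:
    """Return {column: (mean, standard deviation)} for every numeric column."""
    columns: Dict[str, List[float]] = {}
    for row in csv.DictReader(lines, delimiter=delimiter):
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # skip non-numeric or missing entries
            columns.setdefault(name, []).append(number)
    return {
        name: (
            statistics.fmean(values),
            statistics.stdev(values) if len(values) > 1 else 0.0,
        )
        for name, values in columns.items()
    }


# Usage (delimiter is an assumption; adjust to match the file):
# with open("sample.csv", newline="") as handle:
#     for name, (mean, sd) in summarize_samples(handle).items():
#         print(f"{name}: mean={mean:.4g} sd={sd:.4g}")
```

Columns that contain no numeric values are left out of the result.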

torchtree plug-in

torchtree can be easily extended without modifying the code base thanks to its modular implementation. Some examples of plug-ins:

A GitHub template is available to assist in the development of a plug-in, and it is highly recommended to use it. This template provides a structured starting point, ensuring consistency and compatibility with torchtree while streamlining the development process.

How to cite

If you use torchtree, please consider citing:


@misc{fourment2024torchtree,
      title={torchtree: flexible phylogenetic model development and inference using {PyTorch}}, 
      author={Mathieu Fourment and Matthew Macaulay and Christiaan J Swanepoel and Xiang Ji and Marc A Suchard and Frederick A Matsen IV},
      year={2024},
      eprint={2406.18044},
      archivePrefix={arXiv},
      primaryClass={q-bio.PE},
      url={https://arxiv.org/abs/2406.18044}
}

License

Distributed under the GPLv3 License. See LICENSE for more information.

Acknowledgements

torchtree makes use of the following libraries and tools, which are under their own respective licenses: