RootDigger is a program that, given a multiple sequence alignment (MSA) and an unrooted tree with branch lengths, places a root on that tree. For the foreseeable future, RootDigger will only support DNA data, as the method it uses is ineffective on amino acid (AA) data.
Currently, the best way to get the most recent version of RootDigger is to clone the repository:
git clone --recursive https://github.com/computations/root_digger
This will obtain all the required dependencies, including a modified libpll.
Furthermore, the particular variant of libpll that is used by RootDigger
requires the GNU Scientific Library (GSL). Most distributions have packages
available, so it should be installed through your package manager. If GSL is
not found, the build process will automatically download and build it.
Compilation of libpll requires Flex and Bison, both of which can also be
obtained through a package manager.
RootDigger requires cmake to build. A makefile is provided which will set up
the build directory and build the software automatically. Once the software is
built, the binary rd is placed in the bin directory, along with rd_test, which
is the test suite.
As of version 1.4, RootDigger supports a rudimentary version of both thread-level and process-level parallelism. Both are optional. By default, if cmake can find OpenMP, RootDigger will be built with it.
For MPI, a special build flag needs to be passed. The makefile has an mpi
target, so in most cases running make mpi should be sufficient. In cases where
this doesn't work, the base command is
cmake -DMPI_BUILD=ON -DCMAKE_BUILD_TYPE=Release
In general, MPI is better to use on larger trees.
./rd --msa <MSA FILE> --tree <TREE FILE>
The MSA file can be in any format that is supported by libpll, which at the
time of writing is either "relaxed" PHYLIP or FASTA. The tree file should
contain a metric tree (that is, a tree whose branch lengths are in expected
substitutions per site) in Newick format.
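As a quick sanity check before running RootDigger, the alignment format can be guessed from its first non-blank line. The sketch below is not part of RootDigger or libpll; the function name guess_msa_format is hypothetical, and it relies only on two conventions: FASTA records begin with ">", and a PHYLIP file begins with a header of two integers (number of taxa and number of sites).

```python
def guess_msa_format(text):
    """Guess 'fasta' or 'phylip' from the first non-blank line of an alignment.

    Hypothetical helper for illustration; libpll does its own parsing.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        fields = stripped.split()
        # A PHYLIP header is two integers: taxa count and site count.
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            return "phylip"
        return "unknown"
    return "unknown"
```

A result of "unknown" only means that the file does not start like either format; it says nothing about whether the rest of the file is well formed.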
By default, RootDigger runs in search mode with early stopping on. In search mode, RootDigger simply looks for the most likely root. With early stopping, RootDigger considers the search concluded when it finds the same root placement twice in a row (as opposed to requiring that the likelihood is the same twice in a row).
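The early stopping rule can be pictured with a small sketch. This is not RootDigger's actual code: it assumes a root placement can be summarized as a (branch id, position along the branch) pair, and that "the same placement" means the same branch with positions differing by at most a tolerance, in the spirit of the --brtol option. The function name and the tuple layout are hypothetical.

```python
def should_stop_early(previous, current, brtol=1e-12):
    """Return True when two consecutive root placements coincide.

    previous, current: (branch_id, position) tuples; previous is None on
    the first iteration. The representation is an assumption made for
    illustration only.
    """
    if previous is None:
        return False  # nothing to compare against yet
    prev_branch, prev_pos = previous
    cur_branch, cur_pos = current
    # Same branch, and the position along it moved by no more than brtol.
    return prev_branch == cur_branch and abs(prev_pos - cur_pos) <= brtol
```

Note that the comparison is made on the placement itself, not on the likelihood value, which matches the distinction drawn above.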
RootDigger can be run in exhaustive mode using --exhaustive. This will cause
RootDigger to consider every branch, and report the Likelihood Weight Ratio
(LWR) of placing the root on that branch. Informally, this can be interpreted
as the probability of placing the root on a given branch. By default,
exhaustive mode does not run with early stopping. This can be enabled using
--early-stop. In practice, this doesn't affect the results at all, but in
principle it could, so be warned.
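To make the Likelihood Weight Ratio concrete, here is a minimal sketch of how such ratios are commonly computed from per-branch log-likelihoods: each branch's likelihood is divided by the sum of the likelihoods over all branches. It assumes RootDigger's LWR follows this usual definition. The function name and the input values are illustrative, not RootDigger output.

```python
import math


def likelihood_weight_ratios(log_likelihoods):
    """Normalize per-branch log-likelihoods into weights that sum to 1.

    Subtracting the maximum before exponentiating avoids underflow, since
    phylogenetic log-likelihoods are typically large negative numbers.
    """
    best = max(log_likelihoods)
    weights = [math.exp(lnl - best) for lnl in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative log-likelihoods for rooting on three different branches.
ratios = likelihood_weight_ratios([-1234.5, -1236.0, -1240.2])
```

The branch with the highest log-likelihood receives the largest ratio, and the ratios sum to 1, which is why they can be read informally as probabilities.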
For more information, the --help flag prints detailed information about all
the options:
Application Options:
--msa [FILE]
File containing the alignment.
--tree [FILE]
File containing the tree, with branch lengths.
--partition [FILE]
Optional file containing the partition specification.
Format is the same as RAxML-NG partition file.
--exhaustive
Enable exhaustive mode. This will attempt to root a tree
at every branch, and then report the results using LWR.
--early-stop
Enable early stopping. This will cause the search
to terminate when the root placement is sufficiently
close for 2 consecutive iterations. How close they need
to be is controlled by brtol. Is enabled by default for
search mode and disabled by default for exhaustive mode.
--no-early-stop
Force disable early stop.
--seed [NUMBER]
Random seed to use. Optional
--rate-cats [NUMBER]
Number of rate categories to use for the model. Default
is 1.
--invariant-sites
Enable invariant sites. Default is off.
--min-roots [NUMBER]
Minimum number of roots to start from. Optional,
Default is 1.
--root-ratio [NUMBER]
Proportion of potential starting roots to attempt
Default is 0.01
--atol [NUMBER]
Root optimization stopping tolerance. Increase this to
improve results. Default is 1e-4
--brtol [NUMBER]
When early stop mode is enabled, this controls the
distance required to trigger. Default is 1e-12
--bfgstol [NUMBER]
Tolerance for the BFGS steps. Default is 1e-7
--factor [NUMBER]
Factor for the BFGS steps. Default is 1e4
--threads [NUMBER]
Number of threads to use
--silent
Suppress output except for the final tree
--verbose
Increase the verbosity level. Can be repeated to
increase the level further.
Dependencies:
- GSL
- Cmake 3.0
- C++14 compatible compiler
Optional Dependencies:
- OpenMP for thread level parallelism
- Some MPI package for process level parallelism
Usage:
For search mode:
rd --msa <MSA> --tree <TREE>
For exhaustive mode:
rd --msa <MSA> --tree <TREE> --exhaustive
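When RootDigger is driven from a script, it can help to assemble the command line programmatically. The sketch below uses only flags listed in the help text above (--msa, --tree, --exhaustive, --early-stop, --no-early-stop, --threads, --seed). The function name build_rd_command, the default binary path, and the file names are hypothetical.

```python
import shlex


def build_rd_command(msa, tree, exhaustive=False, early_stop=None,
                     threads=None, seed=None, binary="./rd"):
    """Build an argument list for rd from the documented flags.

    early_stop: None keeps the mode's default, True adds --early-stop,
    False adds --no-early-stop.
    """
    cmd = [binary, "--msa", msa, "--tree", tree]
    if exhaustive:
        cmd.append("--exhaustive")
    if early_stop is True:
        cmd.append("--early-stop")
    elif early_stop is False:
        cmd.append("--no-early-stop")
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


# Placeholder file names; shlex.join requires Python 3.8 or newer.
command = build_rd_command("aln.fasta", "tree.nwk", exhaustive=True, threads=4)
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with file names that contain spaces.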