-
Notifications
You must be signed in to change notification settings - Fork 49
SNaQ
To run SNaQ, you need
-
data extracted from sequence alignments:
- a list of estimated unrooted gene trees (e.g. from RAxML), or
- a table of concordance factors (CF) (e.g. from BUCKy)
-
a starting topology (e.g. from Quartet MaxCut or ASTRAL, or RAxML tree from a single gene...)
List of estimated unrooted gene trees: plain text file with one tree per line
in parenthetical format.
Table of CF: plain text file or comma-separated file, obtained from TICR pipeline.
Starting topology: tree or network in parenthetical format
SNaQ estimates unrooted explicit networks with a maximum pseudolikelihood approach.
Read the data into Julia:
buckyCF = readTableCF("bucky-output/1_seqgen.CFs.csv")
or
raxmlCF = readTrees2CF("raxml/besttrees.tre", writeTab=false, writeSummary=false)
For the starting topology, the Quartet MaxCut tree is saved in one file, whereas the ASTRAL tree is saved at the end of a list of bootstrap trees:
tre = readTopology("bucky-output/1_seqgen.QMC.tre")
or
astraltree = readMultiTopology("astral/astral.tre")[102] # main tree with BS as node labels
Estimate the best network from bucky's quartet CF and hmax
number of hybridizations:
net0 = snaq!(tre,buckyCF,hmax=0, filename="snaq/net0_bucky")
net1 = snaq!(net0,buckyCF,hmax=1, filename="snaq/net1_bucky")
net2 = snaq!(net1,buckyCF,hmax=2, filename="snaq/net2_bucky")
By default, each optimization will do 10 runs (runs=10
).
To estimate the best network from raxml's gene trees, the command is the same, except
for the input data:
net1r = snaq!(net0,raxmlCF, hmax=1, filename="snaq/net1_raxml")
We can see the estimated networks with plot:
plot(net0)
plot(net1)
plot(net2)
plot(net1, showGamma=true)
-
.out
file: contains the best network among all runs, and the best network per run, includes also the pseudolikelihood score and the computation time. -
.networks
file: contains the networks obtained by switching the hybrid node inside each cycle (because of potential identifiabiliy issues). -
.log
file: description of each run, convergence criterion, seed information -
.err
file: seed information on runs that failed.Example:
Total errors: 1 in seeds [4545]
, you will need to run the following function on the same settings that caused the error:snaqDebug(tree,data,seed=4545)
.
- For big datasets, do only one run first to get an idea of computing time:
net1 = snaq!(net0,buckyCF,hmax=1, filename="snaq/net1_bucky", runs=1)
- Increase the number of hybridizations sequentially:
hmax=0,1,2,...
, and use the best network ath-1
as starting point to estimate the best network ath
. - For long jobs, run as a script in the terminal:
julia myscript.jl
, arguments are saved as a vector calledARGS
The network does not make any sense!
Recall:
- The root is meaningless, use the rooting functions
- Identifiability issues: check out the
.networks
file, and its pseudolikelihood scores.
vnet = readMultiTopology("snaq/net1_bucky.networks")
- Level-1 assumption: no intersecting cycles, so maybe no more hybridizations can be added if
number of taxa is small
- The bigger the network to estimate, the more genes are needed (compare bootstrap results between 30 genes and 300 genes).
- General instructions here
- Inside Julia:
?plot
- PhyloNetworks google group
- Issues in Github
- Email claudia@stat.wisc.edu
Next: how to choose the number of reticulations.
PhyloNetworks Workshop
- home
- example data
-
TICR pipeline:
from sequences to quartet CFs
- the data
- MrBayes on all genes
- BUCKy
- Quartet MaxCut
- RAxML & ASTRAL
- PhyloNetworks: from quartet CFs or gene trees to phylogenetic networks
- TICR test: is a population tree with ILS sufficient (vs network)?
- Continuous trait evolution on a network