Skip to content
Cecile Ane edited this page Jun 9, 2016 · 33 revisions

Types of input

To run SNaQ, you need

  • data extracted from sequence alignments:

    • a list of estimated unrooted gene trees (e.g. from RAxML), or
    • a table of concordance factors (CF) (e.g. from BUCKy)
  • a starting topology (e.g. from Quartet MaxCut or ASTRAL, or RAxML tree from a single gene...)

List of estimated unrooted gene trees: plain text file with one tree per line in parenthetical format.

Table of CF: plain text file or comma-separated file, obtained from TICR pipeline.

Starting topology: tree or network in parenthetical format

Network estimation

SNaQ estimates unrooted explicit networks with a maximum pseudolikelihood approach.

Read the data into Julia:

buckyCF = readTableCF("bucky-output/1_seqgen.CFs.csv")

or

raxmlCF = readTrees2CF("raxml/besttrees.tre", writeTab=false, writeSummary=false)

For the starting topology, the Quartet MaxCut tree is saved in one file, whereas the ASTRAL tree is saved at the end of a list of bootstrap trees:

tre = readTopology("bucky-output/1_seqgen.QMC.tre")

or

astraltree = readMultiTopology("astral/astral.tre")[102] # main tree with BS as node labels

Estimate the best network from bucky's quartet CF and hmax number of hybridizations:

net0 = snaq!(tre,buckyCF,hmax=0, filename="snaq/net0_bucky")
net1 = snaq!(net0,buckyCF,hmax=1, filename="snaq/net1_bucky")
net2 = snaq!(net1,buckyCF,hmax=2, filename="snaq/net2_bucky")

By default, each optimization will do 10 runs (runs=10). To estimate the best network from raxml's gene trees, the command is the same, except for the input data:

net1r = snaq!(net0,raxmlCF, hmax=1, filename="snaq/net1_raxml")

We can see the estimated networks with plot:

plot(net0)
plot(net1)
plot(net2)
plot(net1, showGamma=true)


Output files

  • .out file: contains the best network among all runs, and the best network per run, includes also the pseudolikelihood score and the computation time.

  • .networks file: contains the networks obtained by switching the hybrid node inside each cycle (because of potential identifiabiliy issues).


  • .log file: description of each run, convergence criterion, seed information

  • .err file: seed information on runs that failed.

    Example: Total errors: 1 in seeds [4545], you will need to run the following function on the same settings that caused the error: snaqDebug(tree,data,seed=4545).

Suggestions

  • For big datasets, do only one run first to get an idea of computing time:
net1 = snaq!(net0,buckyCF,hmax=1, filename="snaq/net1_bucky", runs=1)
  • Increase the number of hybridizations sequentially: hmax=0,1,2,..., and use the best network at h-1 as starting point to estimate the best network at h.
  • For long jobs, run as a script in the terminal: julia myscript.jl, arguments are saved as a vector called ARGS

Troubleshooting

The network does not make any sense!

Recall:

  • The root is meaningless, use the rooting functions
  • Identifiability issues: check out the .networks file, and its pseudolikelihood scores.
vnet = readMultiTopology("snaq/net1_bucky.networks")
  • Level-1 assumption: no intersecting cycles, so maybe no more hybridizations can be added if number of taxa is small
  • The bigger the network to estimate, the more genes are needed (compare bootstrap results between 30 genes and 300 genes).

Documentation

  • General documentation here
  • Inside Julia: ?plot

Help/bugs/errors

Next: how to choose the number of reticulations.

PhyloNetworks Workshop

Clone this wiki locally