Skip to content
Claudia Solis-Lemus edited this page Jun 6, 2016 · 33 revisions

Types of input

To run SNaQ, you need

    • a list of estimated gene trees (e.g. from RAxML), or
    • a table of concordance factors (CF) (e.g. from BUCKy)
  • starting topology (e.g. from Quartet MaxCut)

List of estimated gene trees: plain text file with one tree per line in parenthetical format.

Table of CF: plain text file or comma-separated file, obtained from TICR pipeline.

Starting topology: tree or network in parenthetical format

Network estimation

SNaQ estimates unrooted explicit networks with a maximum pseudolikelihood approach.

Read the data into Julia:

buckyCF = readTableCF("bucky-output/1_seqgen.CFs.csv")
tre = readTopology("bucky-output/1_seqgen.QMC.tre")

or

raxmlCF = readTrees2CF("raxml/besttrees.tre", writeTab=false, writeSummary=false)
astraltree = readMultiTopology("astral/astral.tre")[102] # main tree with BS as node labels

Estimate best network for hmax number of hybridizations, and run=10 by default.

net0 = snaq!(tre,buckyCF,hmax=0, filename="snaq/net0")
net1 = snaq!(net0,buckyCF,hmax=1, filename="snaq/net1")
net2 = snaq!(net1,buckyCF,hmax=2, filename="snaq/net2")

We can see the estimated networks with plot:

plot(net0)
plot(net1)
plot(net2)


The hybrid edges are displayed in blue: dark blue for the major edge, and light blue for the minor edge.

Output files

  • .out file: contains the best network among all runs, and the best network per run, includes also the pseudolikelihood score and the computation time.

  • .networks file: contains the networks obtained by switching the hybrid node inside each cycle (because of identifiabiliy issues).


  • .log file: description of each run, convergence criterion, seed information

  • .err file: seed information on runs that failed.

Example: Total errors: 1 in seeds [4545], you will need to run the following function on the same settings that caused the error: snaqDebug(tree,data,seed=4545).

Suggestions

  • For big datasets, do only one run first to get an idea of computing time:
net1 = snaq!(net0,buckyCF,hmax=1, filename="snaq/net1", runs=1)
  • Increase the number of hybridizations sequentially: hmax=0,1,2,..., and use the best network at h-1 as starting point to estimate the best network at h.
  • For long jobs, run as a script in the terminal: julia myscript.jl, arguments are saved as a vector called ARGS

Troubleshooting

The network does not make any sense!

Recall:

  • The root is meaningless, use the rooting functions
  • Identifiability issues: check out the .networks file, and its pseudolikelihood scores.
vnet = readMultiTopology("snaq/net1.networks")
  • Level-1 assumption: maybe no more hybridizations can be added if number of taxa is small
  • The bigger the network to estimate, the more genes are needed (compare bootstrap results between 30 genes and 300 genes).

Documentation

  • General instructions here
  • Inside Julia: ?plot
  • readthedocs.org (still need to add)

Help/bugs/errors

PhyloNetworks Workshop

Clone this wiki locally