Cecile Ane edited this page Jun 7, 2016 · 21 revisions

Types of input

  • data from sequence alignment that capture uncertainty:

    • credibility intervals for quartet concordance factors, from TICR
    • bootstrap gene trees from RAxML (same format that ASTRAL uses)
  • a starting topology

Option: a second network topology can be supplied. For each bootstrap replicate, it serves as the starting topology for some percentage of the runs when searching for the best network.

Bootstrap analysis

# bootstrap from bucky's credibility intervals for CFs
buckyDat = readtable("bucky-output/1_seqgen.CFs.csv")
bootnet = bootsnaq(net0, buckyDat, hmax=1, nrep=100, filename="snaq/bootsnaq1_buckyCI")

or

# bootstrap from raxml's bootstrap gene trees
bootTrees = readBootstrapTrees("astral/BSlistfiles")
bootnet = bootsnaq(net0, bootTrees, hmax=1, nrep=100, filename="snaq/bootsnaq1_raxmlboot")
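
To make the resampling idea concrete, here is a minimal sketch in plain Julia of how one bootstrap replicate could be built from gene-tree bootstrap files. It assumes that a replicate is formed by drawing one bootstrap tree per gene. That is an assumption about the scheme, not a description of the internals of bootsnaq. The name `sample_one_tree_per_gene` and the toy Newick strings are hypothetical.

```julia
using Random

# Hypothetical toy input: one vector of bootstrap trees (Newick strings) per gene.
boot_trees = [
    ["((A,B),C,D);", "((A,C),B,D);", "((A,B),C,D);"],  # gene 1
    ["((A,B),C,D);", "((A,D),B,C);"],                  # gene 2
]

# Draw one bootstrap tree per gene to form a single bootstrap replicate
# (assumed resampling scheme, for illustration only).
function sample_one_tree_per_gene(rng::AbstractRNG, trees_by_gene)
    return [rand(rng, gene_trees) for gene_trees in trees_by_gene]
end

rng = MersenneTwister(1)
replicate = sample_one_tree_per_gene(rng, boot_trees)
println(replicate)
```

Each replicate has as many trees as there are genes. Quartet concordance factors would then be computed from those trees before a network is fitted to the replicate.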

Bootstrap summary

Tree edges

To summarize the bootstrap support of the tree edges in the estimated network, we simply extract the major tree (remove all hybrid edges with γ<0.5) from the reference network and from each bootstrap network, and then count the number of times a given edge appears in the bootstrap trees.

BStable, tree1 = treeEdgesBootstrap(bootnet,net1)

where tree1 is the major tree in net1 (the best network estimated with the original data), and BStable is a data frame with the bootstrap support for each edge.
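
The counting step described above can be sketched in plain Julia. This is only an illustration on hypothetical toy data, not the code behind treeEdgesBootstrap. Each tree is represented by the set of clades that its internal edges define, and `edge_support` is a made-up helper name.

```julia
# Hypothetical toy data: each tree is a list of clades (taxon sets),
# one clade per internal edge.
reference_tree = [Set(["A", "B"]), Set(["A", "B", "C"])]

bootstrap_trees = [
    [Set(["A", "B"]), Set(["A", "B", "C"])],
    [Set(["A", "B"]), Set(["A", "B", "D"])],
    [Set(["A", "C"]), Set(["A", "B", "C"])],
    [Set(["A", "B"]), Set(["A", "B", "C"])],
]

# For each clade of the reference tree, compute the proportion of
# bootstrap trees that contain the same clade.
function edge_support(ref, boots)
    return [count(t -> clade in t, boots) / length(boots) for clade in ref]
end

support = edge_support(reference_tree, bootstrap_trees)
println(support)
```

In this toy example, each of the two reference clades appears in 3 of the 4 bootstrap trees.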

We can plot this information on the tree (or network). In the plot for this example, all tree edges of the network have 100% bootstrap support. The last command labels only the edges with bootstrap support below 100% (there are none here, but there may be some in other examples).

plot(tree1, edgeLabel=BStable)
plot(net1,  edgeLabel=BStable)
plot(net1, edgeLabel=BStable[BStable[:proportion] .< 1.0, :])


NOTE: BStable depends on the edge numbers in net1, and these numbers change from session to session. If you close Julia, reopen it and reread net1, the edge numbers will likely be different. You must therefore run treeEdgesBootstrap and plot in the same session.

Hybrid edges

It is not easy to summarize bootstrap support on networks, because edges do not uniquely define splits like they do on trees. That is, it is not easy to match edges across networks.

Each hybrid node is analyzed independently of the other hybridizations. That is, when one hybrid node is analyzed, all other hybrid edges with γ<0.5 are removed from the network.

We study the relationship of three types of clades:

  • hybrid clade: hardwired cluster (descendants) of either hybrid edge
  • major sister clade: hardwired cluster of the sibling edge of the major hybrid edge
  • minor sister clade: hardwired cluster of the sibling edge of the minor hybrid edge
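
The three clade types listed above can be illustrated with a small sketch in plain Julia. The network is a hypothetical toy stored as a dictionary of children, and `hardwired_cluster` and `sibling` are made-up helper names, not PhyloNetworks functions. Which hybrid edge is treated as major is also an assumption of the example.

```julia
# Hypothetical toy network: node => list of children. Node "H" has two parents.
children = Dict(
    "root" => ["p1", "p2"],
    "p1"   => ["4", "H"],  # p1 -> H plays the role of the major hybrid edge
    "p2"   => ["5", "H"],  # p2 -> H plays the role of the minor hybrid edge
    "H"    => ["3"],
)

# Hardwired cluster of the edge ending at `node`:
# all leaves that can be reached from `node`, following every edge.
function hardwired_cluster(children, node)
    kids = get(children, node, String[])
    isempty(kids) && return Set([node])
    return reduce(union, [hardwired_cluster(children, k) for k in kids])
end

# The other child of `parent`, i.e. the end of the sibling edge of parent -> child.
sibling(children, parent, child) = first(filter(k -> k != child, children[parent]))

hybrid_clade = hardwired_cluster(children, "H")
major_sister = hardwired_cluster(children, sibling(children, "p1", "H"))
minor_sister = hardwired_cluster(children, sibling(children, "p2", "H"))
println((hybrid_clade, major_sister, minor_sister))
```

Here the hybrid clade is taxon "3" and the sister clades are taxa "4" and "5". Each clade is a single taxon only because the toy network is so small.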

We compute frequencies for clades being a hybrid clade (with accompanying sister clades), and being sister clades (major or minor). The clade frequencies can be associated to a node or to an edge, and we show both options in a plot.

BSn, BSe, BSc, BSgam, BSedgenum = hybridBootstrapSupport(bootnet, net1);
BSn # bootstrap frequencies associated to nodes
BSe # bootstrap frequencies associated to edges
BSc # makeup of all clades
BSc[:taxa][BSc[:H7]] # list of taxa in the clade named "H7"
BSgam # array of gamma values
minimum(BSgam[:,2])
maximum(BSgam[:,2])
mean(BSgam[:,2])
std(BSgam[:,2])
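
As a self-contained illustration of this kind of summary, the sketch below uses hypothetical γ values for one hybrid edge across bootstrap networks. The percentile interval is an addition for illustration and does not appear in the commands above. On recent Julia versions, mean, std and quantile come from the Statistics standard library.

```julia
using Statistics

# Hypothetical gamma values for one hybrid edge across 8 bootstrap networks.
gammas = [0.12, 0.18, 0.15, 0.20, 0.10, 0.16, 0.14, 0.19]

summary_stats = (
    lo = minimum(gammas),
    hi = maximum(gammas),
    avg = mean(gammas),
    sd = std(gammas),
    # 95% percentile interval of the bootstrap gamma values
    interval = quantile(gammas, [0.025, 0.975]),
)
println(summary_stats)
```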

The next plot shows the percentage of bootstrap networks that have the same sister-hybrid relationship as the reference network, i.e. an edge from the same sister clade to the same hybrid clade:

plot(net1, edgeLabel=BSe[[:edge,:BS_hybrid_edge]])

We can also plot the bootstrap support for the full reticulation relationships in the network, one at each hybrid node (support for the same hybrid with the same sister clades):

plot(net1, nodeLabel=BSn[[:hybridnode,:BS_hybrid_samesisters]])


This means that in 93% of the bootstrap networks, we have the same reticulation relationship with taxon "3" as hybrid clade, taxon "5" as one sister clade (either minor or major) and taxon "4" the other sister clade. In this example, each of these clades is made up of a single taxon, but that need not be the case in general.

We can also plot the bootstrap support for hybrid clades. Here, it is shown on the parent edge of each node with positive hybrid support:

plot(net1, edgeLabel=BSn[BSn[:BS_hybrid].>0, [:edge,:BS_hybrid]])


This means that the taxon "3" is a hybrid clade in 93% of bootstrap networks; clade "3,4" is a hybrid clade in 2% of bootstrap networks, clade "5,6" is a hybrid clade in 3% of bootstrap networks and taxon "5" in 2% of bootstrap networks.

Next: formal TICR test to test a tree with ILS only.
