Bootstrap analysis
Bootstrap replicates can be obtained from either:
- CF table from TICR with credibility intervals
- Bootstrap gene trees from RAxML (same format that ASTRAL uses)
Starting topology
The bootstrap runs below start from net0. Optionally, the best network can be included so that a percentage of the runs start from the best network instead.
# bootstrap from BUCKy's credibility intervals for CFs
using PhyloNetworks, CSV, DataFrames
buckyDat = CSV.read("bucky-output/1_seqgen.CFs.csv", DataFrame)
bootnet = bootsnaq(net0, buckyDat, hmax=1, nrep=100, filename="snaq/bootsnaq1_buckyCI")
or
# bootstrap from RAxML's bootstrap gene trees
# BSlistfiles lists the files of bootstrap trees, one file per gene
bootTrees = readBootstrapTrees("astral/BSlistfiles")
bootnet = bootsnaq(net0, bootTrees, hmax=1, nrep=100, filename="snaq/bootsnaq1_raxmlboot")
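The resampling behind the gene-tree option can be pictured with a short sketch in plain Julia. It assumes one common bootstrap scheme (each replicate takes one bootstrap tree at random from each gene) and uses Newick strings as stand-ins for parsed trees. The function sample_gene_trees and the toy input are hypothetical, not part of PhyloNetworks, and bootsnaq's own procedure may differ in its details.

```julia
using Random

# Hypothetical input: boottrees[g] holds the bootstrap trees (Newick strings)
# of gene g, e.g. as read from one RAxML bootstrap file per gene.
# One bootstrap data set = one tree drawn at random for each gene.
function sample_gene_trees(rng::AbstractRNG, boottrees::Vector{Vector{String}})
    return [rand(rng, trees) for trees in boottrees]
end

rng = MersenneTwister(1234)
boottrees = [["((a,b),c,d);", "((a,c),b,d);"],   # gene 1: 2 bootstrap trees
             ["((a,b),c,d);"]]                    # gene 2: 1 bootstrap tree
replicates = [sample_gene_trees(rng, boottrees) for _ in 1:100]
```

Each replicate then plays the role of one gene-tree data set, from which quartet CFs would be computed and a network estimated.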
To summarize the bootstrap support of the tree edges in the estimated network, we simply extract the major tree (remove all hybrid edges with gamma<0.5), and count the number of times a given edge appears in the bootstrap trees.
BStable, tree1 = treeEdgesBootstrap(bootnet,net1)
where tree1 is the major tree in net1 (the best network estimated with the original data), and BStable is a data frame with the bootstrap support for each edge.
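The counting step can be sketched in plain Julia, assuming each tree has already been reduced to the taxon sets (clusters) below its internal edges. This is a toy illustration of split counting with hypothetical helpers canonical and edge_support; it is not the code of treeEdgesBootstrap.

```julia
# Toy representation (not PhyloNetworks' data structures): each tree is
# reduced to the clusters (taxon sets) below its internal edges.
const Cluster = Set{String}

# Store each split by its side that excludes `ref`, so that a cluster and
# its complement compare equal, as they should on an unrooted tree.
canonical(c::Cluster, taxa::Cluster, ref::String) = ref in c ? setdiff(taxa, c) : c

# Proportion of bootstrap trees containing each edge (split) of the reference tree.
function edge_support(reftree::Vector{Cluster}, boottrees::Vector{Vector{Cluster}}, taxa::Cluster)
    ref = minimum(taxa)          # any fixed taxon works as the reference
    return map(reftree) do c
        target = canonical(c, taxa, ref)
        hits = count(t -> any(canonical(b, taxa, ref) == target for b in t), boottrees)
        hits / length(boottrees)
    end
end

taxa = Set(["a", "b", "c", "d", "e"])
reftree = [Set(["a", "b"]), Set(["d", "e"])]
boottrees = [[Set(["a", "b"]), Set(["d", "e"])],
             [Set(["c", "d", "e"]), Set(["d", "e"])],   # {c,d,e} is the same split as {a,b}
             [Set(["a", "c"]), Set(["b", "d", "e"])],
             [Set(["a", "b"]), Set(["c", "e"])]]
support = edge_support(reftree, boottrees, taxa)
```

Here the edge {a,b} is found in 3 of the 4 toy trees and {d,e} in 2 of them, giving proportions 0.75 and 0.5.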
We can plot this information on the tree (or network) with the following commands. In this example, all tree edges have 100% bootstrap support. The last command labels only the edges with bootstrap support below 100% (none in this example).
plot(tree1, edgeLabel=BStable)
plot(net1, edgeLabel=BStable)
plot(net1, edgeLabel=BStable[BStable.proportion .< 1.0, :])
NOTE: BStable depends on the edge numbers in net1, and edge numbers change from session to session. So if you close Julia, reopen it and reread net1, the edge numbers will likely change. You must therefore run treeEdgesBootstrap and plot in the same session.
It is not easy to summarize bootstrap support on networks, because edges do not define splits as they do on trees. That is, it is not easy to match edges across networks.
To get around this, each hybrid node is analyzed independently of the other hybridizations: when studying one hybrid node, all other hybrid edges with gamma < 0.5 are removed from the network.
We study the relationship of three types of clades:
- hybrid clade: hardwired cluster (descendants) of either hybrid edge
- major sister clade: hardwired cluster of the sibling edge of the major hybrid edge
- minor sister clade: hardwired cluster of the sibling edge of the minor hybrid edge
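The three clade types can be computed on a toy network with the sketch below. The edge-list representation and the helpers hardwired_cluster and sibling are hypothetical, and which hybrid edge is major is fixed by hand here rather than read from gamma values.

```julia
# Toy rooted network as directed edges (parent => child). Hybrid node "h"
# has two parent edges; here u => h is taken to be the major hybrid edge
# and v => h the minor one (in practice, decided by their gamma values).
const NET = ["r" => "u", "r" => "v", "u" => "a", "u" => "h", "v" => "h", "v" => "w",
             "h" => "x", "x" => "b", "x" => "c", "w" => "d", "w" => "e"]

children(net, node) = [e.second for e in net if e.first == node]

# Hardwired cluster below a node: the set of leaves reachable from it.
function hardwired_cluster(net, node)
    kids = children(net, node)
    isempty(kids) && return Set([node])
    return reduce(union, [hardwired_cluster(net, k) for k in kids])
end

# Child of the sibling edge of parent => child: the other edge leaving `parent`
# (assumes every tree node has exactly two children).
sibling(net, parent, child) = only(e.second for e in net if e.first == parent && e.second != child)

hybrid_clade = hardwired_cluster(NET, "h")                    # taxa b, c
major_sister = hardwired_cluster(NET, sibling(NET, "u", "h"))  # taxon a
minor_sister = hardwired_cluster(NET, sibling(NET, "v", "h"))  # taxa d, e
```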
We compute frequencies for clades being the hybrid clade (with accompanying sister clades), and being sister clades (major or minor). The clade frequencies can be associated to a node or an edge, and we show both options in a plot.
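Once each bootstrap network is summarized, for a given hybrid node, by its hybrid, major sister and minor sister clades, the frequencies are simple tallies. The sketch assumes that summary is already available; the helpers ret, reticulation_support and hybrid_clade_freqs are hypothetical, not PhyloNetworks functions. It also reads "an edge from the same sister clade to the same hybrid clade" as matching the hybrid and minor sister clades only, which is a simplification.

```julia
# Toy sketch: for one hybrid node, each network (reference or bootstrap
# replicate) is summarized by three clades.
const Clade = Set{String}
ret(h, maj, mnr) = (hybrid = Clade(h), major = Clade(maj), minor = Clade(mnr))

# Percentage of replicates satisfying predicate f.
pct(f, reps) = 100 * count(f, reps) / length(reps)

function reticulation_support(ref, reps)
    return (same_hybrid = pct(r -> r.hybrid == ref.hybrid, reps),
            same_minor_edge = pct(r -> r.hybrid == ref.hybrid && r.minor == ref.minor, reps),
            same_sisters = pct(r -> r == ref, reps))
end

# Percentage of replicates in which each clade is the hybrid clade.
function hybrid_clade_freqs(reps)
    counts = Dict{Clade, Int}()
    for r in reps
        counts[r.hybrid] = get(counts, r.hybrid, 0) + 1
    end
    return Dict(c => 100 * n / length(reps) for (c, n) in counts)
end

ref  = ret(["b", "c"], ["a"], ["d", "e"])
reps = [ref, ref,
        ret(["b", "c"], ["d", "e"], ["a"]),   # same hybrid clade, sisters swapped
        ret(["d", "e"], ["a"], ["b", "c"])]   # different hybrid clade
support = reticulation_support(ref, reps)
freqs = hybrid_clade_freqs(reps)
```

Because each replicate has exactly one hybrid clade for this node, the values in freqs add up to 100%.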
using Statistics  # for mean and std
BSn, BSe, BSc, BSgam, BSedgenum = hybridBootstrapSupport(bootnet, net1);
BSn # bootstrap frequencies associated to nodes
BSe # bootstrap frequencies associated to edges
BSc # makeup of all clades
BSc.taxa[BSc.H7] # list of taxa in clade H7
BSgam # array of gamma values from the bootstrap networks
minimum(BSgam[:,2])
maximum(BSgam[:,2])
mean(BSgam[:,2])
std(BSgam[:,2])
Percentage of bootstrap networks with an edge from the same sister clade to the same hybrid clade:
plot(net1, edgeLabel=BSe[:, [:edge, :BS_hybrid_edge]])
Bootstrap support for the full reticulation relationships in the network, one at each hybrid node (support for the same hybrid clade with the same sister clades):
plot(net1, nodeLabel=BSn[:, [:hybridnode, :BS_hybrid_samesisters]])
This means that in 93% of the bootstrap networks, we have the same reticulation relationship with clade "3" as hybrid clade, clade "5" as minor sister clade and clade "4" as major sister clade.
Bootstrap support for hybrid clades, shown on the parent edge of each node with positive hybrid support:
plot(net1, edgeLabel=BSn[BSn.BS_hybrid .> 0, [:edge, :BS_hybrid]])
This means that in 93% of bootstrap networks, the clade "3" is the hybrid clade; in 2%, the clade "5" is the hybrid clade; in 3%, the clade "5,6" is the hybrid clade, and in 2%, the clade "3,4" is the hybrid clade.