Choice of number of hybridizations

Cecile Ane edited this page Jul 6, 2018 · 7 revisions

Model selection tools are necessary to estimate the number of hybridizations (h). We use the profile of the log pseudolikelihood score as a function of h. The score is expected to improve sharply until h reaches the best value, then to improve more slowly and roughly linearly thereafter.

Unfortunately, there is no clear-cut criterion to determine what constitutes a large and 'significant' improvement in the pseudolikelihood score. Even with a regular likelihood, AIC and BIC are not quite appropriate, because they are not meant to accommodate the exploding number of models when h gets bigger (the network space grows very fast). That being said, the pattern of score improvement can be used heuristically.

The negative log pseudolikelihood score is an attribute of the network object: net.loglik. The lower the better.

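using Gadfly
# negative log pseudolikelihood scores of the networks estimated with h = 0, 1, 2
scores = [net0.loglik, net1.loglik, net2.loglik]
# score profile against the number of hybridizations h
plot(x=collect(0:2), y=scores, Geom.point, Geom.line)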
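
To read the same pattern numerically rather than from the plot, one option is to look at how much the score drops each time a hybridization is added. The sketch below is only an illustration: `score_drops` is a hypothetical helper, not a PhyloNetworks function, and it assumes the scores are ordered by h starting at h = 0, like the vector `scores` above.

```julia
# Hypothetical helper (not part of PhyloNetworks).
# scores[i] is the negative log pseudolikelihood of the network with h = i - 1.
# Returns the improvement gained by each added hybridization:
# element i is the score drop when going from h = i - 1 to h = i.
score_drops(scores::AbstractVector{<:Real}) = scores[1:end-1] .- scores[2:end]

# Made-up numbers, for illustration only (not results from any data set):
# a large drop from h = 0 to h = 1, then a much smaller one from h = 1 to h = 2.
example_drops = score_drops([100.0, 40.0, 35.0])
```

With the networks from this page, the call would be `score_drops(scores)`. A large first drop followed by small, similar drops would match the expected pattern of a sharp improvement up to the best h and a slow improvement after it. This remains a heuristic reading of the profile, not a formal test.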

Below are examples from 2 different data sets.



Next: perform and summarize a bootstrap analysis

PhyloNetworks Workshop
