Skip to content
Cecile Ane edited this page Jul 6, 2018 · 7 revisions

A faster alternative to running MrBayes on each gene and BUCKy is to run instead RAxML on each gene (with bootstrap for later use), followed by counting the number of gene trees that have each quartet (which will be done within PhyloNetworks).

The advantage of MrBayes+BUCKy is that uncertainty in gene trees is reduced and integrated out to estimate quartet concordance factors. But it's slower.

To run RAxML on each gene, we can use the script raxml.pl in the script/ folder. To do so, still from the data folder baseline.gamma0.3_n300/, we first uncompress the tarball that has the alignments in nexus format. The script won't do this for us unfortunately. Also, to keep (thousands of) files organized, we uncompress these files in a folder nexus/ within our input/ folder:

mkdir input/nexus
tar -xzf input/1_seqgen.tar.gz -C input/nexus

We can then run the script, which is not parallelized by the way and cannot send jobs to a cluster (unlike the scripts from TICR). RAxML sends a lot of output to the screen, 300 times if we have 300 genes, so we redirect this overwhelming output (and any possible error message) to a file mylog like this:

../scripts/raxml.pl --seqdir=input/nexus --raxmldir=raxml --astraldir=astral > mylog 2>&1 &

Let's check that everything ran smoothly and that we have our desired output. We have a new raxml/ directory with a bunch of things, including one file containing the best tree from each gene:

$ ls raxml/
besttrees.tgz	besttrees.tre	bootstrap	contrees.tgz	raxml.pl.log

$ head -n 4 raxml/besttrees.tre
((3:0.03722741374604016107,((1:0.00200161809596385333,2:0.00201350363534626934):0.02244234032100871773,4:0.01019471115416328323):0.00892507159858969140):0.01108038091781409287,5:0.05741592338682129787,6:0.06219449652861407801):0.0;
(((2:0.01227567249720099701,1:0.01840652224788371163):0.01469961352294601314,(3:0.00599379238893725259,4:0.00409652386305139546):0.02927896686177304530):0.00889065450733303973,5:0.03875872627548612032,6:0.07170201496079345316):0.0;
(((4:0.01028714366807356555,3:0.00183212712076471769):0.01214201212353699204,(1:0.00403084968085533086,2:0.00605548655541476161):0.01076366509971924193):0.01687768622690864057,5:0.06258702709089380978,6:0.08608073372761951281):0.0;
((1:0.00390601028837478034,2:0.01433216163343447279):0.02778992481020635397,((4:0.01423422190143536215,3:0.00409243847219765196):0.03425648595931906487,5:0.03106362878596600346):0.01154330379556680469,6:0.04409154556286316168):0.0;

We also have a new folder raxml/boostrap/ containing one bootstrap tree file per gene:

$ ls raxml/bootstrap/ | head
RAxML_bootstrap.1_seqgen1
RAxML_bootstrap.1_seqgen10
RAxML_bootstrap.1_seqgen100
RAxML_bootstrap.1_seqgen101
RAxML_bootstrap.1_seqgen102
RAxML_bootstrap.1_seqgen103
RAxML_bootstrap.1_seqgen104
RAxML_bootstrap.1_seqgen105
RAxML_bootstrap.1_seqgen106
RAxML_bootstrap.1_seqgen107

and we have a new folder astral/ that has the results from ASTRAL (which the script runs after RAxML by default):

$ ls astral/
BSlistfiles		astral.screenlog	astral.tre

Note: the astral command in the script assumes that there is a single individual per species. If your data has multiple alleles or individual per species, ASTRAL will need to be given a mapping file, to know which individuals map to which species. In this case, we can run raxml.pl without the final astral step:

../scripts/raxml.pl --seqdir=input/nexus --raxmldir=raxml --nodoastral > mylog 2>&1 &

The script will still print at the end of its log file which ASTRAL command it would have run, but did not actually run. Copy this command (which will have the correct path to the input files for ASTRAL), and add to it the option that gives the mapping file. Then just run it 'by hand'.

In our example, we had a single individual per species, so the ASTRAL command that the script ran was just what we needed. In the output .tre file, we have 100 bootstrap species trees, and another 2 extra trees: the original species tree (on the original gene trees) twice: first with edges annotated with bootstrap values, second with edges annotated with bootstrap values and edge lengths (in coalescent units).

$ head -3 astral/astral.tre 
(3,(4,((5,6)1:0.5764721782579237,(2,1)1:2.483246898369636)1:1.154550029033082)); 
(3,(4,((5,6)1:0.6311555045319776,(2,1)1:2.68060633252813)1:1.1234191104379085)); 
(3,(4,((5,6)1:0.6172017897581119,(2,1)1:2.68060633252813)1:1.1157854855828375)); 

$ tail -n 5 astral/astral.tre 
(3,(4,((5,6)1:0.6405673387143238,(2,1)1:2.610402073854883)1:1.1082096917743796)); 
(3,(4,((5,6)1:0.5545414747639535,(2,1)1:2.68060633252813)1:1.0932281381587627)); 
(3,(4,((5,6)1:0.5898652355943617,(2,1)1:2.7561138850362754)1:1.1006908593603526)); 
(3,(4,((5,6)100.0,(1,2)100.0)100.0)); 
(3,(4,((5,6)100.0:0.6548532959618005,(2,1)100.0:2.6448882499260518)100.0:1.1388634328653822)); 

In this astral/ folder, we also have the list of all bootstrap RAxML files, which was needed for ASTRAL, and which will be reused in PhyloNetworks to get bootstrap species networks:

$ head astral/BSlistfiles 
raxml/bootstrap/RAxML_bootstrap.1_seqgen1
raxml/bootstrap/RAxML_bootstrap.1_seqgen10
raxml/bootstrap/RAxML_bootstrap.1_seqgen100
raxml/bootstrap/RAxML_bootstrap.1_seqgen101
raxml/bootstrap/RAxML_bootstrap.1_seqgen102
raxml/bootstrap/RAxML_bootstrap.1_seqgen103
raxml/bootstrap/RAxML_bootstrap.1_seqgen104
raxml/bootstrap/RAxML_bootstrap.1_seqgen105
raxml/bootstrap/RAxML_bootstrap.1_seqgen106
raxml/bootstrap/RAxML_bootstrap.1_seqgen107

Now that we got all our results with no error, we can remove the uninteresting output from RAxML in mylog:

rm mylog

Next: from quartet CFs or gene trees to phylogenetic networks with PhyloNetworks.

PhyloNetworks Workshop

Clone this wiki locally