Skip to content

Control file MSci

Pas-Kapli edited this page Dec 2, 2021 · 17 revisions

The control file for the parameter estimation under the MSC or MSCi (A00) is similar to the previous exercise. It differs in three main points: Analysis mode, species&tree, and for the MSCi model the addition of the phiprior.

1. Analysis mode

For this analysis we want both the species tree and the assignment of samples to species to be fixed.

To achieve that we use the following two lines in the control file:

   speciesdelimitation = 0
   speciestree = 0

2. species&tree

For this section, we would need to provide the species relationships. Under an introgression hypothesis, the species relationships are not described by a purely binary tree since horizontal relationships among lineages are also allowed. The representation of these relationships in newick format can easily get complicated.

We can create the newick tree using bpp itself under the --msci-create flag and a description of the relationships in a text file.

In the yeast example, we would do that as follows:

tree ((((Scer,Spar),Smik),Skud),Sbay);

# define labels on the ancestral nodes of the tree. 
# We define each internal node by providing two tips for which this node is their MRCA. 
define A as Scer,Spar
define B as Scer,Smik
define C as Scer,Skud
define R as Scer,Sbay

# Define a hybridization event. 
# Here we define that there is a horizontal relationship from the branch "Skud C" to the "Sbay R".
# We name H the node at the receiving lineage(Sbay R) and D the node at the source lineage (Skud C).
# Next, by defining "tau=yes,no" which means that H and D have the same tau, that of H. 
hybridization Sbay R, Skud C as H D tau=yes,no phi=0.4

A txt version of the file above is here

wget https://raw.githubusercontent.com/Pas-Kapli/bpp-tutorial/master/A00_Yeast/data/msci_net.txt

To create the network in newick under this description we execute:

bpp --msci-create msci_net.txt

The extended newick representation of the network is:

((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;

The annotations in the square brackets [...] are used to define the characteristics of the hybridization events. Each annotation is relative to the corresponding parent of the node defined on. Here, the first annotation (reading from left to right) is defined on inner node H and relates to the parent node D. The attribute tau-parent=no dictates that parent node D will not have its own tau parameter, but instead, share the same tau with node H.

Here, is the illustration:

yeast species network

For further information about the extended newick notation check here and here.

We copy the newick tree in the species&tree parameter and fill in the rest of the information:

  species&tree = 5 Scer Spar Smik Skud Sbay
                   1    1    1    1    1
       ((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;

3. phiprior

The phi parameter specifies a beta prior beta(a, b) for phi, i.e., the introgression probability under the MSci model.

Since we don't have other information regarding the phi parameter for the yeast species we will use a beta distribution that assigns an equal probability to all values from 0 to 1.

  phiprior = 1 1

Final control file

Assembling all the information in the control file (check previous pages for the "Input/Output" and "MCMC sampling parameters") here would look like this:

Take a note at the definition of the theta parameter. The purpose of the analysis is to calculate the MSci parameters, among which is the theta. To achieve that we add the "e" after the alpha and beta of the prior, i.e., "thetaprior = 3 0.04 e"

      seed =  -1

      seqfile = ../data/bpp_seqfile.txt
      Imapfile = ../data/Imap.txt
      outfile = out.txt
      mcmcfile = mcmc.txt

      speciesdelimitation = 0   * fixed species delimitation
      speciestree = 0           * fixed species tree

      species&tree = 5 Scer Spar Smik Skud Sbay
                        1    1    1    1    1
                   ((((Scer,Spar)A,Smik)B,(H[&phi=0.600000,tau-parent=no],Skud)D)C, (Sbay)H[&phi=0.400000,tau-parent=yes])R;


      usedata = 1      * 0: no data (prior); 1:seq like
      nloci = 106      * number of data sets in seqfile
      cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)
      model = JC69

      thetaprior = 3 0.04 e  # invgamma(a, b) for theta. With "e" theta parameters are estimated
      tauprior = 3 0.2       # invgamma(a, b) for root tau & Dirichlet(a) for other tau's
      phiprior = 1 1         # beta(a, b) for phi in the MSci model
      
      finetune = 1
      print = 1 0 0 0   * MCMC samples, locusrate, heredityscalars Genetrees
      burnin = 8000
      sampfreq = 2
      nsample = 50000

Download the control file from this link or with wget:

wget https://raw.githubusercontent.com/Pas-Kapli/bpp-tutorial/master/A00_Yeast/data/A00.bpp.ctl