Theoretical Background
We can use a graphical model called a Markov Random Field to encode interdependencies between positions in a protein family's multiple sequence alignment.
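Single-position amino acid preferences are encoded in single emission potentials \(v_i(a)\).
We also encode the preference of two amino acids \(a\) and \(b\) to co-occur at positions \(i\) and \(j\) in pairwise potentials \(w_{i,j}(a, b)\).
From this model, we can define the probability of observing the amino acid \(a\) at position \(i\), given the rest of the sequence: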
\[ P(x_i = a|\vec v, \vec w, (x_1 \ldots x_L \setminus x_i)) \propto \exp \left( v_i(a) + \sum_{j=1 \atop j \ne i}^L w_{i,j}(a, x_j) \right) \]
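As a minimal sketch of how this conditional distribution could be evaluated, the function below normalises the potentials for one position. The nested-list layout of `v` and `w`, the integer encoding of amino acids, and the name `conditional_probs` are assumptions made for illustration, not CCMgen's actual data structures.

```python
import math


def conditional_probs(i, x, v, w):
    """Return P(x_i = a | v, w, rest of x) for every state a.

    Assumed (hypothetical) layout:
      v[i][a]       single potential for state a at position i
      w[i][j][a][b] pair potential for states a, b at positions i, j
      x             current sequence as a list of integer states
    """
    n_states = len(v[i])
    # Unnormalised log-probabilities: v_i(a) + sum over j != i of w_ij(a, x_j)
    logits = [
        v[i][a] + sum(w[i][j][a][x[j]] for j in range(len(x)) if j != i)
        for a in range(n_states)
    ]
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

Normalising over all states `a` turns the proportionality into a proper probability distribution, so the returned values sum to one.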
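Using this probability model, we can use Gibbs sampling to draw new sequences: starting from an initial sequence, each position is repeatedly resampled from its conditional distribution given the rest of the sequence.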
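One Gibbs sampling sweep might look roughly like the sketch below. It assumes the potentials are stored as nested lists `v[i][a]` and `w[i][j][a][b]` with integer-encoded amino acids, and it visits positions in random order. Both choices, and the name `gibbs_sweep`, are illustrative assumptions rather than a description of CCMgen's implementation.

```python
import math
import random


def gibbs_sweep(x, v, w, rng):
    """Resample every position of x once, in random order, in place."""
    L = len(x)
    positions = list(range(L))
    rng.shuffle(positions)
    for i in positions:
        # Unnormalised log-probability of each state at position i,
        # conditioned on the current states of all other positions
        logits = [
            v[i][a] + sum(w[i][j][a][x[j]] for j in range(L) if j != i)
            for a in range(len(v[i]))
        ]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        # random.choices accepts unnormalised weights
        x[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return x


# Usage: a toy model with 3 positions, 2 states, and no couplings
v = [[1.0, 0.0] for _ in range(3)]
w = [[[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)] for _ in range(3)]
sample = gibbs_sweep([0, 1, 0], v, w, random.Random(42))
```

Repeating the sweep many times moves the sequence progressively further from its starting point.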
Now that we have a strategy for evolving sequences from a parental sequence, we can use a phylogenetic tree to encode evolutionary relationships between sequences. Starting from a sequence at the root of the tree, descendant sequences are evolved along its branches.
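The idea of evolving sequences down a tree can be sketched as a recursion in which each child starts from a copy of its parent's sequence. The `(name, branch_length, children)` tuple layout, the `steps_per_unit` parameter, and the rule that the number of single-site updates scales with branch length are all assumptions of this sketch, not CCMgen's actual implementation.

```python
import math
import random


def resample_site(x, i, v, w, rng):
    """Draw x[i] from its conditional distribution given all other positions."""
    logits = [
        v[i][a] + sum(w[i][j][a][x[j]] for j in range(len(x)) if j != i)
        for a in range(len(v[i]))
    ]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    x[i] = rng.choices(range(len(weights)), weights=weights)[0]


def evolve(node, parent_seq, v, w, rng, steps_per_unit=10):
    """Recursively evolve sequences down a tree; return {leaf name: sequence}.

    node is a hypothetical (name, branch_length, children) tuple.
    """
    name, branch_length, children = node
    seq = list(parent_seq)  # copy, so that siblings evolve independently
    # Assumption: longer branches receive more single-site Gibbs updates
    for _ in range(round(branch_length * steps_per_unit)):
        resample_site(seq, rng.randrange(len(seq)), v, w, rng)
    if not children:
        return {name: seq}
    leaves = {}
    for child in children:
        leaves.update(evolve(child, seq, v, w, rng, steps_per_unit))
    return leaves
```

Sequences that share a long path from the root inherit the same intermediate ancestors, which is how the tree induces phylogenetic correlation between leaves.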
CCMgen provides several choices for phylogenies to evolve along.
If no phylogenetic dependencies between sequences are desired, you can choose to use a 'star-shaped' phylogenetic tree with an evolutionary distance long enough to ensure that sequences are drawn independently.
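To make the star topology concrete, the sketch below writes one out in Newick notation: every leaf hangs directly off the root with the same branch length. The function name, the `seq0`, `seq1`, ... leaf labels, and the branch length are hypothetical. The distance actually needed for independent samples depends on the model.

```python
def star_newick(n_leaves, branch_length):
    """Return a Newick string for a star tree with equal branch lengths."""
    leaves = ",".join(f"seq{k}:{branch_length}" for k in range(n_leaves))
    return f"({leaves});"


print(star_newick(3, 2.0))  # (seq0:2.0,seq1:2.0,seq2:2.0);
```

Because no two leaves share an internal branch, the only common ancestor of any pair is the root sequence.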
You can provide a phylogenetic tree in Newick (.dnd) format to use it for evolving sequences.
Finally, you can evolve sequences along a binary tree as well.
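For illustration, a balanced binary topology can be written in Newick notation with a short recursion. This is only a sketch of the tree shape: the function name, the leaf labels, and the uniform branch length are assumptions, and CCMgen's own binary tree may be built differently.

```python
from itertools import count


def binary_newick(depth, branch_length=1.0):
    """Return a Newick string for a balanced binary tree with 2**depth leaves."""
    counter = count()

    def subtree(d):
        if d == 0:
            # Leaf: assign the next running label
            return f"seq{next(counter)}"
        left = subtree(d - 1)
        right = subtree(d - 1)
        return f"({left}:{branch_length},{right}:{branch_length})"

    return subtree(depth) + ";"


print(binary_newick(2))
# ((seq0:1.0,seq1:1.0):1.0,(seq2:1.0,seq3:1.0):1.0);
```

Unlike the star topology, leaves in the same subtree share internal branches and therefore share part of their evolutionary history.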