The Data
Claudia Solis-Lemus edited this page May 30, 2023 · 7 revisions
The sequence alignments, one for each gene, are in nexus format and bundled in a tarball. We first navigate to the data directory:
$ cd data_results/baseline.gamma0.3_n30/
$ ls input/
1_seqgen.in 1_seqgen.tar.gz
1_seqgen.tar.gz is a tarball that contains all 30 alignments (30 loci):
$ tar -ztf input/1_seqgen.tar.gz
1_seqgen10.nex
1_seqgen11.nex
1_seqgen12.nex
1_seqgen13.nex
...
1_seqgen6.nex
1_seqgen7.nex
1_seqgen8.nex
1_seqgen9.nex
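As a quick sanity check, one could count the alignment files in the tarball without extracting it. The sketch below wraps the same tar listing in a small shell function; the name count_alignments is hypothetical and not part of the workshop materials.

```shell
# count_alignments: hypothetical helper, not part of the workshop scripts.
# Prints the number of entries ending in .nex listed in a gzipped tarball.
count_alignments() {
    tar -ztf "$1" | grep -c '\.nex$'
}
```

From the data directory, count_alignments input/1_seqgen.tar.gz should print 30 if the tarball holds one alignment per locus.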
Let's look at the first alignment, 1_seqgen1.nex, inside input/1_seqgen.tar.gz. We can decompress the nexus files into a new folder that we will call nexus, then look at the first alignment:
cd input
mkdir nexus
tar -xzvf 1_seqgen.tar.gz -C nexus
ls nexus
cat nexus/1_seqgen1.nex
less -S nexus/1_seqgen1.nex
(type q to quit viewing the file)
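If extracting everything is not needed, a single alignment can also be streamed straight out of the tarball. This is a minimal sketch, assuming a tar that supports -O (write extracted files to standard output, as GNU tar and bsdtar do); the function name peek_in_tarball is hypothetical.

```shell
# peek_in_tarball: hypothetical helper, not part of the workshop scripts.
# Streams one member of a gzipped tarball to stdout and shows its first lines.
#   $1 = tarball, $2 = member name, $3 = number of lines (default 5)
peek_in_tarball() {
    tar -xzOf "$1" "$2" | head -n "${3:-5}"
}
```

For example, from the input folder, peek_in_tarball 1_seqgen.tar.gz 1_seqgen1.nex 15 would show the top of the first alignment without creating any files.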
The alignment looks like this. It has only 6 taxa and 500 bp, to keep analyses fast during the workshop (and yes, these data were simulated):
#NEXUS
[
Generated by seq-gen Version 1.3.2x
Simulations of 6 taxa, 500 nucleotides
for 30 tree(s) with 1 dataset(s) per tree
Branch lengths of trees multiplied by 0.018
Rate homogeneity of sites.
Model = HKY: Hasegawa, Kishino & Yano (1985)
transition/transversion ratio = 2 (K=4.21179)
with nucleotide frequencies specified as:
A=0.300414 C=0.191363 G=0.196748 T=0.311475
]
Begin DATA; [Tree 1]
Dimensions NTAX=6 NCHAR=500;
Format MISSING=? GAP=- DATATYPE=DNA;
Matrix
6 TTGAAACGGGTAATTTTACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGTTGATGACTTCATAGCTATAAGAGGCATTATAGTATGCATGTTCCGTCAGACTCGCCCACTACAGAGCTATGTAAACAGTGGGGGCTGGTACAACTCCCTACCGATTGAATCTTATAATGGCGTATGATGTTAACGCGCTCTTGAATTGTCTTTTAAGCATAAGGGCTTTGGATAGATTAATCTTGCTTTAAATCACTCTAGCAGAAGCGTACGTTTTAATCAGACATTAACACGTTGTCGATCCATTTCAACACACACTGTTCAGTACCTTGGATCTATAAGATCCATGGGTATACCACATTTGTTGTTGCCGCTTGTGTACCCTGGTGAATGGCGTTAAGACTCCAGAGTAACCTGCTAGCTACACGCATCATGAACGGCTATGCCGATAGCTGACAAGTTCTTACGTCTAGGGTCTTAGCACCGCCATTCCCAGGTAAAG
5 TTGAAACGTGTAATTTTACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGCTGATGACTTCTTGGCCATAAAAGGCATTGTAGTATGCATGTGCCGTCAGACCCGCCTATAACAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAATCTTATAATGGTGAATGATGTTAACGCGCTCTTGAATTGTCTTTTAAGCATAAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGCTTTAATCAACCGTTAACACATTGTCGATCCATTTCAACACACTCTGTTCAATACCTTGGATCTATAAGATCCATGGGTTTACAACATTTGTTGTTGCTGCTCGTATACCCTGGCGGATGGCGTTAGATCTCCAGAGTAACCTGCTAGCTACACATATCGTGAATGGCTATGTCGATAACGGACAAGTTCCTACGTCTAGGATCTTAGTACCGGCATTCCCAAGTGAAG
1 TTGAAACGGGTAATCTTACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGCTGATGATTTCTTAGCTATAAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAGTCTTATAATGGTGAATGATGTTAACGCGCTATTGAATTGTCTTTTAAGCATGAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAACCGTTAACACATTGTCGATCCATTTCAACACACACTGTTCAATACCTTGGATCTATAAAATCCATGGGTACACAACATGTGTTGTTTCTGCTTGTCTACCCTGGTGAATGGCGTTAGGTTTCCAGAGTAATTTGCTAGCTACACGTATCGTGGACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCGCCATTCCCAGGTGAAG
2 TTGAAACGGGTAATCTTACTTATCGATTATAAGCATCATACCTGATATGGTTGTTTGCTGATGGTTTCTTAGCTATAAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAGTCTTATAATGGTGAATGATGTTAACGCGCTATTGAATTGTCTTTTAAGCATGAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAACCGTTAACACATTGTCGATCCATTTCCACACACACTGTTCAATACCTTGGATCTATAAAATCCATGGGTACACAACATGTGTTGTTTCTGCTTGTCTACCCTGGTGAGTGGCGTTAGGTTTCCAGAGTAATCTGCTAGCTACACGTATCGTGGACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCGCCATTCCCAGGTGAAG
3 TTGAAACGGGTAATCATACTTATCGATTATAAGCATCATACATGATACGGTTGTTTGCTGATGATTTCTTAGCTATAAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAAATAGTGGGGGCCAGTACAACTCCCTACCGATTGAATCTTATAATGGTGATTGATGTTAACGCTCTATTGAATTGTCTTTCAATCATAAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAATCGTTAACACATTGTCGATCAATTTCAACACACACTGTTCAATACCTTGGATCTATAAAATCCTTGGGTACACAACATTTGTTGTTTTTGCTTGTATACCCTGGTGAATGGCGTTAGGTTTCCAGAGTAATCTGCTAGCTACACGTATCGTGAACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCACCGTTCCCAGGTGAAG
4 TTCAAACGGGTAATCATACTTATCGATTATAAGCATCATACATGATATGGTTGTTTGCTGATGATTTCTTAGCTATCAAAGGCATTATAGTGTGCATGTGCCGTCAGACCCGCCTATTATAGAACTATGTAGATAATGGGGGCCAGTACAACTCCCTACCGATTGAATCTTATAATGGTGAATGATGTTAACGCTCTATTGAATTGTCTTTCAAGCATAAGGGCTTTAGATAGACTAATCTAGCTTTAATTCACTCTAGTAGAAGCTTACGTTTTAATCAATCGTTAACACATTGTCGATCAATTTCAACACACACTGTTCAATACCTTGGATCTATAAAATCCATGGGTACACAACATTTGTTGTTTCTGCTTGTATACCCTGGTGAATGGCGTTAGGTTTCCAGAGTAATCTGCCAGCTACACGTATCGTGAACGGCTATGTCGATAGCGGACAAGTTCTTACGTCTAGAATCGTAGTACCGCCATTCCCAGGTGAAG
;
END;
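The Dimensions line declares how many taxa (NTAX) and sites (NCHAR) the matrix should contain, so a file can be checked against its own header. The sketch below is one way to do that with awk, assuming the simple non-interleaved layout shown above (one "name sequence" row per taxon); the function name check_nexus_dims is hypothetical.

```shell
# check_nexus_dims: hypothetical helper, not part of the workshop scripts.
# Reads NTAX and NCHAR from the Dimensions line, then checks that the matrix
# has NTAX rows and that every sequence is NCHAR characters long.
# Assumes a non-interleaved matrix with one "name sequence" row per taxon.
check_nexus_dims() {
    awk '
        toupper($1) == "DIMENSIONS" {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                sub(/;$/, "", kv[2])
                if (toupper(kv[1]) == "NTAX")  ntax  = kv[2] + 0
                if (toupper(kv[1]) == "NCHAR") nchar = kv[2] + 0
            }
        }
        # A line starting with ";" closes the matrix.
        in_matrix && $1 ~ /^;/ { in_matrix = 0 }
        # Inside the matrix, each two-field line is one taxon row.
        in_matrix && NF == 2 {
            rows++
            if (length($2) != nchar) bad++
        }
        toupper($1) == "MATRIX" { in_matrix = 1 }
        END {
            printf "ntax=%d nchar=%d rows=%d bad_rows=%d\n", ntax, nchar, rows, bad
            exit (rows == ntax && bad == 0) ? 0 : 1
        }
    ' "$1"
}
```

Running check_nexus_dims nexus/1_seqgen1.nex prints a one-line summary and returns a nonzero exit status if the row count or any sequence length disagrees with the header.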
Now go back to the main folder for the 30-gene data, because later analyses will start from there:
$ cd ..
$ pwd
/home/moleuser/phylo-networks/data_results/baseline.gamma0.3_n30
Next: gene trees with MrBayes