- 了解全基因组序列比对与一般意义上的多序列比对异同
- 掌握mummer, mauve的用法
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
/data/lab/genomic/lab04/data
$ mkdir data results
$ cd data
$ ln -s /data/lab/genomic/lab04/data/*.fasta ./
$ cd ../results
work_nucmer.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N nucmer
#$ -j y
#$ -cwd
nucmer -p X23_B011 ../data/X23.fasta ../data/B011.fasta
dnadiff -p X23_B011 -d X23_B011.delta
结果文件:X23_B011.delta
dnadiff 结果
OUTPUT:
.report - Summary of alignments, differences and SNPs
.delta - Standard nucmer alignment output
.1delta - 1-to-1 alignment from delta-filter -1
.mdelta - M-to-M alignment from delta-filter -m
.1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
.mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
.snps - SNPs from show-snps -rlTHC .1delta
.rdiff - Classified ref breakpoints from show-diff -rH .mdelta
.qdiff - Classified qry breakpoints from show-diff -qH .mdelta
.unref - Unaligned reference IDs and lengths (if applicable)
.unqry - Unaligned query IDs and lengths (if applicable)
查看结果
$ mummerplot --layout --medium X23_B011.delta
$ mummerplot --layout --medium --png -p X23_B011 X23_B011.delta
$ cat ../data/*.fasta > genome.fasta
work_mauve.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N mauve
#$ -j y
#$ -cwd
mauveAligner --output=my_seqs.xmfa genome.fasta
查看结果
$ Mauve my_seqs.xmfa
- 真核生物基因组全基因组比对,参考这里
- 如何基于全基因组序列比对构建物种进化树