-
Notifications
You must be signed in to change notification settings - Fork 19
Tutorial: Identification of repetitive circular DNA using Circle Map Repeats
This is a step-by-step tutorial explaining how to go from the raw sequencing data (fastq files) to a interpretable file containing the chromosomal coordinates of the repetitive DNA circles present in your sample. In order to create friendly tutorial we have simulated reads from a tandem duplication present in the Saccharomyces cerevisiae genome. The aim of this tutorial is to use Circle-Map Repeats to detect the origin of theDNA circle formed from the repetitive region of the yeast genome.
- GNU/Linux
- The Burrows Wheeler aligner (BWA)
- SAMtools
- Circle-Map
We can download the left and right Illumina reads using the following commands
wget https://raw.githubusercontent.com/iprada/Circle-Map/master/tutorial/repetitive_region1.fastq
wget https://raw.githubusercontent.com/iprada/Circle-Map/master/tutorial/repetitive_region2.fastq
Now that we have downloaded the raw data, we need to download and prepare the yeast reference genome in order to align the circular DNA reads back to it. In this tutorial, we will use the SacCer3 version of the yeast genome, which is the most recent one. We can download the yeast chromosomes from UCSC:
wget https://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/bigZips/chromFa.tar.gz
Decompress it with the tar command:
tar xvf chromFa.tar.gz
And join the resulting chromosomes into a single file using cat:
cat chr*.fa > SacCer3.fa
Finally, in order to align the reads to the reference genome with bwa we need to create the index of the reference genome, which are a set of files required by the aligner to efficiently align the reads. We can do this using:
bwa index SacCer3.fa
Now that we have prepared all the required files, we are ready to execute our workflow for detecting repetitive circular DNA. In this case, we will use the MEM algorithm under BWA:
bwa mem SacCer3.fa repetitive_region1.fastq repetitive_region2.fastq > aligned_repetitive_circle.sam
This command generates a SAM file containing the information about where and how the reads align to the yeast reference genome.
Notes:
- If your input files are large, you can speed-up BWA increasing the number of cores using the -t option.
In this case, we need to sort the reads by alignment position and index the resulting BAM file so that we can execute Circle-Map in the next step. We can do so using SAMtools:
samtools sort -o sorted_aligned_repetitive_circle.bam aligned_repetitive_circle.sam
Notes:
- If your input files are large, you can speed-up SAMtools increasing the number of cores using the -@ option.
Finally, we need to index the sorted file:
samtools index sorted_aligned_repetitive_circle.bam
Finally, we can detect the circles using Circle-Map. Behind the scenes, Circle-Map Repeats is looking at reads with two high scoring alignments as these ones are indicative of circles formed from regions with homology.
Circle-Map Repeats takes as input two arguments. A BAM file sorted by alignment position (indicated with the -i argument) and the output filename (indicated with the -o argument). We can execute the algorithm as follows:
Circle-Map Repeats -i sorted_aligned_repetitive_circle.bam -o my_unknown_circle.bed
Now, we can look at the output of Circle-Map using:
less -S my_unknown_circle.bed
Where we can see the BED file generated by Circle-Map Repeats:
chrXII 451417 468889 3018 . 19.510931776556774 5.131087870100604 1.0 0.9081364829396326 0.002461080586080586
In this case, we have detected a DNA circle formed from the ribosomal DNA array, which is by far the most studied circle in the yeast genome.