Circle Map Realign output files
In this sub-wiki we explain every column of the output file generated by Circle-Map. The output is an 11-column, tab-separated file in which every line represents one detected circular DNA. Each line provides the mapping coordinates, the read support and a set of coverage statistics.
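As a minimal sketch, one line of this file could be read in Python as shown below, assuming the column order described on this page. The `CircleRecord` field names and the example line are invented for illustration and are not part of Circle-Map. We also assume that the two read counts are written as integers.

```python
from typing import NamedTuple


class CircleRecord(NamedTuple):
    """One output line; field names are illustrative, not Circle-Map's."""
    chrom: str
    start: int
    end: int
    discordants: int
    split_reads: int
    score: float
    mean_coverage: float
    sd_coverage: float
    start_ratio: float
    end_ratio: float
    continuity: float


def parse_circle_line(line: str) -> CircleRecord:
    """Split a tab-separated line into the 11 documented columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, found {len(fields)}")
    return CircleRecord(
        fields[0],
        int(fields[1]),
        int(fields[2]),
        int(fields[3]),
        int(fields[4]),
        *(float(value) for value in fields[5:]),
    )


# Invented example line, not real Circle-Map output.
example = "chr1\t1000\t2500\t12\t8\t150.0\t35.2\t10.1\t0.8\t0.75\t0.0"
record = parse_circle_line(example)
print(record.chrom, record.start, record.end, record.split_reads)
```

A named tuple keeps the columns addressable by name, which makes later filters easier to read than positional indexing.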
1. Chromosome
This column indicates the name of the reference sequence. In other words, this is the chromosome/contig from which the circular DNA originated.
2. Start coordinate
This column indicates the 0-based starting position of the circular DNA.
3. End coordinate
This column indicates the 0-based end position of the circular DNA.
4. Discordants
This column indicates the number of discordant read pairs supporting the detected circular DNA.
Note
If there is a strong disagreement between the number of discordant read pairs and the number of split reads, the circular DNA should be handled with care. As an example, if a circular DNA is supported by tens or hundreds of discordant read pairs and only 1-5 split reads, we suggest that the circle is interpreted with care.
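The check described in this note could be sketched as follows. The function name and the thresholds (`high=10`, `low=5`) are assumptions loosely derived from the example in the note ("tens or hundreds" versus "1-5"); Circle-Map itself does not apply this rule. The same function also covers the opposite case, where split reads are abundant and discordant pairs are scarce.

```python
def evidence_disagrees(discordants: int, split_reads: int,
                       high: int = 10, low: int = 5) -> bool:
    """Return True when one evidence type is abundant and the other scarce.

    `high` and `low` are hypothetical cut-offs chosen for illustration.
    """
    many_discordants_few_splits = discordants >= high and split_reads <= low
    many_splits_few_discordants = split_reads >= high and discordants <= low
    return many_discordants_few_splits or many_splits_few_discordants


# Invented counts: 120 discordant pairs but only 3 split reads.
print(evidence_disagrees(120, 3))
```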
5. Split reads
This column indicates the number of split reads supporting the detected circular DNA.
Note
1- If there is a strong disagreement between the number of split reads and the number of discordant read pairs, the circular DNA should be handled with care. As an example, if a circular DNA is supported by tens or hundreds of split reads and only 1-5 discordant read pairs, we suggest that the circle is interpreted with care.
2- If you want to filter the output by read evidence, this is a good column to apply filters to, as it provides direct evidence of the number of reads that cross the circular DNA breakpoint. In our experiments, we got reliable results applying a filter of at least 2 split reads. However, this is very dependent on the research question you want to answer. As a rule of thumb, we recommend setting a filter of 5-10 split reads if you want to minimize the number of false positives.
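A split-read filter like the one described above could be sketched as follows, assuming the split-read count is the fifth tab-separated column and is written as an integer. The function name and the example lines are invented for illustration.

```python
SPLIT_READS_COLUMN = 4  # 0-based index of column 5 (split reads)


def filter_by_split_reads(lines, min_split_reads=2):
    """Keep only the lines with at least `min_split_reads` split reads."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if int(fields[SPLIT_READS_COLUMN]) >= min_split_reads:
            kept.append(line)
    return kept


# Invented example lines, not real Circle-Map output.
lines = [
    "chr1\t100\t900\t4\t1\t20.0\t5.0\t2.0\t0.5\t0.6\t0.1",
    "chr2\t50\t450\t9\t7\t180.0\t30.0\t6.0\t0.9\t0.8\t0.0",
]
for line in filter_by_split_reads(lines, min_split_reads=5):
    print(line)
```

The default of 2 mirrors the minimum that gave reliable results in the experiments mentioned above. Passing a value between 5 and 10 corresponds to the stricter rule of thumb.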
6. Circle score
This column indicates the score of the circle. It is an additive scoring scheme that takes into account the alignment quality, the length of the split segment and the number of split reads supporting the circular DNA.
Note
As a rule of thumb, we suggest that circular DNAs with a score between 10 and 50 are interpreted with care. We consider scores between 50 and 200 to be decent scores for a circle, while scores above 200 are indicative of high quality circular DNA.
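The rule of thumb above can be written as a small lookup. This is only a sketch: the note does not say which tier the boundary values 50 and 200 belong to, nor how to treat scores below 10, so the choices made here (and the tier labels) are assumptions.

```python
def score_tier(score: float) -> str:
    """Map a circle score to the rule-of-thumb tiers from the note.

    Assumptions: 50 and 200 are counted as "decent", and scores
    below 10 are reported as outside the documented ranges.
    """
    if score > 200:
        return "high quality"
    if score >= 50:
        return "decent"
    if score >= 10:
        return "interpret with care"
    return "below documented range"


for value in (5, 30, 120, 450):
    print(value, score_tier(value))
```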
7. Mean coverage
This column indicates the mean base coverage within the circular DNA detection coordinates.
Note
Circular DNA with a low mean base coverage should be interpreted with care.
8. Standard deviation
This column indicates the standard deviation of the base coverage vector.
Note
If the circular DNA has a high standard deviation (e.g. the standard deviation is larger than the mean), it is indicative of strong variations in the sequencing coverage within the region. This could be caused by biologically interesting scenarios (structural variation within the circle) or by artifacts causing misdetection of the circle. No matter the reason, if you are planning to draw strong scientific conclusions about the circular DNA, we recommend that you investigate the reason for the high standard deviation.
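To illustrate why a standard deviation larger than the mean points to uneven coverage, the sketch below compares two invented per-base coverage vectors. It uses the population standard deviation from Python's `statistics` module; which estimator Circle-Map uses is not stated on this page, so that choice is an assumption.

```python
from statistics import fmean, pstdev


def sd_exceeds_mean(coverage) -> bool:
    """Return True when the standard deviation is larger than the mean."""
    return pstdev(coverage) > fmean(coverage)


# Invented coverage vectors with the same mean (10x).
patchy = [0, 0, 0, 40, 40, 0, 0, 0]      # reads pile up in a small part
even = [10, 11, 9, 10, 10, 10, 9, 11]    # reads spread over the region

print(sd_exceeds_mean(patchy))
print(sd_exceeds_mean(even))
```

Both vectors have the same mean coverage, so column 7 alone would not distinguish them. Only the patchy one trips the check.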
9. Coverage increase in the start coordinate
This column indicates the coverage ratio at the start of the circular DNA.
Note
Values between 0 and 0.33 indicate that there is a decrease in coverage. This is unexpected and should be handled with care. Values between 0.33 and 1 indicate that there is an increase in read coverage, which is normal and an indication of high quality circles.
10. Coverage increase in the end coordinate
This column indicates the coverage ratio at the end of the circular DNA.
Note
Values between 0 and 0.33 indicate that there is a decrease in coverage. This is unexpected and should be handled with care. Values between 0.33 and 1 indicate that there is an increase in read coverage, which is normal and an indication of high quality circles.
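The interpretation of columns 9 and 10 can be sketched as one helper applied to either ratio. The function name and labels are invented. Treating exactly 0.33 as an increase, and reporting values outside 0-1 separately, are assumptions that the notes do not settle.

```python
def coverage_change(ratio: float) -> str:
    """Interpret a start or end coverage ratio using the documented ranges."""
    if ratio < 0.0 or ratio > 1.0:
        return "outside documented range"
    if ratio < 0.33:
        return "decrease"
    return "increase"


# Invented ratios for the start and end of one circle.
start_ratio, end_ratio = 0.8, 0.1
print(coverage_change(start_ratio), coverage_change(end_ratio))
```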
11. Coverage continuity
This column indicates the fraction of bases not covered by reads within the circular DNA detection coordinates.
Note
Values close to 0 indicate that the whole circle was covered by reads and suggest high quality identifications. Values close to 1 indicate that very few reads aligned within the circular DNA; such circles should be handled with care.
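As a worked illustration of the definition above, the fraction of uncovered bases can be computed from a per-base coverage vector. This sketch only mirrors the description on this page; it is not Circle-Map's implementation, and the function name and coverage vectors are invented.

```python
def uncovered_fraction(coverage) -> float:
    """Fraction of positions with zero coverage within the circle."""
    if len(coverage) == 0:
        raise ValueError("coverage vector is empty")
    uncovered = sum(1 for depth in coverage if depth == 0)
    return uncovered / len(coverage)


# Invented coverage vectors.
print(uncovered_fraction([3, 5, 4, 6]))   # every base is covered
print(uncovered_fraction([0, 0, 5, 5]))   # half of the bases are uncovered
```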