Mapping to repeats leads to deletions with low allele frequency #76

Open
flashton2003 opened this issue Feb 4, 2020 · 6 comments

@flashton2003

Hello,

I'm analysing some Cryptococcus neoformans (a haploid fungus) PacBio genome data. I noticed something strange when I was looking at some deletions which had low allele frequency. When only part of a repeated region was deleted, sometimes NGMLR was not consistent with how it split the read. Here is a clear example.

[Image: Screenshot 2020-02-04 at 15 42 02]

There is a TTCTTCCCCC motif repeated four times in the reference genome. Most of the reads which map there only support there being one TTCTT part of the motif left (probably CCCCCTTCTTCCCCC), but the reads are mapped to different 'ends' of the 4-fold repeat in the reference genome. This means that the allele frequency is not as high as it should be, because each end of the deletion is only supported by around half the reads.
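To make the ambiguity concrete, here is a minimal Python sketch (not part of NGMLR or Sniffles). It shows that deleting three copies of the motif from either end of the 4-fold repeat gives the same sequence, and that left-aligning the deletion gives both placements the same coordinate. The flanking sequences `ACGTA` and `GATCA` and the function names are hypothetical, made up for illustration; only the `TTCTTCCCCC` motif comes from the real data.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return ref with the bases ref[start:start+length] removed (0-based)."""
    return ref[:start] + ref[start + length:]


def left_align_deletion(ref: str, start: int, length: int) -> int:
    """Return the leftmost 0-based start of an equivalent deletion.

    Shifting a deletion one base to the left leaves the resulting sequence
    unchanged exactly when the base entering the deleted window on the left
    equals the base leaving it on the right.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


UNIT = "TTCTTCCCCC"
# Hypothetical flanks; the real flanking sequence is not given in this issue.
ref = "ACGTA" + UNIT * 4 + "GATCA"

left_end = 5    # deletion placed on copies 1-3 of the motif
right_end = 15  # deletion placed on copies 2-4 of the motif
length = 3 * len(UNIT)

# Both placements describe the same allele ...
same_allele = apply_deletion(ref, left_end, length) == apply_deletion(ref, right_end, length)
# ... and both normalise to the same start coordinate.
normalised = {left_align_deletion(ref, s, length) for s in (left_end, right_end)}
print(same_allele, normalised)
```

This is only meant to show why reads mapped to different 'ends' of the repeat are really supporting one and the same deletion.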

When I looked at the variants Sniffles called, quite a lot of my deletions with low allele frequencies were in repeat regions.

I just wondered if there was a way to place these reads in repeat regions more consistently, as this would lead to more variants passing an allele frequency threshold of 80%.

Best,

Phil Ashton

@fritzsedlazeck
Collaborator

Dear Phil,
Thanks for reaching out. Yes, this is a problem. Most of the time, some randomness is needed in the alignment backtracking procedure so that artifacts do not accumulate. In these regions, however, that randomness works against us.

Can you tell me if you tried to use the newer version of Sniffles and still get low frequency in such a region? I tried to improve this recently.
Thanks
Fritz

@flashton2003
Author

Hi Fritz,

I thought you might have come across this issue; it seems quite common in my data. Perhaps these repeat regions are susceptible to indels?

I'm using v1.0.11, which I think is the most up-to-date version?

Best,

Phil

@fritzsedlazeck
Collaborator

Hi Phil,
It's a common problem; I am investigating STR regions in particular.
Go to the Sniffles GitHub repository and try v1.14, which improved a lot in genotyping (GT) and in estimating the frequency.
Cheers
Fritz

@fritzsedlazeck
Collaborator

Oh, my bad, 1.11 is the newest. Sorry, I'm jet-lagged in Brussels at the moment...

@flashton2003
Author

Ah, no worries.

Any thoughts on alternative filtering criteria, other than AF, which might help us include some of these?
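One idea, as a rough Python sketch rather than anything Sniffles does: left-align each deletion call, then pool the supporting reads of calls that normalise to the same position and length before applying the AF threshold. The call tuples, read counts, coverage value, flanking sequences and function names below are all hypothetical. The sketch also assumes that the reads supporting each placement are disjoint and that coverage is the same across the repeat.

```python
from collections import defaultdict


def left_align_deletion(ref: str, start: int, length: int) -> int:
    """Leftmost 0-based start of a deletion equivalent to ref[start:start+length]."""
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def pool_equivalent_deletions(ref, calls, coverage):
    """Sum read support over calls that left-align to the same deletion.

    calls: iterable of (start, length, supporting_reads) tuples (hypothetical format).
    Returns {(normalised_start, length): pooled allele frequency}.
    """
    pooled = defaultdict(int)
    for start, length, support in calls:
        key = (left_align_deletion(ref, start, length), length)
        pooled[key] += support
    return {key: support / coverage for key, support in pooled.items()}


UNIT = "TTCTTCCCCC"
ref = "ACGTA" + UNIT * 4 + "GATCA"  # hypothetical flanks around the 4-fold repeat

# Made-up numbers: two calls for the same 30 bp deletion, each supported by
# roughly half of 22 reads, so each fails an 80% AF filter on its own.
calls = [(5, 30, 11), (15, 30, 10)]
pooled_af = pool_equivalent_deletions(ref, calls, coverage=22)

for (start, length), af in pooled_af.items():
    print(f"deletion at {start}, length {length}: pooled AF = {af:.2f}")
```

With these invented counts the pooled call would pass an 80% threshold where neither individual call does. Whether pooling like this is safe for real data would need checking against the actual read assignments.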

@fritzsedlazeck
Collaborator

I will need to think about it. I have been up since yesterday...
