Justification for feature overlap filters #143
-
Hello,

First of all, thank you so much for making this tool, which I have been very impressed by! I was wondering: how were the feature overlap filters decided? I was looking at the table in this issue (#22 (comment)) and wondered if there is any literature or reasoning you could provide for why you remove some features over others, or why you'd never expect certain features to overlap.

Thank you so much again!
Replies: 1 comment
-
Hi Andrea (@watsonar), thanks for reaching out and asking this excellent question!

To be honest, there are no distinct publications that the current implementation relies on, at least not yet. Instead, I thought that such a step was simply needed in the workflow, and hence implemented a rather simple version as a placeholder in the code for future refinements/improvements, which should of course be based on large-scale data/results and be less influenced by rather anecdotal observations.

Essentially, the current implementation is based on best practices and common sense within the community (see Dfast, Prokka, PGAP): for example, we regularly see false-positive ORFs crossing tRNAs, or in turn false-positive tRNAs overlapping tmRNAs. These cases are filtered out. For CDS, and especially short CDS (sORFs), it's obviously much more complicated. Here, I'd like to allow some reasonable overlaps, e.g. < X bp if not encoded in the same frame. But as described above, I'm keen to get publication-backed thresholds for X before implementing any more sophisticated overlap features. In particular, for sORFs I implemented rather strict overlap filters to reduce the potentially large number of false positives.

So, if you (or anyone else) know of good publications that could provide solid statistical information useful for refining these overlaps, I'd love to hear about them. Thanks again and best regards!
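To make the discussion concrete, the filtering logic described above (drop putative ORFs that cross a tRNA/tmRNA; tolerate only short, out-of-frame CDS-CDS overlaps) could be sketched roughly like this. This is a hypothetical illustration, not Bakta's actual implementation: the `Feature` class, the `MAX_CDS_OVERLAP` threshold (standing in for the publication-backed "X"), and the frame check are all illustrative assumptions.

```python
# Hypothetical sketch of the overlap-filter idea discussed above.
# All names and thresholds are illustrative assumptions, not the
# tool's real data model or values.
from dataclasses import dataclass

@dataclass
class Feature:
    type: str    # e.g. 'cds', 'sorf', 'trna', 'tmrna'
    start: int   # 1-based, inclusive
    stop: int    # inclusive
    strand: str  # '+' or '-'

MAX_CDS_OVERLAP = 30  # bp; placeholder for the publication-backed threshold 'X'

def overlap(a: Feature, b: Feature) -> int:
    """Length of the overlap between two features (0 if disjoint)."""
    return max(0, min(a.stop, b.stop) - max(a.start, b.start) + 1)

def same_frame(a: Feature, b: Feature) -> bool:
    """Simplified frame test: same strand and start positions congruent mod 3.

    (A real implementation would derive the frame from the stop
    coordinate on the '-' strand; this is enough for illustration.)
    """
    return a.strand == b.strand and (a.start % 3) == (b.start % 3)

def keep_orf(orf: Feature, others: list[Feature]) -> bool:
    """Decide whether a putative (s)ORF survives the overlap filters."""
    for other in others:
        ov = overlap(orf, other)
        if ov == 0:
            continue
        if other.type in ('trna', 'tmrna'):
            return False  # ORFs crossing (tm)RNAs are treated as false positives
        if other.type in ('cds', 'sorf'):
            # tolerate short overlaps only if the two ORFs are in different frames
            if ov > MAX_CDS_OVERLAP or same_frame(orf, other):
                return False
    return True
```

For example, an ORF spanning positions 150–450 would be rejected against a tRNA at 100–175, while two CDS in different frames overlapping by only a few bp would both be kept.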