
question about interpretation of pred score #9

Open
m-jahn opened this issue Nov 10, 2023 · 1 comment

Comments


m-jahn commented Nov 10, 2023

Hi DeepRibo developers,

thanks for making this package; it is very convenient to work with.
I'm currently testing DeepRibo's predictions on a bacterium for which we have new Ribo-seq data.
I am mostly using default settings with the pretrained model, and I make use of the option to pass annotation data for S-curve estimation. From what I can see, the prediction works quite well, but I have some questions about the score (and its distribution) that DeepRibo produces.

My filtered output looks roughly like this:

```
   seqnames start   end width strand intergenic   rpk rpk_elo  pred SS_pred_rank
   <chr>    <dbl> <dbl> <dbl> <chr>  <lgl>      <dbl>   <dbl> <dbl>        <dbl>
 1 NC_0012…  1742  2878  1136 +      FALSE       741.    6.72  3.71           73
 2 NC_0012…  1788  1844    56 +      FALSE      1529.   18.4  -6.84         2933
 3 NC_0012…  2025  2060    35 +      FALSE       337.    3.21 -6.78         2885
 4 NC_0012…  2178  2228    50 +      FALSE       570.    3.94 -9.48         4448
   ...
```

I have already removed the inferior ORFs per stop codon. The remaining list of probable ORFs predicted by DeepRibo still far exceeds the number of annotated genes, i.e. there are many false positives. Benchmarking against a set of known, real ORFs, I found that most correctly predicted ORFs have a positive pred score (50 out of 70), while thousands of the false positives have a negative score. Yet the pred score is only used to rank ORFs, not to label high-confidence ones.

My question is therefore: is this behavior expected, and can the pred score be used as a threshold to identify high-confidence ORFs?
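
For reference, here is a minimal sketch (pandas) of the post-processing I describe above. The column names match my table; the per-stop-codon deduplication is my own step, not something DeepRibo does, and the `pred > 0` cutoff is exactly the hypothetical threshold I am asking about:

```python
import pandas as pd

# DeepRibo predictions exported to a table (file name is illustrative)
preds = pd.read_csv("deepribo_predictions.csv")

# ORFs sharing a stop codon: the stop coordinate is 'end' on the + strand
# and 'start' on the - strand
preds["stop_coord"] = preds["end"].where(preds["strand"] == "+", preds["start"])

# Keep only the best-scoring ORF per stop codon (drop the "inferior" ORFs)
best = (preds.sort_values("pred", ascending=False)
             .groupby(["seqnames", "strand", "stop_coord"], as_index=False)
             .first())

# Hypothetical high-confidence cutoff; this is the open question:
# is pred > 0 a meaningful decision boundary, or is pred only a ranking score?
high_conf = best[best["pred"] > 0]
```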


m-jahn commented Feb 1, 2024

Any info or update on this? I'm still interested.
We are currently comparing different small-ORF scoring methods, and to include your package in the comparison I'd need to know how to interpret its scores.
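
To make the question concrete: if `pred` is the raw (pre-sigmoid) output of the network, which is an assumption on my part that I have not verified in the DeepRibo code, then it would map to a pseudo-probability like this, and `pred > 0` would simply correspond to p > 0.5:

```python
import numpy as np

def pred_to_probability(pred):
    # Sigmoid transform; only meaningful if 'pred' is a raw logit,
    # which is an assumption on my part (please correct me if not)
    return 1.0 / (1.0 + np.exp(-np.asarray(pred, dtype=float)))

# Scores from the table above: 3.71 -> ~0.98, -6.84 -> ~0.001
print(pred_to_probability([3.71, -6.84, -6.78, -9.48]))
```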

Thanks!
