rowRanges of SingleCellExperiment output don't give 3'UTR coordinates #94
Comments
Thanks for the interest! The GRanges metadata provides the window where the counting took place, which is typically 500 nt (summing exon subranges). The 3'UTR length is inferred as the difference between the cleavage site position (the 3' position of the GRanges interval) and the CDS STOP codon position of the associated Ensembl transcript that was closest to the cleavage site. That is, we only count in the peak window adjacent to the cleavage site, but report the 3'UTR length based on parsimony assumptions. Hope that clarifies the metadata, but feel free to request any additional information.
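To make the "500 nt, summing exon subranges" description concrete, here is a minimal Python sketch of how such a counting window could be assembled by walking upstream from the cleavage site across exons. This is not scUTRquant's code: the function name `counting_window`, the 1-based inclusive coordinate convention, and the toy exon coordinates are all assumptions made for illustration.

```python
def counting_window(exons, cleavage_site, strand, width=500):
    """Collect exon subranges covering `width` nt upstream of a cleavage site.

    Hypothetical helper, not part of scUTRquant.
    exons: list of (start, end) tuples, 1-based inclusive genomic coordinates.
    cleavage_site: genomic position of the 3' end of the isoform.
    strand: "+" or "-".
    """
    if strand == "+":
        # Keep exon parts at or before the cleavage site, nearest first.
        clipped = [(s, min(e, cleavage_site)) for s, e in exons if s <= cleavage_site]
        clipped.sort(reverse=True)
    else:
        # On the minus strand the 3' end is the lowest coordinate.
        clipped = [(max(s, cleavage_site), e) for s, e in exons if e >= cleavage_site]
        clipped.sort()

    remaining = width
    window = []
    for s, e in clipped:
        if remaining <= 0:
            break
        take = min(remaining, e - s + 1)
        if strand == "+":
            window.append((e - take + 1, e))   # take the 3'-most part of the exon
        else:
            window.append((s, s + take - 1))
        remaining -= take
    return sorted(window)


# Toy transcript (made-up coordinates): a 400 nt terminal exon and an upstream exon.
toy_exons = [(100, 299), (1000, 1399)]
window = counting_window(toy_exons, cleavage_site=1399, strand="+")
total = sum(e - s + 1 for s, e in window)
```

With these toy exons the window is split into two subranges that together cover 500 nt, which is one way a single transcript can end up with more than one coordinate range.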
Thanks, I think that makes sense!
For the vast majority, yes. However, a rare edge case is if the annotated 3'UTR itself has an intron. In such a scenario, the procedure you outlined would identify the 3'UTR as starting downstream of where it actually begins. To cover that edge case, I'm not sure there's any way around looking up the reference annotation, i.e., import the Ensembl/GENCODE annotation, derive 3'UTR coordinates from that, and then augment/truncate them according to the 3'-most position in GRanges. I see that having this precomputed for each cleavage site would be a valuable addition. However, I'm not sure it's a proper feature request for scUTRquant itself. For example, if Otherwise, I could just precompute this for the UTRomes and deposit tables somewhere (e.g., FigShare).
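The augment/truncate step described above can be sketched in plain Python, assuming the annotated 3'UTR exon ranges for the transcript have already been pulled from an Ensembl/GENCODE annotation. The helper names `utr3_coordinates` and `spliced_length`, the 1-based inclusive convention, and the toy coordinates are assumptions for illustration only. The sketch also assumes that a cleavage site falling outside the annotated exons extends the nearest upstream exon.

```python
def utr3_coordinates(utr_exons, cleavage_site, strand):
    """Truncate or extend annotated 3'UTR exons so they end at the cleavage site.

    Hypothetical helper. utr_exons: list of (start, end) tuples, 1-based inclusive.
    """
    if strand == "+":
        # Drop UTR exons that begin beyond the cleavage site.
        kept = sorted((s, e) for s, e in utr_exons if s <= cleavage_site)
        if not kept:
            return []
        start, _ = kept[-1]
        kept[-1] = (start, cleavage_site)   # truncate or extend the 3'-most exon
    else:
        # On the minus strand the 3' end is the lowest coordinate.
        kept = sorted((s, e) for s, e in utr_exons if e >= cleavage_site)
        if not kept:
            return []
        _, end = kept[0]
        kept[0] = (cleavage_site, end)
    return kept


def spliced_length(ranges):
    """Total exonic length of a list of inclusive ranges."""
    return sum(e - s + 1 for s, e in ranges)


# Toy 3'UTR with one intron (made-up coordinates).
toy_utr = [(1000, 1199), (1500, 2499)]
coords = utr3_coordinates(toy_utr, cleavage_site=1999, strand="+")
length = spliced_length(coords)

# Subtracting the spliced length from the cleavage site ignores the intron
# and places the start downstream of the true 3'UTR start (1000 here).
naive_start = 1999 - length + 1
```

In this toy case the spliced 3'UTR is 700 nt, so a plain subtraction from the cleavage site puts the start at 1300 rather than 1000, which is the intron edge case mentioned above.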
Precomputing the 3'UTR coordinates for the transcripts in the UTRomes and depositing them somewhere sounds good! That would be much appreciated.
The rowRanges of the SingleCellExperiment output by scUTRquant does not seem to capture the 3'UTRs of the transcripts. For example, the 3'UTR length of the ENST00000621592.8 transcript is 1993 according to the `utr_length` column. But the coordinates given by the rowRanges are chr8:127742452-127742951, which is length 500. Looking at the 3'UTR of this transcript on the genome browser, its length does seem to be 1993. What coordinates is rowRanges providing? It seems to provide multiple coordinates for the same transcript. How can one extract the actual 3'UTR coordinates, the same ones used to compute the SingleCellExperiment output?