ENSG id and gene name not matching #240

Open
seyoun209 opened this issue Jul 31, 2023 · 3 comments

Comments

@seyoun209

seyoun209 commented Jul 31, 2023

Dear Leafcutter team,

I have noticed that in the xxx.RData output file from the differential LeafCutter analysis, the gene names and coordinates appear to be correct, but the ENSG IDs do not match them. For example, the same ID, ENSG00000064042.18, is assigned to several different genes:
91218 clu_27247_+ LIMCH1 ENSG00000064042.18 chr4 41551191 41551321 annotated 0.001
91219 clu_27247_+ LIMCH1 ENSG00000064042.18 chr4 41551395 41598920 annotated -0.010
111731 clu_33468_+ PTPN12 ENSG00000064042.18 chr7 77571186 77581427 annotated -0.005
111732 clu_33468_+ PTPN12 ENSG00000064042.18 chr7 77571186 77585543 novel annotated pair 0.004
111733 clu_33468_+ PTPN12 ENSG00000064042.18 chr7 77581503 77583555 annotated 0.002
77657 clu_23215_+ CD40 ENSG00000064042.18 chr20 46123219 46128138 annotated 0.022
77658 clu_23215_+ CD40 ENSG00000064042.18 chr20 46126701 46128138 annotated -0.022
77659 clu_23215_+ CD40 ENSG00000064042.18 chr20 46126741 46128138 cryptic_fiveprime -0.001
77660 clu_23215_+ CD40 ENSG00000064042.18 chr20 46127289 46128138 annotated 0.001
68841 clu_20602_- METTL8 ENSG00000064042.18 chr2 171325906 171326042 annotated 0.024
68842 clu_20602_- METTL8 ENSG00000064042.18 chr2 171325906 171330559 annotated -0.039
8791 clu_3411_+ MGST3 ENSG00000064042.18 chr1 165635822 165649841 annotated -0.002
119649 clu_35805_+ PTGS1 ENSG00000064042.18 chr9 122371272 122377899 annotated -0.044
119650 clu_35805_+ PTGS1 ENSG00000064042.18 chr9 122371834 122377899 annotated 0.049

I also checked the reference BED file used for LeafCutter: the ENSG ID for CD40 is ENSG00000101017.14, for METTL8 it is ENSG00000123600.19, for MGST3 it is ENSG00000143198.13, and for PTGS1 it is ENSG00000095303.17. So I don't think the reference file is the problem, but I'm not sure.
Have you faced a similar situation, and if so, do you have any suggestions on how to address it? Thank you in advance for your help.
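
For anyone who wants to check how widespread the mismatch is in their own results, here is a minimal sketch that flags any ENSG ID attached to more than one gene name. It assumes the rows have been exported as whitespace-separated text with the same column order as the excerpt above (row index, cluster, gene name, gene ID, ...). The function name `find_shared_gene_ids` is hypothetical and is not part of LeafCutter.

```python
from collections import defaultdict


def find_shared_gene_ids(rows):
    """Return {gene_id: sorted gene names} for IDs attached to more than one gene name."""
    names_by_id = defaultdict(set)
    for row in rows:
        fields = row.split()
        # Assumed column order, as in the excerpt from this issue:
        # row index, cluster, gene name, gene ID, chromosome, start, end, ...
        gene_name, gene_id = fields[2], fields[3]
        names_by_id[gene_id].add(gene_name)
    # Keep only the IDs that map to several distinct gene names.
    return {gid: sorted(names) for gid, names in names_by_id.items() if len(names) > 1}


# Three rows copied from the excerpt in this issue.
rows = [
    "91218 clu_27247_+ LIMCH1 ENSG00000064042.18 chr4 41551191 41551321 annotated 0.001",
    "111731 clu_33468_+ PTPN12 ENSG00000064042.18 chr7 77571186 77581427 annotated -0.005",
    "77657 clu_23215_+ CD40 ENSG00000064042.18 chr20 46123219 46128138 annotated 0.022",
]
shared = find_shared_gene_ids(rows)
print(shared)
```

An empty result would mean that every ID maps to a single gene name in the exported rows. It would not by itself confirm that the IDs are the correct ones.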

@jackhump
Collaborator

Hi there,

Try regenerating the annotations from the same GTF you used in alignment using gtf2leafcutter.pl

@viljabio

Hi,

I am facing this exact same issue. The ENSG IDs are incorrect, and one ENSG ID is often given for multiple genes on different chromosomes. I also tried regenerating the annotations from the same GTF used for the alignment, but it did not fix the issue. I have also checked that the ENSG IDs, gene names and genomic locations in the BED reference files are correct.

Were you able to solve this issue, @seyoun209? Or do you have other ideas about what the root cause could be, @jackhump?

Thank you in advance for your help!
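
The manual check described in this thread (comparing the reported ENSG IDs against the reference annotation) can be sketched as follows. This is only an illustration: it assumes GENCODE-style GTF attribute strings (the ninth GTF column) with `gene_id` and `gene_name` keys, and the helper names `gene_ids_by_name` and `mismatches` are hypothetical, not LeafCutter functions. The IDs in the example are the ones quoted earlier in this issue.

```python
import re

# Matches key "value" pairs in a GTF attribute string, e.g. gene_id "ENSG...";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_ids_by_name(attribute_strings):
    """Build {gene_name: gene_id} from GTF attribute strings (ninth GTF column)."""
    mapping = {}
    for attrs in attribute_strings:
        fields = dict(ATTR_RE.findall(attrs))
        if "gene_name" in fields and "gene_id" in fields:
            mapping[fields["gene_name"]] = fields["gene_id"]
    return mapping


def mismatches(reported, reference):
    """Return (gene_name, reported_id, reference_id) for every disagreeing pair."""
    return [
        (name, gene_id, reference[name])
        for name, gene_id in reported
        if name in reference and reference[name] != gene_id
    ]


# Reference IDs as quoted from the BED/GTF check earlier in this issue.
reference = gene_ids_by_name([
    'gene_id "ENSG00000101017.14"; gene_name "CD40";',
    'gene_id "ENSG00000123600.19"; gene_name "METTL8";',
])
# (gene name, gene ID) pairs as they appear in the differential splicing output.
reported = [
    ("CD40", "ENSG00000064042.18"),
    ("METTL8", "ENSG00000064042.18"),
]
bad = mismatches(reported, reference)
for name, got, expected in bad:
    print(f"{name}: reported {got}, reference {expected}")
```

Gene names absent from the reference are skipped here rather than reported, which is a design choice for brevity. A stricter check might list them separately.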

@seyoun209
Author

seyoun209 commented Aug 27, 2024

I tried regenerating the annotations with gtf2leafcutter.pl, but regeneration alone didn't fix the problem (I had been using gencode.v34). So I downloaded gencode.v45 and regenerated the annotations with the new reference, which fixed the problem. I'm not sure whether the version matters, but at least it helped in my case!
