Missing genes not in unmapped_features.txt #157

Open
14zac2 opened this issue Nov 3, 2023 · 2 comments

14zac2 commented Nov 3, 2023

Hi there,

I really enjoy your tool! It's been very helpful for me, as I've been annotating the genome of a less common model organism. I am annotating a newly sequenced version of the genome and am lifting over the annotation from an older (but very similar) version. However, I noticed that my mitochondrial genes did not lift over (and are not in the unmapped_features.txt file either), even though the mitochondrial genome sequence is present in both FASTA files. Here is the command that I used:

nohup docker run -v "$(pwd)":/tmp staphb/liftoff liftoff \
  -g /tmp/sc2_ortho_mito.gtf \
  /tmp/DovetailWoodchuckGenome/hirise/alex-uni3883-mb-hirise-6rdjs__04-11-2023__final_assembly.fasta \
  /tmp/WCK01_AAH20201022_F8-SC2.fasta \
  -o /tmp/DovetailWoodchuckGenome/comparison_to_scf/liftoff_all/all_on_dovetail.gff \
  -u /tmp/DovetailWoodchuckGenome/comparison_to_scf/liftoff_all/unmapped_features.txt \
  -p 20 -infer_genes \
  -dir /tmp/DovetailWoodchuckGenome/comparison_to_scf/liftoff_all/intermediate_files \
  >& liftoff_all.nohup.out

Do you have any thoughts as to where these genes may have ended up? I can manually add them to the GFF3 file, but I'm perplexed as to why they are not there when the sequence is an exact match. Please let me know if you require further information about my setup.

Many thanks for your help,
Zoe

@CowanCS1

Hi Zoe,

I had a similar observation, so wanted to share my experience.

Very likely these genes were mapped to nuclear mitochondrial sequences (NUMTs). If you run grep "MT-" all_on_dovetail.gff, I'd expect you to find them listed on other chromosomes. That would explain why they weren't reported as unmapped features. Surprisingly, some of these nuclear mitochondrial sequences can have very high sequence identity to their counterparts on the mitochondrial genome.
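To make that check a bit more informative than a bare grep, here is a minimal sketch that counts matching features per sequence in the lifted-over GFF. The function name seqids_for_pattern is made up for illustration, and the "MT-" prefix is only an assumption borrowed from human-style gene naming; the mitochondrial genes in another annotation may be named differently (for example ND1 or COX1 without a prefix), in which case the pattern needs adjusting.

```shell
# Count features per sequence whose attribute column (column 9) matches a pattern.
# Usage: seqids_for_pattern PATTERN FILE.gff
# (hypothetical helper, not part of Liftoff)
seqids_for_pattern() {
    pattern="$1"
    gff="$2"
    # Skip comment lines, keep rows whose 9th column matches the pattern,
    # then print one "seqid<TAB>count" line per sequence.
    awk -F '\t' -v pat="$pattern" '
        !/^#/ && $9 ~ pat { count[$1]++ }
        END { for (s in count) print s "\t" count[s] }
    ' "$gff" | LC_ALL=C sort
}
```

For the run above, this would be called as seqids_for_pattern 'MT-' all_on_dovetail.gff. Hits on sequences other than the mitochondrial one would point toward NUMT placement; no hits at all suggests either a naming mismatch or that the features were dropped before mapping.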

There are a few ways to handle this. I split the mitochondrial DNA and mapped it separately, since I needed to include some flanking sequences to get good mapping of the shorter genes. I also remember seeing a setting in liftoff where you can include a file that restricts the mapping of specific chromosomes between the genomes, prior to mapping all remaining chromosomes.
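The setting mentioned here is, as far as I can tell, Liftoff's -chroms option, which takes a comma-separated file pairing reference and target sequence names. A rough sketch of building such a file for the mitochondrial sequence follows; the names chrM_old and chrM_new are placeholders, since the actual FASTA headers are not given in this thread.

```shell
# Placeholder sequence names (assumptions): replace with the real mitochondrial
# FASTA headers from the reference (old) and target (new) assemblies.
REF_MITO="chrM_old"
TARGET_MITO="chrM_new"

# Liftoff's -chroms file lists one "reference,target" pair per line.
printf '%s,%s\n' "$REF_MITO" "$TARGET_MITO" > chroms.txt

# The file would then be passed alongside the existing arguments, e.g.:
#   liftoff -g annotation.gtf target.fasta reference.fasta -chroms chroms.txt ...
```

With this file in place, genes on the listed reference sequence should only be aligned against the paired target sequence, which would keep mitochondrial genes from landing on nuclear copies. It is worth confirming in the Liftoff README how sequences not listed in the file are handled.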

Hope that helps!


14zac2 commented Dec 4, 2023

Hi @CowanCS1 - thank you for your input! I tried grepping for MT- in the GFF file, but no matches came up. It's like the genes disappeared entirely! I ended up manually adding the mitochondrial genes to all_on_dovetail.gff and everything seems fine now, but I never did figure out why those genes didn't go anywhere. Interestingly, a single mitochondrial gene did map over: a weird non-coding gene that was automatically detected by annotation software. The well-known coding mitochondrial genes that didn't map had also been concatenated onto the original GFF file. My quick fix is working fine, but I remain perplexed.
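For anyone hitting the same thing, one way to pin down exactly which genes vanished is to compare the gene IDs in the input GTF against both the lifted GFF and the unmapped list. The sketch below is a rough check, not something from Liftoff itself: missing_gene_ids is a made-up helper, it assumes the input carries gene_id "..." attributes, that lifted features keep that ID somewhere in the output GFF, and that unmapped_features.txt holds one ID per line. The whole-word match is approximate, so an ID that is a sub-token of another ID can be missed.

```shell
# Print gene IDs from a GTF that appear neither in the lifted GFF
# nor in the unmapped-features list.
# Usage: missing_gene_ids INPUT.gtf LIFTED.gff UNMAPPED.txt
# (hypothetical helper; file formats are assumptions, see the note above)
missing_gene_ids() {
    gtf="$1"
    gff="$2"
    unmapped="$3"
    # Pull the unique gene_id values out of the GTF attribute column.
    sed -n 's/.*gene_id "\([^"]*\)".*/\1/p' "$gtf" | LC_ALL=C sort -u |
    while IFS= read -r id; do
        # -F fixed string, -w whole word, -x whole line, -q quiet.
        if ! grep -Fwq -- "$id" "$gff" && ! grep -Fxq -- "$id" "$unmapped"; then
            printf '%s\n' "$id"
        fi
    done
}
```

In this thread's setup the call would be missing_gene_ids sc2_ortho_mito.gtf all_on_dovetail.gff unmapped_features.txt. Comparing the attribute lines of the IDs it prints against those of the one mitochondrial gene that did lift over might show what differs between them.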

I'm glad you figured out your problem, and thanks again for sharing your experience!

Cheers,
Zoe
