Liftoff doesn't find all the features in the target from the reference in one go #167

Open
LiaOb21 opened this issue May 14, 2024 · 0 comments

Dear developers,

Thank you for developing such a useful tool. I am currently using Liftoff to identify features in a multifasta file. This file contains selected sequences from several assemblies of the same species, and I am interested in determining whether these sequences include genes already annotated in the species' reference genome.

Here's an overview of my process:

  1. I initially ran Liftoff and used the gene coordinates obtained for my target to mask the initial multifasta.
  2. I then ran Liftoff on this masked multifasta. Interestingly, the same gene was identified again but in a different sequence.
  3. I repeated this process up to four times, each time finding the same gene in different sequences.
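The observation in steps 2 and 3 can be checked by tabulating, for each gene ID, which sequence it landed on in each iteration. The following is a minimal sketch under assumptions, not part of Liftoff: the helper names `gene_locations` and `compare_runs` are hypothetical, the example GFF lines are made up for illustration, and gene identifiers are assumed to sit in the `ID` attribute of `gene` rows.

```python
from collections import defaultdict


def gene_locations(gff_lines):
    """Map each gene ID to the set of sequence IDs it was placed on."""
    placed = defaultdict(set)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep only gene-level features
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        placed[attrs.get("ID", "?")].add(cols[0])
    return placed


def compare_runs(runs):
    """runs: one iterable of GFF3 lines per Liftoff iteration, in order."""
    table = defaultdict(list)
    for iteration, lines in enumerate(runs, start=1):
        for gene_id, seqs in gene_locations(lines).items():
            table[gene_id].append((iteration, sorted(seqs)))
    return dict(table)


# Made-up example: the same gene is placed on a different sequence in run 2.
run1 = ["seqA\tLiftoff\tgene\t10\t50\t.\t+\t.\tID=gene1\n"]
run2 = ["seqB\tLiftoff\tgene\t7\t47\t.\t+\t.\tID=gene1\n"]
summary = compare_runs([run1, run2])
```

With real output, each element of `runs` would be an open file handle for one of the `liftoff_output*.gff3` files.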

I tried several configurations, including more stringent parameters for the last series of Liftoff runs (including -copies). Here is an example of the parameters and procedure I used:

liftoff -o liftoff_output.gff3 -p 50 -copies -exclude_partial -a 0.95 -s 0.95 -g reference.gff multifasta.fasta reference.fasta

# extracting genes from liftoff_output.gff3 > liftoff_output_only_genes.gff3
# masking multifasta.fasta based on  liftoff_output_only_genes.gff3 > multifasta_masked.fasta

liftoff -o liftoff_output_2.gff3 -p 50 -copies -exclude_partial -a 0.95 -s 0.95 -g reference.gff multifasta_masked.fasta reference.fasta

# extracting genes from liftoff_output_2.gff3 > liftoff_output_2_only_genes.gff3
# masking multifasta_masked.fasta based on  liftoff_output_2_only_genes.gff3 > multifasta_masked_2.fasta

# repeat liftoff on multifasta_masked_2.fasta with the same parameters

... and so on for up to four iterations.
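The extraction and masking steps between runs, shown only as comments above, could look roughly like the sketch below. It is an illustration under assumptions rather than the script actually used: hard masking with `N` is assumed, the helper names `read_gene_intervals`, `mask_sequence` and `mask_fasta` are hypothetical, and the tiny GFF3 and FASTA inputs are made up.

```python
from collections import defaultdict


def read_gene_intervals(gff_lines):
    """Collect (start, end) for every 'gene' feature, keyed by sequence ID."""
    intervals = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        # GFF3 coordinates are 1-based and inclusive.
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))
    return intervals


def mask_sequence(seq, spans):
    """Replace each 1-based inclusive span of seq with N (hard masking)."""
    chars = list(seq)
    for start, end in spans:
        lo = max(start - 1, 0)      # convert to 0-based, clamp to sequence start
        hi = min(end, len(chars))   # clamp to sequence end
        if hi > lo:
            chars[lo:hi] = "N" * (hi - lo)
    return "".join(chars)


def mask_fasta(fasta_lines, intervals):
    """Return FASTA lines with gene intervals masked, one line per sequence."""
    records = []  # each entry is [header, sequence]
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            records.append([line, ""])
        elif records:
            records[-1][1] += line
    out = []
    for header, seq in records:
        fields = header[1:].split()
        seq_id = fields[0] if fields else ""  # ID is the text up to the first space
        out.append(header)
        out.append(mask_sequence(seq, intervals.get(seq_id, [])))
    return out


# Made-up example: mask positions 3-5 of seqA; seqB has no gene and is untouched.
gff = [
    "##gff-version 3\n",
    "seqA\tLiftoff\tgene\t3\t5\t.\t+\t.\tID=gene1\n",
    "seqA\tLiftoff\tmRNA\t3\t5\t.\t+\t.\tID=mrna1;Parent=gene1\n",
]
fasta = [">seqA sample\n", "ACGTAC\n", "GT\n", ">seqB\n", "ACGT\n"]
masked = mask_fasta(fasta, read_gene_intervals(gff))
```

Matching on the sequence ID up to the first whitespace is a design choice here, on the assumption that the GFF3 sequence column carries the same truncated ID; if the IDs differ, nothing is masked for that sequence.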

I have checked the masking and it is correct. I am unsure whether the issue lies with Liftoff or Minimap2. Because the -mm2_options option did not work as expected, I also changed the Minimap2 default parameters directly in the source code to use -N 5000, but this did not resolve the issue.

Could you please suggest how to adjust the parameters, or offer any insight into what might be causing this issue?

Thank you so much in advance! 😊

Lia
