CDS entries in GFF3 file are not merged to a CompoundLocation #95

Open
mikdur opened this issue Jan 14, 2015 · 5 comments

@mikdur

mikdur commented Jan 14, 2015

Given a gene that looks something like this in GFF3 notation:

##gff-version 3
scf_001 maker   gene    36837   38790   .       +       .       ID=BN869_G00000007;Name=BN869_G00000007;
scf_001 maker   mRNA    36837   38790   .       +       .       ID=BN869_T00000007_1;Parent=BN869_G00000007;Name=BN869_T00000007_1;
scf_001 maker   exon    36837   37491   .       +       .       ID=BN869_T00000007_1:exon:0;Parent=BN869_T00000007_1;
scf_001 maker   exon    37547   38790   .       +       .       ID=BN869_T00000007_1:exon:1;Parent=BN869_T00000007_1;
scf_001 maker   CDS     36837   37491   .       +       0       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker   CDS     37547   38790   .       +       2       ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;

The GFF parser fails to join the two CDS rows that share the same ID into a single feature with a CompoundLocation. As a result, GenBank and EMBL files produced by merging (and flattening) GFF3 annotations contain multiple separate CDS features where there should be one CDS whose position is a join, e.g.:

FT   CDS             join(36837..37491,37547..38790)
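
For illustration, the grouping described above can be sketched in plain Python with no Biopython dependency. This is not the parser's code and not a proposed patch: `parse_attributes` and `merge_cds_by_id` are hypothetical helper names, and the sketch assumes that CDS rows belonging to one feature share the same `ID` on the same sequence, as in the example. It only builds the GenBank/EMBL-style location string; GFF3 coordinates are 1-based and inclusive, so they carry over unchanged.

```python
from collections import defaultdict


def parse_attributes(column):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    pairs = (item.split("=", 1) for item in column.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def merge_cds_by_id(gff3_text):
    """Group CDS rows by (seqid, ID) and return one location string per group."""
    groups = defaultdict(list)
    strands = {}
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.strip().split(None, 8)
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        seqid, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        cds_id = parse_attributes(fields[8]).get("ID")
        if cds_id is None:
            continue  # rows without an ID cannot be grouped this way
        groups[(seqid, cds_id)].append((start, end))
        strands[(seqid, cds_id)] = strand

    locations = {}
    for key, parts in groups.items():
        parts.sort()  # order segments by start coordinate
        spans = ",".join(f"{s}..{e}" for s, e in parts)
        location = f"join({spans})" if len(parts) > 1 else spans
        if strands[key] == "-":
            location = f"complement({location})"
        locations[key] = location
    return locations


EXAMPLE = """\
##gff-version 3
scf_001 maker CDS 36837 37491 . + 0 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
scf_001 maker CDS 37547 38790 . + 2 ID=BN869_T00000007_1:cds;Parent=BN869_T00000007_1;
"""

for (seqid, cds_id), location in merge_cds_by_id(EXAMPLE).items():
    print(seqid, cds_id, location)
# prints: scf_001 BN869_T00000007_1:cds join(36837..37491,37547..38790)
```

Splitting on any whitespace is a convenience for pasted examples like the one above; real GFF3 is tab-delimited, and a stricter version would split on tabs only.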
@bgruening

@chapmanb is there an easy solution for this? I stumbled over this as well when I tried to integrate protein sequences into CDS records.

@chapmanb
Owner

Björn and Mikael;
Sorry about leaving this for so long. I've been meaning to tackle it forever. Have you tried using gffutils?

https://github.com/daler/gffutils

I've been pointing everyone at Ryan's work as it's better and more up to date than this library. The goal has been to merge into gffutils any functionality that only this library has. Hopefully it'll handle your case better.

@bgruening

@chapmanb yes, I'm currently developing some Galaxy integration for gffutils, but as far as I know it lacks the conversion features. You cannot convert a gffutils SQLite database to GenBank, can you?

@mikdur
Author

mikdur commented Jun 18, 2015

I think I have some code that does the merge, albeit maybe not in an optimal way. I'll check it and see whether it can be merged into a suitable place.

Cheers,
Mikael


@bgruening

@mikdur this would be great!
