Hello,
I am currently reannotating many P. aeruginosa genomes, and I want to use the PAO1 annotations from the Pseudomonas Genome Database, plus a couple of other proteins, as the reference for the first round of annotation. However, when PAO1 itself is annotated, not all the expected genes are there, and I am struggling to work out why.
In my reference file, `Pa_PAO1_107_annotations.gbk`, one gene has the following entry:

```
     gene            complement(2694546..2694764)
                     /gene="PA2412"
                     /locus_tag="PA2412"
                     /db_xref="Pseudomonas Genome DB: PGD107602"
     CDS             complement(2694546..2694764)
                     /gene="PA2412"
                     /locus_tag="PA2412"
                     /product="conserved hypothetical protein"
                     /codon_start=1
                     /translation_table=11
                     /translation="MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLKK
                     DCLAYIEEVWTDMRPLSLRQHMDKAAG"
                     /protein_id="NP_251102.1"
```
After converting to a FASTA file with `prokka-genbank_to_fasta_db`, we have the following entry:

```
>NP_251102.1 ~~~PA2412~~~conserved hypothetical protein
MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLKKDCLAYIEEVWTDMRP
LSLRQHMDKAAG
```
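Judging only from the entries in this issue, the header packs its fields into one line separated by `~~~`, in the apparent layout `>protein_id EC~~~gene~~~product`, with the EC field empty for PA2412. The Python sketch below is a hypothetical helper (not part of Prokka and not a re-implementation of `prokka-genbank_to_fasta_db`) that writes and parses records in that assumed layout, which makes it easy to confirm that the entry passed to `--proteins` really carries the gene name.

```python
from typing import Dict


def format_entry(protein_id: str, ec: str, gene: str, product: str,
                 seq: str, width: int = 60) -> str:
    """Build a FASTA record whose header follows the layout seen in this issue,
    '>ID EC~~~gene~~~product' (field order assumed, not taken from Prokka's code)."""
    header = f">{protein_id} {ec}~~~{gene}~~~{product}"
    # Split the sequence into fixed-width lines, as in the entry above.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *lines])


def parse_header(line: str) -> Dict[str, str]:
    """Split a header back into ID, EC, gene and product fields."""
    protein_id, _, rest = line.lstrip(">").partition(" ")
    fields = rest.split("~~~") + ["", "", ""]  # pad headers with missing fields
    ec, gene, product = fields[:3]
    return {"id": protein_id, "ec": ec, "gene": gene, "product": product}


seq = "MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLKKDCLAYIEEVWTDMRPLSLRQHMDKAAG"
record = format_entry("NP_251102.1", "", "PA2412", "conserved hypothetical protein", seq)
print(record)
print(parse_header(record.splitlines()[0])["gene"])  # PA2412
```

Running `parse_header` over every header in the database file is a quick way to spot entries whose gene field is empty.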
I then run Prokka with:
```sh
prokka --outdir ./Pa_PAO1_107/ --prefix Pa_PAO1_107 \
  --proteins ../../raw_data/genomes/siderophore_annotations.db \
  --force --locustag Pa_PAO1_107 --cpus 8 \
  ../oriented_genomes/Pa_PAO1_107/Pa_PAO1_107_reoriented.fasta
```
In the output file, `Pa_PAO1_107.gbk`, there are no matches for PA2412; however, there is the following entry:

```
     CDS             complement(2694064..2694282)
                     /locus_tag="Pa_PAO1_107_02485"
                     /inference="ab initio prediction:Prodigal:002006"
                     /inference="similar to AA
                     sequence:siderophore_annotations.db:NP_251102.1"
                     /note="conserved hypothetical protein"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /translation="MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLK
                     KDCLAYIEEVWTDMRPLSLRQHMDKAAG"
```
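To see how many genes are affected, it can help to list every CDS in the Prokka output that cites a `similar to AA sequence` inference but has no `/gene` qualifier. The sketch below is a minimal, line-based GenBank feature parser written for this purpose; `parse_cds` and `unnamed_hits` are hypothetical helpers, and Biopython's `SeqIO.parse(handle, "genbank")` is the more robust choice for real files. It assumes the standard layout (feature keys in column 6, qualifiers in column 22), and the sample text is abridged from the two Prokka output entries in this issue.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Feature keys start in column 6; qualifiers and their continuations in column 22.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)$")
QUALIFIER_RE = re.compile(r"^ {21}/([^=]+)(?:=(.*))?$")


def parse_cds(text: str) -> List[Dict[str, List[str]]]:
    """Return the qualifiers of every CDS feature, joining wrapped values."""
    features: List[Tuple[str, Dict[str, List[str]]]] = []
    in_features = False
    last_key = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features, last_key = True, None
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False  # sequence data and record ends are not features
            continue
        if not in_features:
            continue
        feature = FEATURE_RE.match(line)
        if feature:
            features.append((feature.group(1), {}))
            last_key = None
            continue
        if not features:
            continue
        quals = features[-1][1]
        qualifier = QUALIFIER_RE.match(line)
        if qualifier:
            last_key = qualifier.group(1)
            quals.setdefault(last_key, []).append((qualifier.group(2) or "").strip('"'))
        elif last_key and line.startswith(" " * 21):
            # Translations wrap without spaces; free text wraps at a word boundary.
            sep = "" if last_key == "translation" else " "
            quals[last_key][-1] += sep + line.strip().strip('"')
    return [quals for kind, quals in features if kind == "CDS"]


def unnamed_hits(cds_list: List[Dict[str, List[str]]]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (locus_tag, reference ID, product, note) for CDSs that cite a
    'similar to AA sequence' inference but carry no /gene qualifier."""
    for quals in cds_list:
        if "gene" in quals:
            continue
        for inference in quals.get("inference", []):
            if inference.startswith("similar to AA sequence:"):
                reference_id = inference.rsplit(":", 1)[-1]
                yield (quals.get("locus_tag", ["?"])[0], reference_id,
                       quals.get("product", [""])[0], quals.get("note", [""])[0])


# Abridged from the Prokka output entries in this issue (translations omitted).
SAMPLE = """\
FEATURES             Location/Qualifiers
     CDS             complement(2694064..2694282)
                     /locus_tag="Pa_PAO1_107_02485"
                     /inference="ab initio prediction:Prodigal:002006"
                     /inference="similar to AA
                     sequence:siderophore_annotations.db:NP_251102.1"
                     /note="conserved hypothetical protein"
                     /product="hypothetical protein"
     CDS             complement(2693299..2694063)
                     /gene="PA2411"
                     /locus_tag="Pa_PAO1_107_02484"
                     /inference="similar to AA
                     sequence:siderophore_annotations.db:NP_251101.1"
                     /product="putative thioesterase"
"""

for hit in unnamed_hits(parse_cds(SAMPLE)):
    print("\t".join(hit))
```

To scan the full output, pass the contents of `Pa_PAO1_107.gbk` (for example `open("Pa_PAO1_107.gbk").read()`) instead of `SAMPLE`; printing the product and note columns makes it easy to see whether the unnamed hits share a pattern.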
You can see that the two amino acid sequences are identical:
```
MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLKKDCLAYIEEVWTDMRPLSLRQHMDKAAG
MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLKKDCLAYIEEVWTDMRPLSLRQHMDKAAG
```
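Because the two GenBank files wrap the `/translation` qualifier at different points (`...GLKK` / `DCLAY...` in the reference versus `...GLK` / `KDCLAY...` in the Prokka output), a visual comparison is easy to get wrong. A minimal Python sketch for checking identity programmatically is to strip the wrapping whitespace and quotes before comparing; the helper name `normalise` is hypothetical.

```python
def normalise(translation: str) -> str:
    """Drop line-wrap whitespace and quote marks from a /translation value."""
    return "".join(ch for ch in translation if not ch.isspace() and ch != '"').upper()


# /translation values as they appear in the reference and Prokka GenBank files.
reference = '''"MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLKK
                     DCLAYIEEVWTDMRPLSLRQHMDKAAG"'''
prokka = '''"MTSVFDRDDIQFQVVVNHEEQYSIWPEYKEIPQGWRAAGKSGLK
                     KDCLAYIEEVWTDMRPLSLRQHMDKAAG"'''

print(normalise(reference) == normalise(prokka))  # True
print(len(normalise(reference)))  # 72
```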
I am unsure why, with identical amino acid sequences, this CDS has not been annotated with `/gene="PA2412"`. It has clearly matched to some degree, since it carries `/inference="similar to AA sequence:siderophore_annotations.db:NP_251102.1"`.

For another protein it has worked as expected:
Reference entry:
```
     gene            complement(2693781..2694545)
                     /gene="PA2411"
                     /locus_tag="PA2411"
                     /db_xref="Pseudomonas Genome DB: PGD107600"
     CDS             complement(2693781..2694545)
                     /gene="PA2411"
                     /locus_tag="PA2411"
                     /product="probable thioesterase"
                     /codon_start=1
                     /translation_table=11
                     /translation="MGGTPVRLFCLPYSGASAMTYSRWRRKLPAWLAVRPVELPGRGAR
                     MAEPLQTDLASLAQQLARELHDEVRQGPYAMLGHSLGALLACEVLYALRELGCPTPLGF
                     FACGTAAPSRRAEYDRGFAEPKSDAELIADLRDLQGTPEEVLGNRELMSLTLPILRADF
                     LLCGSYRHQRRPPLACPIRTLGGREDKASEEQLLAWAEETRSGFELELFDGGHFFIHQR
                     EAEVLAVVECQVEAWRAGQGAAALAVESAAIC"
                     /protein_id="NP_251101.1"
```
Fasta entry:
```
>NP_251101.1 ~~~PA2411~~~probable thioesterase
MGGTPVRLFCLPYSGASAMTYSRWRRKLPAWLAVRPVELPGRGARMAEPLQTDLASLAQQ
LARELHDEVRQGPYAMLGHSLGALLACEVLYALRELGCPTPLGFFACGTAAPSRRAEYDR
GFAEPKSDAELIADLRDLQGTPEEVLGNRELMSLTLPILRADFLLCGSYRHQRRPPLACP
IRTLGGREDKASEEQLLAWAEETRSGFELELFDGGHFFIHQREAEVLAVVECQVEAWRAG
QGAAALAVESAAIC
```
Output .gbk entry:
```
     CDS             complement(2693299..2694063)
                     /gene="PA2411"
                     /locus_tag="Pa_PAO1_107_02484"
                     /inference="ab initio prediction:Prodigal:002006"
                     /inference="similar to AA
                     sequence:siderophore_annotations.db:NP_251101.1"
                     /codon_start=1
                     /transl_table=11
                     /product="putative thioesterase"
                     /translation="MGGTPVRLFCLPYSGASAMTYSRWRRKLPAWLAVRPVELPGRGA
                     RMAEPLQTDLASLAQQLARELHDEVRQGPYAMLGHSLGALLACEVLYALRELGCPTPL
                     GFFACGTAAPSRRAEYDRGFAEPKSDAELIADLRDLQGTPEEVLGNRELMSLTLPILR
                     ADFLLCGSYRHQRRPPLACPIRTLGGREDKASEEQLLAWAEETRSGFELELFDGGHFF
                     IHQREAEVLAVVECQVEAWRAGQGAAALAVESAAIC"
```
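Putting the two cases side by side, the qualifiers that changed between the reference and the Prokka output can be tabulated. This small Python sketch uses values copied by hand from the PA2412 and PA2411 entries in this issue (the names `PAIRS` and `diff_qualifiers` are hypothetical). It shows that for PA2412 the missing `/gene` coincides with the product being rewritten to `hypothetical protein` (the original text kept as `/note`), while for PA2411 only the product wording changed.

```python
from typing import Dict, List, Tuple

# Qualifier values copied from the reference and Prokka output entries in this issue.
PAIRS: Dict[str, Tuple[Dict[str, str], Dict[str, str]]] = {
    "NP_251102.1": (
        {"gene": "PA2412", "product": "conserved hypothetical protein"},
        {"product": "hypothetical protein", "note": "conserved hypothetical protein"},
    ),
    "NP_251101.1": (
        {"gene": "PA2411", "product": "probable thioesterase"},
        {"gene": "PA2411", "product": "putative thioesterase"},
    ),
}


def diff_qualifiers(ref: Dict[str, str], out: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (qualifier, reference value, output value) for every qualifier
    that differs; a missing qualifier is reported as '-'."""
    keys = sorted(set(ref) | set(out))
    return [(k, ref.get(k, "-"), out.get(k, "-")) for k in keys if ref.get(k) != out.get(k)]


for protein_id, (ref, out) in PAIRS.items():
    for key, before, after in diff_qualifiers(ref, out):
        print(f"{protein_id}\t{key}\t{before!r} -> {after!r}")
```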
Why is it that for the second entry there is a gene field, but for the first there is not?
Thanks