Releases: ncbi/egapx
v0.3.1-alpha
Release 0.3.0-alpha
New features integrated from RefSeq EGAP:
- Ortholog analysis against a pre-defined reference species
- Refinement of gene biotype (protein-coding, pseudogene, lncRNA) based on annotation and orthology properties
- Assignment of gene symbols, names, and protein names based on orthology or comparison to SwissProt proteins
- Better annotation of single-exon protein-coding genes based on well supported proteins
- Automatic selection of organism symbol format, ortholog reference species, protein reference sets, maximum intron size, and some annotation-related parameters
- Added target protein sets for plant clades and additional vertebrates
- Integration of structural and functional annotation into final output, including: ASN.1, GFF, GTF, mRNA FASTA, CDS FASTA, protein FASTA
Execution improvements:
- Added versioning for EGAPx (egapx.py runner, Docker/Singularity images)
- Added check for user input files
- Improved support for pre-download of reference files
- Updated STAR to produce a CSI index instead of a BAI index, to support large sequences
- Increased time limit for chainer
- Updated chunk size for miniprot tasks to 25k
- Enabled skipping Gnomon training when parameters from closely related taxa are available
- Relocated Python requirements.txt to repo root
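Background on the CSI index change listed above: the BAI format can only address coordinates up to 2^29 bp (about 537 Mbp) on a single reference sequence, while CSI supports longer sequences. The snippet below is a minimal sketch, not part of EGAPx, for checking whether an assembly contains sequences past that limit. The function name and the example lengths are illustrative assumptions.

```python
# BAI binning covers coordinates up to 2**29 bp per reference sequence.
BAI_MAX_LENGTH = 2**29  # 536,870,912 bp


def needs_csi(seq_lengths):
    """Return the names of sequences that are too long for a BAI index.

    seq_lengths: dict mapping sequence name -> length in bp.
    """
    return sorted(
        name for name, length in seq_lengths.items() if length > BAI_MAX_LENGTH
    )


# Hypothetical sequence lengths, for illustration only.
example = {"chr1": 248_956_422, "big_chr": 700_000_000}
print(needs_csi(example))
```

An empty result means a BAI index would have been sufficient for that assembly.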
Future plans:
- Workflow for GenBank submission. Contact us if you want to help with testing.
- Long-read transcript evidence using minimap2
- Short ncRNA prediction with tRNAscan and Rfam
Release v0.2-alpha
- Updated resource allocation for different tasks
- Added support for non-SRA reads
- Added option for off-line mode
- Bug fixes
Release v0.1.2-alpha
- Added configs for the Biowulf cluster and for Biowulf local execution
- Added a config for SLURM, which users will need to edit according to their cluster specifications
- Bug fixes
EGAPx alpha release
This version of EGAPx is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.