Updating GFFs
RagTag Version: v1.1.1
RagTag offers utilities to break (correction) or join (scaffold) query sequences to improve draft assemblies. For draft assemblies that have annotations describing genes and/or repeats, RagTag offers a utility to preserve and update annotation features. RagTag correction and scaffolding each produce an AGP file defining the exact changes made to produce the output assembly. This AGP file can be used to update GFF annotation coordinates exactly, so that they refer to the new RagTag corrected/scaffolded assembly.
usage: ragtag.py updategff [-c] <genes.gff> <ragtag.agp>
Update gff intervals given a RagTag AGP file
positional arguments:
<genes.gff> gff file
<ragtag.*.agp> agp file
optional arguments:
-h, --help show this help message and exit
-c update for misassembly correction (ragtag.correction.agp)
RagTag correction breaks query assemblies at points of putative misassembly. To preserve and update query assembly annotations, provide the GFF file to ragtag.py correct with the --gff flag. This ensures that RagTag never breaks a query sequence within a feature defined by the GFF file. Be careful to remove any very large GFF features beforehand: because no break can be placed inside a feature, a single large feature can invalidate a disproportionate number of misassembly breakpoints. After correction, update the GFF coordinates with ragtag.py updategff -c. Here is an example of how to preserve and update GFF features with RagTag:
ragtag.py correct --gff genes.gff ref.fa query.fa
ragtag.py updategff -c genes.gff ragtag_output/ragtag.correction.agp > genes.corr.gff
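The advice above about removing large features before correction could be applied with a small filter like the one below. This is only a sketch: `filter_large_features` is a hypothetical helper, not part of RagTag, and it assumes a standard tab-separated GFF with start and end in columns 4 and 5. The right length cutoff depends on your annotation and is left to you.

```shell
# Hypothetical helper (not part of RagTag).
# Usage: filter_large_features <max_len> <genes.gff>
# Keeps comment/directive lines and every feature whose length
# (end - start + 1) is at most <max_len>; longer features are dropped.
filter_large_features() {
    awk -F'\t' -v max="$1" '
        /^#/ { print; next }                 # pass through comments and directives
        NF >= 5 && ($5 - $4 + 1) <= max      # GFF columns 4 and 5 are start and end
    ' "$2"
}
```

For example, `filter_large_features 100000 genes.gff > genes.filtered.gff` would write a filtered file that could then be passed to both `ragtag.py correct --gff` and `ragtag.py updategff -c`. The 100000 cutoff here is an arbitrary illustration, not a recommended value.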
RagTag scaffolding orders and orients query sequences. After scaffolding, update GFF coordinates with ragtag.py updategff. Here is an example of how to update GFF features with RagTag after scaffolding:
ragtag.py scaffold ref.fa query.fa
ragtag.py updategff genes.gff ragtag_output/ragtag.scaffolds.agp > genes.scaf.gff
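To make the coordinate update concrete, the following sketch shows the arithmetic implied by a scaffolding-style AGP. It is an illustration only, not RagTag's implementation, and `lift_gff_over_agp` is a hypothetical name. It assumes the standard nine-column AGP layout, that every query sequence appears as exactly one `W` component, and that the GFF is tab-separated. It does not handle correction AGPs, where sequences are split rather than joined.

```shell
# Hypothetical illustration (not RagTag code).
# Usage: lift_gff_over_agp <scaffolds.agp> <genes.gff>
lift_gff_over_agp() {
    awk -F'\t' -v OFS='\t' '
        # First file (AGP): remember where each sequence component was placed.
        NR == FNR {
            if ($0 !~ /^#/ && $5 == "W") {
                obj[$6] = $1; obeg[$6] = $2; oend[$6] = $3
                cbeg[$6] = $7; ori[$6] = $9
            }
            next
        }
        # Second file (GFF): pass through comments and unplaced sequences.
        /^#/ || !($1 in obj) { print; next }
        {
            c = $1; s = $4; e = $5
            if (ori[c] == "-") {
                # Reverse-oriented component: mirror the interval, flip the strand.
                $4 = oend[c] - (e - cbeg[c])
                $5 = oend[c] - (s - cbeg[c])
                $7 = ($7 == "+") ? "-" : (($7 == "-") ? "+" : $7)
            } else {
                # Forward-oriented component: shift by the placement offset.
                $4 = obeg[c] + (s - cbeg[c])
                $5 = obeg[c] + (e - cbeg[c])
            }
            $1 = obj[c]
            print
        }
    ' "$1" "$2"
}
```

As a worked case, take a `+` strand feature at 10-20 on a 100 bp contig. If that contig is placed in reverse orientation at scaffold positions 201-300, the feature comes out at 281-291 on the `-` strand. In practice, use `ragtag.py updategff` rather than a hand-rolled script like this one.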
Please read the instructions above for more details on the correction and scaffolding process. Here is an example of how to run correction and scaffolding while preserving and updating GFF annotations:
REF=ref.fa
QUERY=query.fa
QUERY_PREF=$(basename "$QUERY" .fa)
GENES=genes.gff
GENES_PREF=$(basename "$GENES" .gff)
OUTDIR=ragtag_output
# Correct the query without breaking inside annotated features, then scaffold the corrected assembly
ragtag.py correct --gff "$GENES" "$REF" "$QUERY"
ragtag.py scaffold "$REF" "$OUTDIR/$QUERY_PREF.corrected.fasta"
# Update the annotations for correction first, then for scaffolding
ragtag.py updategff -c "$GENES" "$OUTDIR/ragtag.correction.agp" > "$OUTDIR/$GENES_PREF.corr.gff"
ragtag.py updategff "$OUTDIR/$GENES_PREF.corr.gff" "$OUTDIR/ragtag.scaffolds.agp" > "$OUTDIR/$GENES_PREF.scaf.gff"
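After running the pipeline, a quick sanity check is to compare the number of feature lines in the original and updated GFF files. The helpers below are a hypothetical sketch and not part of RagTag. They assume that each input feature yields exactly one output feature and that feature lines are nine-column, tab-separated records. A mismatch signals that something deserves a closer look; a match does not prove the coordinates are correct.

```shell
# Hypothetical helpers (not part of RagTag).

# Print the number of feature lines (non-comment, nine or more columns) in a GFF file.
count_gff_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n++ } END { print n + 0 }' "$1"
}

# Succeed (exit status 0) only if both GFF files contain the same number of features.
same_feature_count() {
    [ "$(count_gff_features "$1")" -eq "$(count_gff_features "$2")" ]
}
```

For example: `same_feature_count genes.gff ragtag_output/genes.scaf.gff || echo "feature counts differ" >&2`.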
Are these docs confusing or incomplete? Please open an issue and let me know.