
Updating GFFs

Michael Alonge edited this page Feb 25, 2021 · 8 revisions

RagTag Version: v1.1.1

RagTag offers utilities to break (correction) or join (scaffold) query sequences to improve draft assemblies. For draft assemblies that have annotations describing genes and/or repeats, RagTag offers a utility to preserve and update annotation features. RagTag correction and scaffolding each produce an AGP file defining the exact changes made to produce the output assembly. These AGP files can be used to update GFF annotation coordinates exactly, so that they refer to the new RagTag corrected/scaffolded assembly.

Usage

usage: ragtag.py updategff [-c] <genes.gff> <ragtag.agp>

Update gff intervals given a RagTag AGP file

positional arguments:
  <genes.gff>     gff file
  <ragtag.*.agp>  agp file

optional arguments:
  -h, --help      show this help message and exit
  -c              update for misassembly correction (ragtag.correction.agp)

Correction

RagTag correction breaks query assemblies at points of putative misassembly. To preserve and update query assembly annotations, provide the GFF file to ragtag.py correct with the --gff flag. This ensures that RagTag never breaks a query sequence within a feature defined by the GFF file. Consider removing very large features from the GFF file beforehand, because a single large feature blocks every candidate breakpoint that falls inside it. After correction, update the GFF coordinates with ragtag.py updategff -c. Here is an example of how to preserve and update GFF features with RagTag:

ragtag.py correct --gff genes.gff ref.fa query.fa
ragtag.py updategff -c genes.gff ragtag_output/ragtag.correction.agp > genes.corr.gff
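
The advice above about large features can be sketched as a small pre-filter. This is only an illustration, not part of RagTag: the function name filter_gff_by_length and the 100000 bp cutoff are hypothetical, and the sketch assumes a standard tab-delimited GFF with start and end in columns 4 and 5.

```shell
# Hypothetical helper (not a RagTag command): drop GFF features longer
# than a given length in bp. Reads GFF on stdin, writes GFF on stdout.
filter_gff_by_length() {
    max_len="$1"
    awk -F '\t' -v max="$max_len" '
        /^#/ { print; next }              # keep header and directive lines
        NF >= 5 && ($5 - $4 + 1) <= max   # keep features within the limit
    '
}

# Example use (the cutoff is an arbitrary placeholder, not a recommendation):
# filter_gff_by_length 100000 < genes.gff > genes.filtered.gff
```

If a parent feature such as a gene is removed this way, its child features remain in the file, so the result may need further cleanup. Presumably the same filtered file should then be passed to both ragtag.py correct --gff and ragtag.py updategff -c, since a removed feature is no longer protected from breaks.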

Scaffolding

RagTag scaffolding orders and orients query sequences. After scaffolding, update GFF coordinates with ragtag.py updategff. Here is an example of how to update GFF features with RagTag after scaffolding:

ragtag.py scaffold ref.fa query.fa
ragtag.py updategff genes.gff ragtag_output/ragtag.scaffolds.agp > genes.scaf.gff
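
To make the coordinate arithmetic behind such an update concrete, here is a minimal awk sketch. It is not RagTag's implementation, and lift_gff_with_agp is a hypothetical name. It assumes a standard AGP layout (object, object start, object end, part number, component type, component ID, component start, component end, orientation) and that each "W" component covers its entire query sequence, so it does not handle correction AGP files, where sequences are split.

```shell
# Hypothetical sketch (not RagTag code): lift GFF coordinates through a
# scaffolding AGP in which each component spans a whole query sequence.
lift_gff_with_agp() {
    agp="$1"; gff="$2"
    awk -F '\t' -v OFS='\t' '
        NR == FNR {                            # first file: the AGP
            if ($0 !~ /^#/ && $5 == "W") {     # sequence lines only, skip gaps
                obj[$6] = $1; obeg[$6] = $2; oend[$6] = $3; ori[$6] = $9
            }
            next
        }
        /^#/ || !($1 in obj) { print; next }   # pass other lines through as-is
        {
            id = $1; s = $4; e = $5
            if (ori[id] == "-") {              # reverse-complemented component
                $4 = oend[id] - e + 1
                $5 = oend[id] - s + 1
                if ($7 == "+") $7 = "-"; else if ($7 == "-") $7 = "+"
            } else {                           # forward component: plain offset
                $4 = obeg[id] + s - 1
                $5 = obeg[id] + e - 1
            }
            $1 = obj[id]                       # rename to the scaffold
            print
        }
    ' "$agp" "$gff"
}
```

For a forward component the feature is shifted by the component's start offset in the scaffold. For a reversed component the interval is mirrored around the component's end and the strand is flipped. In practice, ragtag.py updategff should be used; the sketch only illustrates why the AGP file carries enough information for an exact update.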

Correction and Scaffolding

Please read the instructions above for more details on the correction and scaffolding process. Here is an example of how to run correction and scaffolding while preserving and updating GFF annotations:

# Input files and output directory
REF=ref.fa
QUERY=query.fa
QUERY_PREF=$(basename "$QUERY" .fa)
GENES=genes.gff
GENES_PREF=$(basename "$GENES" .gff)
OUTDIR=ragtag_output

# Correct without breaking inside GFF features, then scaffold the corrected assembly
ragtag.py correct --gff "$GENES" "$REF" "$QUERY"
ragtag.py scaffold "$REF" "$OUTDIR/$QUERY_PREF.corrected.fasta"

# Update the GFF for correction first, then for scaffolding
ragtag.py updategff -c "$GENES" "$OUTDIR/ragtag.correction.agp" > "$OUTDIR/$GENES_PREF.corr.gff"
ragtag.py updategff "$OUTDIR/$GENES_PREF.corr.gff" "$OUTDIR/ragtag.scaffolds.agp" > "$OUTDIR/$GENES_PREF.scaf.gff"
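
After a pipeline like the one above, a quick sanity check may be useful. The sketch below rests on the assumption that updating only rewrites coordinates, so the number of feature lines should be the same before and after. count_gff_features is a hypothetical helper, not a RagTag command.

```shell
# Hypothetical helper (not a RagTag command): count feature lines in a
# GFF file, ignoring comment/directive lines and blank lines.
count_gff_features() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Example use with the file names from the pipeline above:
# before=$(count_gff_features genes.gff)
# after=$(count_gff_features ragtag_output/genes.scaf.gff)
# [ "$before" -eq "$after" ] || echo "feature count changed: $before -> $after" >&2
```

A mismatch does not say which features were lost, only that the input and output files are worth comparing more closely.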