martinghunt edited this page Nov 26, 2015 · 3 revisions

Task: clean

This task removes small contigs, as well as contigs that are completely contained in another contig.

Usage and options

The general usage is

circlator clean [options] <in.fasta> <outprefix>

The following options are available:

  • --min_contig_length INT: contigs shorter than this are discarded (unless specified using --keep). Default: 2000.
  • --min_contig_percent FLOAT: a contig is removed if the length of a nucmer hit is at least this percentage of the contig's length (unless the contig is specified using --keep). Default: 95.
  • --diagdiff INT: nucmer diagdiff option. Default: 25.
  • --min_nucmer_id FLOAT: nucmer minimum percent identity. Default: 95.
  • --min_nucmer_length INT: minimum length of hit for nucmer to report. Default: 500.
  • --breaklen INT: breaklen option used by nucmer. Default: 500.
  • --keep FILENAME: file of contig names to keep in output file, one name per line. Contigs named in this file will be kept, regardless of whether or not they are contained in another contig.
  • --verbose: be verbose
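
The way the length and containment thresholds interact with --keep can be sketched in a few lines of Python. This is only an illustration of the rules described above, not Circlator's actual code: the function name classify_contig, the argument best_hit_length (the length of the longest nucmer hit to another contig) and the order in which the checks are applied are all assumptions made for the example.

```python
def classify_contig(name, length, best_hit_length, keep_names,
                    min_contig_length=2000, min_contig_percent=95.0):
    """Hypothetical helper: decide what happens to one contig.

    Returns one of 'user_kept', 'small_removed', 'contained' or 'kept'.
    The order of the checks is an assumption, not taken from Circlator.
    """
    # Contigs named in the --keep file are always kept.
    if name in keep_names:
        return "user_kept"
    # Contigs shorter than --min_contig_length are discarded.
    if length < min_contig_length:
        return "small_removed"
    # A hit covering at least --min_contig_percent of the contig
    # means the contig is treated as contained in another contig.
    if length > 0 and 100.0 * best_hit_length / length >= min_contig_percent:
        return "contained"
    return "kept"
```

For instance, with the default thresholds a 10000 bp contig whose longest hit is 9600 bp (96% of its length) would be classed as contained, whereas the same contig with a 5000 bp hit would be kept.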

Output files

The final cleaned FASTA file is called outprefix.fasta and logging information is written to outprefix.log. An example log file is:

[clean] contig1 user_kept
[clean] contig2 kept
[clean] contig3 small_removed
[clean] contig4 contained in contig2

In this example, contig1 was kept because it was specified in the file given by --keep. Contig2 was not contained in any other contig, so it was kept. Contig3 was too short and was therefore removed. Contig4 was removed because it was contained in contig2.
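
If the log needs to be summarised programmatically, a small parser along the following lines may be enough. It assumes that every relevant line has exactly the layout shown in the example log above (a [clean] tag, the contig name, then the status); real log files may contain other lines, which this sketch simply skips. The function name parse_clean_log is hypothetical.

```python
from collections import Counter


def parse_clean_log(lines):
    """Map contig name -> (status, container).

    container is the name of the containing contig for 'contained'
    lines and None otherwise. Lines not in the assumed format are skipped.
    """
    results = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "[clean]":
            continue
        name, status = fields[1], fields[2]
        # 'contained' lines look like: [clean] contig4 contained in contig2
        if status == "contained" and len(fields) >= 5:
            container = fields[4]
        else:
            container = None
        results[name] = (status, container)
    return results


example_log = """\
[clean] contig1 user_kept
[clean] contig2 kept
[clean] contig3 small_removed
[clean] contig4 contained in contig2
""".splitlines()

results = parse_clean_log(example_log)
# Count how many contigs fall into each status.
summary = Counter(status for status, _ in results.values())
```

Here results maps each contig name to its outcome, and summary counts the contigs per status.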

The other files are intermediate files made as part of the cleaning process. First, small contigs are removed and the remaining contigs are written to outprefix.remove_small.fa. This file is then compared against itself using nucmer, and the resulting show-coords output is written to outprefix.coords.
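
When scripting around this task, it can be handy to derive the file names from the output prefix. The helper below only restates the naming described on this page; the function name clean_output_files and the dictionary keys are made up for the example.

```python
def clean_output_files(outprefix):
    """Hypothetical helper: file names written by the clean task for a prefix."""
    return {
        "cleaned_fasta": outprefix + ".fasta",           # final cleaned contigs
        "log": outprefix + ".log",                       # per-contig decisions
        "remove_small": outprefix + ".remove_small.fa",  # after the length filter
        "coords": outprefix + ".coords",                 # nucmer show-coords output
    }


files = clean_output_files("outprefix")
```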
