Graph construction

Anton Korobeynikov edited this page Jul 31, 2022 · 4 revisions

spades-gbuilder is a standalone tool that constructs a de Bruijn graph from a set of input sequences. The tool supports a variety of input and output formats.

Synopsis

spades-gbuilder <dataset (in YAML or FASTA)> <output filename> [-k <value>] [-c] [-t <value>] [-tmp-dir <dir>] [-b <value>] [-unitigs|-fastg|-gfa|-spades]

Mandatory arguments

  • The first positional argument is either a dataset description in the SPAdes dataset YAML format (https://github.com/ablab/spades#sec3.2) or a plain FASTA-formatted input file
  • The second positional argument specifies the output file

Options

  • -k <int> k-mer length used for construction (must be odd, default: 21)
  • -t <int> number of CPU threads to use
  • -tmp-dir <dir_name> scratch directory to use
  • -b <int> sorting buffer size (per thread, in bytes; default: ~512 MB). Increasing it raises memory consumption but reduces disk usage
  • -c infer coverage: after graph construction, compute the k-mer coverage of each node
  • Output format selection:
    • -unitigs produce unitigs in FASTA (default)
    • -fastg output graph in FASTG format
    • -gfa output graph in GFA1 format
    • -spades output graph in SPAdes internal format
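
As an illustration of how the options above combine, here is a minimal Python sketch that assembles a spades-gbuilder command line. The helper name `gbuilder_command`, its parameter names, and the file names `reads.fasta` and `graph.gfa` are hypothetical and not part of SPAdes; only the flags listed above are used, and the scratch directory flag is spelled as in the Options list.

```python
import shlex

# Output format flags, exactly as listed in the Options section.
FORMAT_FLAGS = {
    "unitigs": "-unitigs",
    "fastg": "-fastg",
    "gfa": "-gfa",
    "spades": "-spades",
}


def gbuilder_command(dataset, output, k=21, threads=None, tmp_dir=None,
                     buffer_bytes=None, coverage=False, fmt="unitigs"):
    """Return an argv list for spades-gbuilder (hypothetical helper)."""
    if k % 2 == 0:
        # The documentation states that the k-mer length must be odd.
        raise ValueError("k must be odd")
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown output format: {fmt}")

    cmd = ["spades-gbuilder", dataset, output, "-k", str(k)]
    if coverage:
        cmd.append("-c")
    if threads is not None:
        cmd += ["-t", str(threads)]
    if tmp_dir is not None:
        cmd += ["-tmp-dir", tmp_dir]
    if buffer_bytes is not None:
        cmd += ["-b", str(buffer_bytes)]
    cmd.append(FORMAT_FLAGS[fmt])
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command: GFA output with coverage, k=31, 8 threads.
    print(shlex.join(gbuilder_command("reads.fasta", "graph.gfa", k=31,
                                      threads=8, coverage=True, fmt="gfa")))
```

The returned list can be passed to `subprocess.run` as is, provided spades-gbuilder is available on `PATH`.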

Developer notes

The source code of the tool is located in the projects/gbuilder folder. It serves as an example of how to use the following SPAdes modules:

  • Input handling (loading of datasets and plain reads)
  • Read conversion to internal binary format
  • Initialization of configuration subsystem
  • Building of extension index (raw de Bruijn graph)
  • Graph condensing (extracting unbranched paths / unitigs)
  • Output generation
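
To make the last stages of this list concrete, the following toy Python sketch builds a raw de Bruijn graph from a k-mer set and condenses its unbranched paths into unitigs. It is not the SPAdes implementation: it is a simplified, assumption-laden model that treats k-mers as nodes, ignores reverse complements, keeps everything in memory, and uses hypothetical function names throughout.

```python
def build_kmer_set(sequences, k):
    """Collect every k-mer occurring in the input sequences."""
    return {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}


def successors(kmer, kmers):
    """K-mers in the set that overlap the (k-1)-suffix of `kmer`."""
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in kmers]


def predecessors(kmer, kmers):
    """K-mers in the set that overlap the (k-1)-prefix of `kmer`."""
    return [c + kmer[:-1] for c in "ACGT" if c + kmer[:-1] in kmers]


def unitigs(kmers):
    """Condense maximal unbranched paths into unitig strings."""
    seen = set()

    def is_start(kmer):
        # A k-mer starts a unitig unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        preds = predecessors(kmer, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    def walk(start):
        path, cur = start, start
        seen.add(start)
        while True:
            succ = successors(cur, kmers)
            if len(succ) != 1:
                break  # dead end or branch: the unitig stops here
            nxt = succ[0]
            if nxt in seen or len(predecessors(nxt, kmers)) != 1:
                break  # closing a cycle or merging into another path
            path += nxt[-1]
            seen.add(nxt)
            cur = nxt
        return path

    result = [walk(km) for km in sorted(kmers) if is_start(km)]
    # Any k-mers still unvisited lie on isolated cycles without a start node.
    for km in sorted(kmers):
        if km not in seen:
            result.append(walk(km))
    return sorted(result)


if __name__ == "__main__":
    # Two sequences that share the prefix ATGGC and then diverge.
    for unitig in unitigs(build_kmer_set(["ATGGCGT", "ATGGCAA"], 3)):
        print(unitig)
```

In this example the shared prefix becomes one unitig and each diverging suffix becomes its own, which is the kind of output the -unitigs mode writes as FASTA records.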