Graph construction
Anton Korobeynikov edited this page Jul 31, 2022
spades-gbuilder is a standalone tool for constructing a de Bruijn graph from a set of input sequences. The tool supports a variety of input and output formats.
spades-gbuilder <dataset (in YAML or FASTA)> <output filename> [-k <value>] [-c] [-t <value>] [-tmp-dir <dir>] [-b <value>] [-unitigs|-fastg|-gfa|-spades]
- The first positional argument is either a dataset description in SPAdes dataset YAML format (https://github.com/ablab/spades#sec3.2) or a plain FASTA-formatted input file
- The second positional argument specifies the output file
- -k <int>: k-mer length used for construction (must be odd, default: 21)
- -t <int>: number of CPU threads to use
- -tmp-dir <dir_name>: scratch directory to use
- -b <int>: sorting buffer size (per thread, in bytes, default is ~512 MB); increasing it raises memory consumption but reduces disk usage
- -c: infer coverage; after graph construction, infer the k-mer coverage of each node
- Output format selection:
  - -unitigs: produce unitigs in FASTA (default)
  - -fastg: output graph in FASTG format
  - -gfa: output graph in GFA1 format
  - -spades: output graph in SPAdes internal format
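As an illustration of how the options above combine, the following Python sketch assembles an argument list for spades-gbuilder. The helper name `gbuilder_command` and the file names are hypothetical and not part of SPAdes; the flag spellings are copied from the option list, and the command is only printed, never executed.

```python
import shlex

# Output format flags as listed in the option list.
FORMAT_FLAGS = ("-unitigs", "-fastg", "-gfa", "-spades")


def gbuilder_command(dataset, output, k=21, threads=None, tmp_dir=None,
                     buffer_bytes=None, coverage=False, fmt="-unitigs"):
    """Assemble an argument list for spades-gbuilder (hypothetical helper)."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown output format flag: {fmt}")
    # Positional arguments first: input dataset (YAML or FASTA), then output file.
    args = ["spades-gbuilder", dataset, output, "-k", str(k)]
    if threads is not None:
        args += ["-t", str(threads)]
    if tmp_dir is not None:
        # Flag spelling as given in the option list.
        args += ["-tmp-dir", tmp_dir]
    if buffer_bytes is not None:
        # Per-thread sorting buffer size, in bytes.
        args += ["-b", str(buffer_bytes)]
    if coverage:
        args.append("-c")
    args.append(fmt)
    return args


# Hypothetical file names, for illustration only.
cmd = gbuilder_command("reads.fasta", "graph.gfa", k=31, threads=8,
                       coverage=True, fmt="-gfa")
print(shlex.join(cmd))
# prints: spades-gbuilder reads.fasta graph.gfa -k 31 -t 8 -c -gfa
```

Because the sorting buffer is allocated per thread, the value passed to -b should be chosen with the -t setting in mind.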
The source code of the tool is located in the projects/gbuilder folder. It serves as an example of how to use the following SPAdes modules:
- Input handling (loading of datasets and plain reads)
- Read conversion to internal binary format
- Initialization of configuration subsystem
- Building of extension index (raw de Bruijn graph)
- Graph condensing (extracting unbranched paths / unitigs)
- Output generation
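The two graph-related steps in the list above can be sketched conceptually in Python. This is a toy illustration and not the SPAdes implementation: it works on the forward strand only (no reverse complements), keeps everything in memory, and the function names `build_extension_index` and `extract_unitigs` are hypothetical. Here the "extension index" is simply a record, for each k-mer, of which bases were observed immediately after and before it, and condensing merges k-mers along unbranched paths.

```python
def build_extension_index(seqs, k):
    """Map every k-mer to the sets of bases seen right after / right before it."""
    nxt, prv = {}, {}
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            out = nxt.setdefault(kmer, set())
            inc = prv.setdefault(kmer, set())
            if i + k < len(s):
                out.add(s[i + k])
            if i > 0:
                inc.add(s[i - 1])
    return nxt, prv


def extract_unitigs(nxt, prv):
    """Condense unbranched paths of k-mers into unitig strings."""
    def successor(kmer):
        # Extend only across an edge that is the unique way out of kmer
        # and the unique way into the next k-mer.
        if len(nxt[kmer]) != 1:
            return None
        succ = kmer[1:] + next(iter(nxt[kmer]))
        return succ if len(prv[succ]) == 1 else None

    def is_start(kmer):
        # A k-mer starts a unitig unless it can be merged with a unique predecessor.
        if len(prv[kmer]) != 1:
            return True
        pred = next(iter(prv[kmer])) + kmer[:-1]
        return len(nxt[pred]) != 1

    seen, result = set(), []

    def walk(kmer):
        path, cur = kmer, kmer
        seen.add(kmer)
        while (succ := successor(cur)) is not None and succ not in seen:
            path += succ[-1]
            seen.add(succ)
            cur = succ
        result.append(path)

    for kmer in sorted(nxt):      # paths that begin at a branch or a dead end
        if is_start(kmer):
            walk(kmer)
    for kmer in sorted(nxt):      # leftover k-mers lie on isolated cycles
        if kmer not in seen:
            walk(kmer)
    return result


nxt, prv = build_extension_index(["ATCGGA", "TTCGGC"], k=3)
print(extract_unitigs(nxt, prv))
# prints: ['ATC', 'GGA', 'GGC', 'TCGG', 'TTC']
```

In this example the two inputs share the k-mers TCG and CGG, which are merged into the single unitig TCGG, while the diverging ends stay separate. Adjacent unitigs overlap by k-1 bases because the sketch treats k-mers as nodes.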