An adjacency-list-based de Bruijn graph implementation
- C compiler
- make
- zlib
- gengetopt
Tested on Ubuntu 16.04 and Windows 10 using Ubuntu bash
Download from GitHub
$ git clone git@github.com:ericniso/hague.git
$ cd hague
Download submodules
$ git submodule init
$ git submodule update
Compile binaries; the executable file is in the bin folder
$ make bin
Compile debug binaries; the executable file is in the debug folder
$ make debug
Compile the shared library; the lib file is in the lib folder
$ make lib
Compile all previous targets
$ make all
After compiling the source code, we recommend running the automated tests to verify that the build succeeded
$ make test
Docs can be generated using
$ make doc
This generates the source code documentation in both HTML and LaTeX in the docs folder
$ hague -f "/path/to/fasta/file" -k "k-mer-length"
The input file can be either plain or gzip-compressed (.gz)
This command generates the de Bruijn graph and prints the result in CSV format. Here is an example output:
$ hague -f ~/home/nicolaas/chr1_KI270707v1_random.fa.gz -k 10 | head -n 5
Source, Target, Label
AGGGGTCTG, GGGGTCTGC, AGGGGTCTGC
GGGGTCTGC, GGGTCTGCT, GGGGTCTGCT
GGGTCTGCT, GGTCTGCTT, GGGTCTGCTT
GGTCTGCTT, GTCTGCTTA, GGTCTGCTTA
From left to right we can see:
- Source is the source node, whose key is a (k-1)-mer
- Target is the node that the source node points to through an edge
- Label is the label of the edge connecting the previous two nodes; it is the corresponding k-mer string
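To make the three columns concrete, here is a minimal Python sketch of how each k-mer of a sequence yields one edge between two (k-1)-mers. It is illustrative only and is not hague's C implementation: the function name debruijn_edges is hypothetical, the input string is simply the sequence spelled by the four example rows above, and details such as duplicate k-mers or FASTA parsing are ignored.

```python
def debruijn_edges(sequence, k):
    """Yield (source, target, label) for every k-mer in `sequence`.

    Hypothetical helper for illustration; not part of hague.
    """
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # The source is the k-mer's prefix, the target is its suffix,
        # and the k-mer itself labels the edge between them.
        yield kmer[:-1], kmer[1:], kmer


if __name__ == "__main__":
    print("Source, Target, Label")
    for source, target, label in debruijn_edges("AGGGGTCTGCTTA", 10):
        print(f"{source}, {target}, {label}")
```

Run on that 13-character string with k = 10, the sketch prints the same four rows as the example output.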
If you want to write the output to a file, you can specify a filename using the -o option:
$ hague -f "/path/to/fasta/file" -k "k-mer-length" -o "/path/to/output/file"
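Once the output is in a file, it can be loaded back into an adjacency list for further processing. The following is a sketch in Python that assumes the file has exactly the three-column layout shown in the example output, including the space after each comma. The function name load_adjacency is hypothetical and not part of hague.

```python
import csv
import io
from collections import defaultdict


def load_adjacency(lines):
    """Build {source: [(target, label), ...]} from hague-style CSV lines."""
    # skipinitialspace drops the blank that follows each comma in the output.
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the "Source, Target, Label" header row
    adjacency = defaultdict(list)
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        source, target, label = row
        adjacency[source].append((target, label))
    return dict(adjacency)


if __name__ == "__main__":
    # In-memory stand-in for an output file written with -o.
    sample = io.StringIO(
        "Source, Target, Label\n"
        "AGGGGTCTG, GGGGTCTGC, AGGGGTCTGC\n"
        "GGGGTCTGC, GGGTCTGCT, GGGGTCTGCT\n"
    )
    for source, edges in load_adjacency(sample).items():
        print(source, "->", edges)
```

For a real file, pass an open file object instead of the in-memory sample, for example `with open(path, newline="") as fh: load_adjacency(fh)`.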
An additional feature, superstring reconstruction, is invoked by adding the -w option:
$ hague -f "/path/to/fasta/file" -k "k-mer-length" -w [-o "/path/to/output/file"]
This feature is still a work in progress: it currently works only if the graph is Eulerian or semi-Eulerian
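The restriction can be illustrated with a small Python sketch. A directed graph has an Eulerian circuit when every node has equal in-degree and out-degree, and an Eulerian path (the semi-Eulerian case) when exactly one node has one extra outgoing edge and exactly one node has one extra incoming edge, provided all edges are connected. The code below checks these conditions and then walks the edges with Hierholzer's algorithm, a textbook method. Which algorithm hague's -w option actually uses is not stated here, so treat this as an assumption; the names eulerian_start and superstring are hypothetical.

```python
from collections import defaultdict


def eulerian_start(edges):
    """Return a start node if the degree conditions for an Eulerian path
    or circuit hold, otherwise None. Connectivity is checked by the caller."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    start, surplus, deficit = None, 0, 0
    for node in set(out_deg) | set(in_deg):
        diff = out_deg[node] - in_deg[node]
        if diff == 1:
            start, surplus = node, surplus + 1
        elif diff == -1:
            deficit += 1
        elif diff != 0:
            return None
    if (surplus, deficit) == (0, 0):  # Eulerian circuit: any node with edges
        return edges[0][0] if edges else None
    if (surplus, deficit) == (1, 1):  # semi-Eulerian: start at the surplus node
        return start
    return None


def superstring(edges):
    """Spell the string along an Eulerian path, or return None if none exists."""
    start = eulerian_start(edges)
    if start is None:
        return None
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm, iterative form
        node = stack[-1]
        if adjacency[node]:
            stack.append(adjacency[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(edges) + 1:  # some edges were unreachable
        return None
    # Consecutive (k-1)-mers overlap in all but their last character.
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    example = [
        ("AGGGGTCTG", "GGGGTCTGC"),
        ("GGGGTCTGC", "GGGTCTGCT"),
        ("GGGTCTGCT", "GGTCTGCTT"),
        ("GGTCTGCTT", "GTCTGCTTA"),
    ]
    print(superstring(example))
```

With the four (Source, Target) pairs from the example output, the sketch spells AGGGGTCTGCTTA. A graph in which one node has two more outgoing than incoming edges fails the degree check and yields None.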
Eric Nisoli, Lorenzo Mammana
MIT LICENSE