move spaligner readme, add citations and links
andrewprzh committed May 30, 2024
1 parent 67fa317 commit 4671799
Showing 8 changed files with 100 additions and 43 deletions.
20 changes: 19 additions & 1 deletion docs/binspreader.md
@@ -16,6 +16,16 @@ source of information for refining. Optionally, BinSPreader can be provided with
multiple Hi-C and/or paired-end libraries. The [BinSPreader protocol](https://star-protocols.cell.com/protocols/2802) contains more detailed
instructions on installing and running BinSPreader.

## Compilation

To compile BinSPreader, run

```
./spades_compile -SPADES_ENABLE_PROJECTS=binspreader
```

After the compilation is complete, the `binspreader` executable will be located in the `bin/` folder.

## Command line options

Required positional arguments:
@@ -69,7 +79,7 @@ binspreader <graph (in GFA)> <binning (in .tsv)> <output directory> [OPTION...]
Labels correction regularization parameter for labeled data (default: 0.6)


### Output
## Output
BinSPreader stores all output files in the output directory `<output_dir>` set by the user.

- `<output_dir>/binning.tsv` contains refined binning in `.tsv` format
@@ -83,3 +93,11 @@ In addition
- `<output_dir>/bin_label_1.fastq, <output_dir>/bin_label_2.fastq` read set for bin labeled by `bin_label` (if `--reads` was used)
- `<output_dir>/pe_links.tsv` list of paired-end links between assembly graph edges with weights (if `--debug` was used)
- `<output_dir>/graph_links.tsv` list of graph links between assembly graph edges with weights (if `--debug` was used)
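
As a rough illustration of how the refined binning could be consumed downstream, the sketch below groups contigs by bin label. It assumes that each line of `binning.tsv` holds a contig name followed by one or more tab-separated bin labels; that layout is an assumption, not something stated here, and should be checked against real output. The function name `contigs_per_bin` and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def contigs_per_bin(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group contig names by bin label.

    Assumed layout (not confirmed by this document): contig name in the
    first column, one or more bin labels in the following columns.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        contig, labels = row[0], row[1:]
        for label in labels:
            if label:
                bins[label].append(contig)
    return dict(bins)


if __name__ == "__main__":
    # Hypothetical sample; a real run would read <output_dir>/binning.tsv.
    sample = io.StringIO("contig_1\tbin_A\ncontig_2\tbin_B\ncontig_3\tbin_A\n")
    for label, contigs in sorted(contigs_per_bin(sample).items()):
        print(label, len(contigs))
```

To use it on real output, pass an open file handle for `<output_dir>/binning.tsv` instead of the in-memory sample.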


## References

If you are using **BinSPreader** in your research, please cite:

[Tolstoganov et al., 2022](https://www.cell.com/iscience/pdf/S2589-0042(22)01042-2.pdf) and
[Ochkalova et al., 2023](https://www.sciencedirect.com/science/article/pii/S2666166723003842).
18 changes: 9 additions & 9 deletions docs/getting-started.md
@@ -89,20 +89,20 @@ bin/spades.py --rnaviral -1 left.fastq.gz -2 right.fastq.gz -o output_folder

## Standalone SPAdes tools

- `spades-kmercount` - k-mer counting;
- [`spades-kmercount`](standalone.md#k-mer-counter) - k-mer counting;

- `spades-read-filter` - read filtering using k-mer coverage;
- [`spades-read-filter`](standalone.md#k-mer-coverage-read-filter) - read filtering using k-mer coverage;

- `spades-kmer-estimating` - estimating number of unique k-mers;
- [`spades-kmer-estimating`](standalone.md#k-mer-cardinality-estimating) - estimating number of unique k-mers;

- `spades-gbuilder` - assembly graph construction;
- [`spades-gbuilder`](standalone.md#graph-construction) - assembly graph construction;

- `spades-gsimplifier` - assembly graph simplification;
- [`spades-gsimplifier`](standalone.md#graph-simplification) - assembly graph simplification;

- `spaligner` - alignment of long reads to assembly graph;
- [`spaligner`](spaligner.md) - alignment of long reads to assembly graph;

- `spades-gmapper` - specific alignment of long reads to assembly graph used in hybrid assembly pipeline;
- [`spades-gmapper`](standalone.md#long-read-to-graph-alignment) - specific alignment of long reads to assembly graph used in hybrid assembly pipeline;

- `binspreader` - refinement of metagenome-assembled genomes;
- [`binspreader`](binspreader.md) - refinement of metagenome-assembled genomes;

- `pathracer` - alignment of profile HMMs to assembly graph.
- [`pathracer`](pathracer.md) - alignment of profile HMMs to assembly graph.
4 changes: 2 additions & 2 deletions docs/installation.md
@@ -88,7 +88,7 @@ for example:

which will install SPAdes into `/usr/local/bin`.

After installation you will get the same files (listed above) in `./bin` directory (or `<destination_dir>/bin` if you specified PREFIX). We also suggest adding `bin` directory to the `PATH` variable.
After installation, you will get the same files (listed above) in `./bin` directory (or `<destination_dir>/bin` if you specified PREFIX). We also suggest adding `bin` directory to the `PATH` variable.

## Building additional tools
SPAdes toolkit includes a number of standalone tools that are built using core
@@ -106,7 +106,7 @@ subset of SPAdes components. The components are:
- [`spades_tools`](standalone.md)
- [`binspreader`](binspreader.md)
- [`pathracer`](pathracer.md)
- [`spaligner`](standalone.md#spaligner)
- [`spaligner`](spaligner.md)

By default, only SPAdes and SPAdes tools are enabled (so
`-DSPADES_ENABLE_PROJECTS="spades;spades_tools"` is the default). Alternatively,
21 changes: 14 additions & 7 deletions docs/pathracer.md
@@ -24,6 +24,16 @@ Both tools use an extended pHMM model allowing frame shifts:
but for `pathracer-seq-fs` this extension is crucial: for aligning amino-acid pHMMs without allowing indels in the nucleotide space
six frame translation + `hmmsearch` from **HMMer** package is more than enough.

## Compilation

To compile PathRacer, run

```
./spades_compile -SPADES_ENABLE_PROJECTS=pathracer
```

After the compilation is complete, the `pathracer` executable will be located in the `bin/` folder.

## Input
Currently, the tool supports only _de Bruijn_ graphs in GFA format as produced by **SPAdes** or by an assembler compatible in this respect (e.g., **MEGAHIT**).
Contact us if you need some other format support. Input sequences are supposed to be in FASTA/FASTQ format.
@@ -192,12 +202,9 @@ pathracer bac.hmm synth_strain_gbuilder.gfa --queries 16S_rRNA -m 250 --top 1000
```

## References
If you are using **PathRacer** in your research, please cite:
A. Shlemov and A. Korobeynikov. PathRacer: racing profile HMM paths on assembly
graph. In _Proceedings of International Conference on Algorithms for Computational Biology,
AlCoB 2019. Berkeley, California, USA, May 28&ndash;30, 2019,_ volume 11488 LNCS, pages
80&ndash;94, 2019.
<https://link.springer.com/chapter/10.1007/978-3-030-18174-1_6>

If you are using **PathRacer** in your research, please cite:

[Shlemov and Korobeynikov, 2019](https://link.springer.com/chapter/10.1007/978-3-030-18174-1_6)

In case of any problems running **PathRacer** please contact [SPAdes support](https://github.com/ablab/spades/issues) attaching the log file.
Your suggestions are also very welcome!
File renamed without changes
66 changes: 48 additions & 18 deletions src/projects/spaligner/README.md → docs/spaligner.md
@@ -1,31 +1,57 @@
# SPAligner
# SPAligner: long read to graph aligner

SPAligner is a tool for fast and accurate alignment of nucleotide sequences to assembly graphs.
It takes a file with sequences (in FASTA/FASTQ format) and an assembly graph in GFA format and outputs long-read-to-graph
alignments in various formats (such as TSV, FASTA and [GPA](https://github.com/ocxtal/gpa "GPA-format spec")).


## Compilation

To compile SPAligner, run

```
./spades_compile -SPADES_ENABLE_PROJECTS=spaligner
```

After the compilation is complete, the `spaligner` executable will be located in the `bin/` folder.

Tool for fast and accurate alignment of nucleotide sequences (s.a. long reads, coding sequences, etc.) to assembly graphs.

## Running SPAligner

spaligner spaligner_config.yaml \ # config file
Synopsis:

spaligner spaligner_config.yaml \ # config file
-d pacbio \ # data type: pacbio, nanopore
-g assembly_graph.gfa \ # gfa-file with assembly graph
-k 77 \ # graph K-mer size
-s pacbio_reads.fastq.gz \ # sequences to align in fasta/fastq formats
-t 8 # number of threads, 8 by default
-g assembly_graph.gfa \ # assembly graph
-k 77 \ # graph k-mer size
-s pacbio_reads.fastq.gz \ # input sequences / reads
-t 8 # number of threads

By default, spaligner_config.yaml will be installed into /usr/share/spaligner/ or can be found in assembler/projects/spaligner/.
By default, `spaligner_config.yaml` can be found in `src/projects/spaligner/`.

Alignments will be saved to spaligner_result/alignment.tsv by default.
Alignments will be saved to `spaligner_result/alignment.tsv` by default.


## Compilation
### Command line options

`-d <type> `
long reads type: `nanopore` or `pacbio`

`-s <filename> `
file with sequences in FASTA or FASTQ formats (can be gzipped)

git clone https://github.com/ablab/spades.git
cd spades/assembler/
mkdir build && cd build && cmake ../src
make spaligner
`-g <filename> `
file with an assembly graph in GFA format

Now to run SPAligner move to folder `assembler/` and execute
`-k <int> `
k-mer length that was used for graph construction

`-t <int> `
number of threads (default: 8)

`-o, --outdir <dir> `
output directory to use (default: `spaligner_result/`)

build/bin/spaligner

## Output

@@ -102,7 +128,7 @@ If a sequence was not fully aligned, SPAligner tries to prolong the longest alig

Overview of the alignment of the nucleotide query sequence *S* (orange bar) to assembly graph *G*. Assembly graph edges are considered directed left-to-right (explicit edge orientation was omitted to improve the clarity).

![pipeline](pipeline.jpg)
![pipeline](spaligner.jpg)

1. **Anchor search.** Anchors (regions of high similarity) between the query and the edge labels are identified with [BWA-MEM](http://bio-bwa.sourceforge.net/).
2. **Anchor filtering.** Anchors shorter than *K*, the assembly graph *K*-mer size (anchors 2, 6, 11), anchors “in the middle” of a long edge (anchor 7), and ambiguous anchors (anchor 10, mostly covered by anchor 9; both anchors 4 and 5) are discarded.
@@ -146,6 +172,10 @@ Increase of `max_gs_states`, `max_restorable_length`, `queue_limit`, `iteration_
Turning off `restore_ends` or `run_dijkstra` in nucleotide sequence alignment mode leads to shorter alignments but a considerable speed-up.


## Contacts
## References

If you are using **SPAligner** in your research, please cite:

[Dvorkina et al., 2020](https://link.springer.com/article/10.1186/s12859-020-03590-7)

For any questions or suggestions please do not hesitate to contact Tatiana Dvorkina <tedvorkina@gmail.com>.
13 changes: 7 additions & 6 deletions docs/standalone.md
@@ -169,12 +169,17 @@ Additional options are:
original graph


## Long read to graph alignment

## hybridSPAdes aligner

_Not to be confused with [SPAligner](spaligner.md)._

### hybridSPAdes aligner
The `spades-gmapper` tool extracts long read alignments generated with hybridSPAdes pipeline options. It has three mandatory arguments: a dataset description file in [YAML format](running.md#specifying-multiple-libraries-with-yaml-data-set-file), a graph file in GFA format, and an output file name.

While `spades-gmapper` is a solution for those who work on hybridSPAdes assembly and
want to get exactly its intermediate results, [SPAligner](spaligner.md) is an end-product application for sequence-to-graph alignment with tunable parameters and output types.


Synopsis: `spades-gmapper <dataset description (in YAML)> <graph (in GFA)> <output filename> [-k <value>] [-t <value>] [-tmpdir <dir>]`

Additional options are:
@@ -188,13 +193,11 @@ Additional options are:
`-tmpdir <dir_name> `
scratch directory to use

While `spades-gmapper` is a solution for those who work on hybridSPAdes assembly and want to get exactly its intermediate results, [SPAligner](standalone.md#spaligner) is an end-product application for sequence-to-graph alignment with tunable parameters and output types.


### SPAligner
A tool for fast and accurate alignment of nucleotide sequences to assembly graphs. It takes file with sequences (in fasta/fastq format) and assembly in GFA format and outputs long read to graph alignment in various formats (such as tsv, fasta and [GPA](https://github.com/ocxtal/gpa "GPA-format spec")).

Synopsis: `spaligner src/projects/spaligner_config.yaml -d <value> -s <value> -g <value> -k <value> [-t <value>] [-o <value>]`

Parameters are:

@@ -216,8 +219,6 @@ Parameters are:
`-o, --outdir <dir> `
output directory to use (default: spaligner_result/)

For more information on parameters and options please refer to the main SPAligner manual (assembler/src/projects/spaligner/README.md).

Also, if you want to align protein sequences, please refer to our [pre-release version](https://github.com/ablab/spades/releases/tag/spaligner-paper).

Note that in order to use SPAligner, one needs either to use pre-built binaries or to compile SPAdes from sources with the additional `-DSPADES_ENABLE_PROJECTS=spaligner` option.
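
The `-DSPADES_ENABLE_PROJECTS` value is a semicolon-separated list of component names, as in the default `"spades;spades_tools"` quoted in `docs/installation.md`. A minimal sketch of composing such a flag for several components is given below; the helper name `enable_projects_flag` is hypothetical, and the sketch does not check that the names are valid SPAdes components.

```python
from typing import Iterable


def enable_projects_flag(projects: Iterable[str]) -> str:
    """Join component names with ';' to form the project selection flag."""
    names = [p.strip() for p in projects if p.strip()]
    if not names:
        raise ValueError("at least one project name is required")
    return "-DSPADES_ENABLE_PROJECTS=" + ";".join(names)


if __name__ == "__main__":
    # In a shell, the value must be quoted because ';' ends a command.
    print(enable_projects_flag(["spades", "spades_tools", "spaligner"]))
```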
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -14,6 +14,7 @@ nav:
- Transcriptome assembly: rna.md
- Binning refining: binspreader.md
- HMM mapping on assembly graph: pathracer.md
- Sequence to graph alignment: spaligner.md
- SPAdes tools: standalone.md
- Citation: citation.md
- Feedback: feedback.md