Skip to content

Commit

Permalink
Fix some documentation regarding corona, bio, wastewater
Browse files Browse the repository at this point in the history
  • Loading branch information
1dayac authored and asl committed May 1, 2024
1 parent 2eb0cb9 commit 950df06
Show file tree
Hide file tree
Showing 5 changed files with 18 additions and 15 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ If you use other pipelines, please cite the following papers:
- metaviralSPAdes / viralVerify: [Antipov et al., 2020](https://academic.oup.com/bioinformatics/article-abstract/36/14/4126/5837667)
- rnaSPAdes: [Bushmanova et al., 2019](https://academic.oup.com/gigascience/article/8/9/giz100/5559527).
- biosyntheticSPAdes: [Meleshko et al., 2019](https://genome.cshlp.org/content/early/2019/06/03/gr.243477.118?top=1).
- coronaSPAdes paper is currently available at [bioRxiv](https://www.biorxiv.org/content/10.1101/2020.07.28.224584v1.abstract).
- coronaSPAdes: [Meleshko et al., 2022](https://academic.oup.com/bioinformatics/article/38/1/1/6354349).

You may also include older papers [Nurk, Bankevich et al., 2013](http://link.springer.com/chapter/10.1007%2F978-3-642-37195-0_13) or [Bankevich, Nurk et al., 2012](http://online.liebertpub.com/doi/abs/10.1089/cmb.2012.0021), especially if you assemble single-cell data.

Expand Down
2 changes: 1 addition & 1 deletion docs/citation.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ If you use other pipelines, please cite the following papers:
- metaviralSPAdes / viralVerify: [Antipov et al., 2020](https://academic.oup.com/bioinformatics/article-abstract/36/14/4126/5837667)
- rnaSPAdes: [Bushmanova et al., 2019](https://academic.oup.com/gigascience/article/8/9/giz100/5559527).
- biosyntheticSPAdes: [Meleshko et al., 2019](https://genome.cshlp.org/content/early/2019/06/03/gr.243477.118?top=1).
- coronaSPAdes paper is currently available at [bioRxiv](https://www.biorxiv.org/content/10.1101/2020.07.28.224584v1.abstract).
- coronaSPAdes: [Meleshko et al., 2022](https://academic.oup.com/bioinformatics/article/38/1/1/6354349).

You may also include older papers [Nurk, Bankevich et al., 2013](http://link.springer.com/chapter/10.1007%2F978-3-642-37195-0_13) or [Bankevich, Nurk et al., 2012](http://online.liebertpub.com/doi/abs/10.1089/cmb.2012.0021), especially if you assemble single-cell data.

18 changes: 6 additions & 12 deletions docs/hmm.md
Original file line number Diff line number Diff line change
@@ -1,22 +1,16 @@
# HMM-guided mode
The majority of SPAdes assembly modes (normal multicell, single-cell, rnaviral, meta and of course biosynthetic) also supports HMM-guided mode as implemented in biosyntheticSPAdes. The detailed description could be found in [biosyntheticSPAdes paper](https://genome.cshlp.org/content/early/2019/06/03/gr.243477.118), but in short: amino acid profile HMMs are aligned to the edges of assembly graph. After this the subgraphs containing the set of matches ("domains") are extracted and all possible paths through the domains that are supported both by paired-end data (via scaffolds) and graph topology are obtained (putative biosynthetic gene clusters).
The majority of SPAdes assembly modes (multicell, single-cell, rnaviral, meta and biosynthetic) also supports HMM-guided mode as implemented in biosyntheticSPAdes.
The detailed description could be found in [biosyntheticSPAdes paper](https://genome.cshlp.org/content/early/2019/06/03/gr.243477.118), but in short: amino acid profile HMMs are aligned to the edges of assembly graph.
After this the subgraphs containing the set of matches ("domains") are extracted and all possible paths through the domains that are supported both by paired-end data (via scaffolds) and graph topology are obtained (putative biosynthetic gene clusters).

HMM-guided mode could be enabled via providing a set of HMMs via `--custom-hmms` option. In HMM guided mode the set of contigs and scaffolds (see [SPAdes output](output.md#spades-output) section for more information ) is kept intact, however additional [biosyntheticSPAdes output](output.md#biosyntheticspades-output) represents the output of HMM-guided assembly.
HMM-guided mode is enabled via providing a set of HMMs (`*.hmm.gz` file) via `--custom-hmms` option. In HMM-guided mode the set of contigs and scaffolds (see [SPAdes output](output.md#spades-output) section for more information ) is kept intact, however additional [biosyntheticSPAdes output](output.md#biosyntheticspades-output) represents the output of HMM-guided assembly.

Note that normal biosyntheticSPAdes mode (via `--bio` option) is a bit different from HMM-guided mode: besides using the special set of profile HMMS representing a family of NRSP/PKS domains also includes a set of assembly graph simplification and processing settings aimed for fuller recovery of biosynthetic gene clusters.

## coronaSPAdes mode

Given an increased interest in coronavirus research we developed a coronavirus assembly mode for SPAdes assembler (also known as coronaSPAdes). It allows to assemble full-length coronaviridae genomes from the transcriptomic and metatranscriptomic data. Algorithmically, coronaSPAdes is an rnaviralSPAdes that uses the set of HMMs from [Pfam SARS-CoV-2 2.0](ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam_SARS-CoV-2_2.0/) set as well as additional HMMs as outlined by [(Phan et al, 2019)](https://doi.org/10.1093/ve/vey035). coronaSPAdes could be run via a dedicated `coronaspades.py` script. See [coronaSPAdes preprint](https://www.biorxiv.org/content/10.1101/2020.07.28.224584v1) for more information about rnaviralSPAdes, coronaSPAdes and HMM-guided mode. Output for any HMM-related mode (`--bio`, `--corona`, or `--custom-hmms` flags) is the same with biosyntheticSPAdes' output.
Given an increased interest in coronavirus research we developed a coronavirus assembly mode for SPAdes assembler (also known as coronaSPAdes).
It allows to assemble full-length coronaviridae genomes from the transcriptomic and metatranscriptomic data. Algorithmically, coronaSPAdes is an rnaviralSPAdes that uses the set of HMMs from [Pfam SARS-CoV-2 2.0](ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam_SARS-CoV-2_2.0/) set as well as additional HMMs as outlined by [(Phan et al, 2019)](https://doi.org/10.1093/ve/vey035). coronaSPAdes could be run via a dedicated `coronaspades.py` script. See [coronaSPAdes paper](https://academic.oup.com/bioinformatics/article/38/1/1/6354349) for more information about rnaviralSPAdes, coronaSPAdes and HMM-guided mode. Output for any HMM-related mode (`--bio`, `--corona`, or `--custom-hmms` flags) is the same with biosyntheticSPAdes' output.


## wastewaterSPAdes mode

SARS-CoV-2 wastewater samples are extensively collected and studied because it allows quantitative assessment of viral load in surrounding populations. We developed wastewaterSPAdes that solves the SARS-CoV-2 deconvolution problem using assembly graph structure.
To use wastewaterSPAdes, you'll need to:

- Set `--sewage` flag to the `coronaspades.py`.
- Provide the SARS-CoV-2 reference genome as trusted contigs.

Results of wastewaterSPAdes are stored in `lineages.csv` file. First column contains the strain name, and second column contains estimated abundance of this strain in the sample.

2 changes: 1 addition & 1 deletion docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,7 +101,7 @@ biosyntheticSPAdes outputs four files of interest:
- scaffolds.fasta – contains DNA sequences from putative biosynthetic gene clusters (BGC). Since each sample may contain multiple BGCs and biosyntheticSPAdes can output several putative DNA sequences for each cluster, for each contig name we append suffix `_cluster_X_candidate_Y`, where X is the id of the BGC and Y is the id of the candidate from the BGC.
- raw_scaffolds.fasta - SPAdes scaffolds generated without domain-graph related algorithms. Very close to the regular scaffolds.fasta file.
- hmm_statistics.txt - contains statistics about BGC composition in the sample. First, it outputs the number of domain hits in the sample. Then, for each BGC candidate we output domain order with positions on the corresponding DNA sequence from scaffolds.fasta.
- domain_graph.dot - contains domain graph structure that can be used to assess complexity of the sample and structure of BGCs. For more information about domain graph construction, please refer to the paper.
- domain_graph.dot - contains domain graph structure that can be used to assess complexity of the sample and structure of BGCs. For more information about domain graph construction, please refer to the [biosyntheticSPAdes](https://genome.cshlp.org/content/early/2019/06/03/gr.243477.118?top=1) paper.


## rnaSPades output
Expand Down
9 changes: 9 additions & 0 deletions docs/running.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,15 @@ This flag is required when assembling IonTorrent data. Allows BAM files as
input. Carefully read [IonTorrent section](datatypes.md#assembling-iontorrent-reads)
before using this option.

#### `--sewage`
This flag is used to run wastewaterSPAdes.
SARS-CoV-2 wastewater samples are extensively collected and studied because it allows quantitative assessment of viral load in surrounding populations.
wastewaterSPAdes solves the SARS-CoV-2 deconvolution problem using assembly graph structure. To run wastewaterSPAdes you need to provide SARS-CoV-2 reference genome as trusted contigs.

Results of wastewaterSPAdes are stored in `lineages.csv` file. First column contains the strain name, and second column contains estimated abundance of this strain in the sample.



## Basic options

`-o <output_dir> `
Expand Down

0 comments on commit 950df06

Please sign in to comment.