Update fasta_build_add_kraken2_bracken subworkflow #34

Merged 9 commits on May 24, 2024
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -40,6 +40,10 @@

> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

- [Bracken](https://doi.org/10.7717/peerj-cs.104)

> Lu, J., Breitwieser, F. P., Thielen, P., & Salzberg, S. L. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ. Computer Science, 3(e104), e104. https://doi.org/10.7717/peerj-cs.104

- [Centrifuge](https://doi.org/10.1101/gr.210641.116)

> Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Research, 26(12), 1721–1729. https://doi.org/10.1101/gr.210641.116
3 changes: 2 additions & 1 deletion README.md
@@ -32,6 +32,7 @@

1. Prepares input FASTA files for building
2. Builds databases for:
- [Bracken](https://doi.org/10.7717/peerj-cs.104)
- [Centrifuge](https://doi.org/10.1101/gr.210641.116)
- [DIAMOND](https://doi.org/10.1038/nmeth.3176)
- [Kaiju](https://doi.org/10.1038/ncomms11257)
@@ -84,7 +85,7 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/createtaxdb was originally written by James A. Fellows Yates and the nf-core community.
nf-core/createtaxdb was originally written by James A. Fellows Yates, Joon Klaps, Alexander Ramos Díaz and the nf-core community.

We thank the following people for their extensive assistance in the development of this pipeline:

5 changes: 4 additions & 1 deletion conf/test.config
@@ -22,13 +22,16 @@ params {
// Input data
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = params.pipelines_testdata_base_path + 'createtaxdb/samplesheets/test.csv'
input = params.pipelines_testdata_base_path + 'createtaxdb/samplesheets/test.csv'

dbname = "database"

build_diamond = true
build_kaiju = true
build_malt = true
build_centrifuge = true
build_kraken2 = true
build_bracken = true

accession2taxid = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/nucl_gb.accession2taxid'
nucl2taxid = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/nucl2tax.map'
8 changes: 8 additions & 0 deletions conf/test_full.config
@@ -18,4 +18,12 @@ params {
// TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'

build_diamond = true
build_kaiju = true
build_malt = true
build_centrifuge = true
build_kraken2 = true
build_bracken = true

}
1 change: 1 addition & 0 deletions conf/test_nothing.config
@@ -30,5 +30,6 @@ params {
build_malt = false
build_centrifuge = false
build_kraken2 = false
build_bracken = false

}
29 changes: 29 additions & 0 deletions docs/output.md
@@ -14,6 +14,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
- [Bracken](#bracken) - Database files for Bracken
- [Centrifuge](#centrifuge) - Database files for Centrifuge
- [DIAMOND](#diamond) - Database files for DIAMOND
- [Kaiju](#kaiju) - Database files for Kaiju
@@ -51,6 +52,31 @@ Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQ

[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.

### Bracken

[Bracken](https://github.com/jenniferlu717/Bracken/) (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

<details markdown="1">
<summary>Output files</summary>

- `bracken/`
- `<db_name>/`
- `database100mers.kmer_distrib`: Bracken kmer distribution file
- `database100mers.kraken`: Bracken index file
- `database.kraken`: Bracken database file
- `hash.k2d`: Kraken2 hash database file
- `opts.k2d`: Kraken2 opts database file
- `taxo.k2d`: Kraken2 taxo database file
- `library/`: Intermediate Kraken2 directory containing FASTAs and related files of added genomes
- `taxonomy/`: Intermediate Kraken2 directory containing taxonomy files of added genomes
- `seqid2taxid.map`: Intermediate Kraken2 file mapping sequence IDs to taxonomy IDs of added genomes

</details>

Note that all intermediate files are required for the Bracken database, even though Kraken2 itself only requires the `*.k2d` files.

The resulting `<db_name>/` directory can be given to Bracken itself with `bracken -d <your_database_name>` etc.
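
Before handing the directory to Bracken, it can help to confirm that the files from the output listing above are actually present. The following is a minimal shell sketch and not part of the pipeline: the function name `check_bracken_db` is hypothetical, and the file names simply mirror the listing above, assuming the 100-mer file names shown there.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/createtaxdb): check that a database
# directory contains the files named in the Bracken output listing.
# Prints one line per missing entry and returns 0 only if all are present.
check_bracken_db() {
    local db_dir="$1"
    local missing=0
    local f d

    # Regular files from the listing (the 100-mer names are an assumption).
    for f in database100mers.kmer_distrib database100mers.kraken database.kraken \
             hash.k2d opts.k2d taxo.k2d seqid2taxid.map; do
        if [ ! -f "${db_dir}/${f}" ]; then
            echo "missing file: ${f}"
            missing=1
        fi
    done

    # Intermediate Kraken2 directories from the listing.
    for d in library taxonomy; do
        if [ ! -d "${db_dir}/${d}" ]; then
            echo "missing directory: ${d}/"
            missing=1
        fi
    done

    return "${missing}"
}
```

For example, `check_bracken_db results/bracken/database && echo "database looks complete"`, where `results/bracken/database` is a placeholder path. If the database was built with a different read length, the two `100mers` file names would need adjusting.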

### Centrifuge

[Centrifuge](https://github.com/bbuchfink/diamond) is a very rapid and memory-efficient system for the classification of DNA sequences from microbial samples.
@@ -105,6 +131,9 @@ The `fmi` file can be given to Kaiju itself with `kaiju -f <your_database>.fmi`
- `hash.k2d`: Kraken2 hash database file
- `opts.k2d`: Kraken2 opts database file
- `taxo.k2d`: Kraken2 taxo database file
- `library/`: Intermediate directory containing FASTAs and related files of added genomes (only present if `--build_bracken` or `--kraken2_keepintermediate` supplied)
- `taxonomy/`: Intermediate directory containing taxonomy files of added genomes (only present if `--build_bracken` or `--kraken2_keepintermediate` supplied)
- `seqid2taxid.map`: Intermediate file mapping sequence IDs to taxonomy IDs of added genomes (only present if `--build_bracken` or `--kraken2_keepintermediate` supplied)

</details>

13 changes: 9 additions & 4 deletions modules.json
@@ -5,6 +5,11 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"bracken/build": {
"branch": "master",
"git_sha": "dcbe6e77bc6cc0843ce93e6c7bd884d65c215984",
"installed_by": ["fasta_build_add_kraken2_bracken"]
},
"cat/cat": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
@@ -38,12 +43,12 @@
"kraken2/add": {
"branch": "master",
"git_sha": "ca87ad032a62f025f0c373facacef2df0c5411b2",
"installed_by": ["fasta_build_add_kraken2"]
"installed_by": ["fasta_build_add_kraken2_bracken"]
},
"kraken2/build": {
"branch": "master",
"git_sha": "ca87ad032a62f025f0c373facacef2df0c5411b2",
"installed_by": ["fasta_build_add_kraken2"]
"installed_by": ["fasta_build_add_kraken2_bracken"]
},
"malt/build": {
"branch": "master",
@@ -69,9 +74,9 @@
},
"subworkflows": {
"nf-core": {
"fasta_build_add_kraken2": {
"fasta_build_add_kraken2_bracken": {
"branch": "master",
"git_sha": "a4d1e13a2da05307deb65a87d501aa6520162dcd",
"git_sha": "9758e4dedd5788369e61b57e7d6f4751e682b17a",
"installed_by": ["subworkflows"]
},
"utils_nextflow_pipeline": {
8 changes: 8 additions & 0 deletions modules/nf-core/bracken/build/environment.yml

Some generated files are not rendered by default.

48 changes: 48 additions & 0 deletions modules/nf-core/bracken/build/main.nf

Some generated files are not rendered by default.

47 changes: 47 additions & 0 deletions modules/nf-core/bracken/build/meta.yml

Some generated files are not rendered by default.

72 changes: 72 additions & 0 deletions modules/nf-core/bracken/build/tests/main.nf.test

Some generated files are not rendered by default.
