Commit

Merge pull request #36 from nf-core/add-krakenuniqbuild
Add KrakenUniq Build
jfy133 authored May 30, 2024
2 parents d9eaab3 + e940613 commit 20be423
Showing 17 changed files with 214 additions and 34 deletions.
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -60,6 +60,10 @@

> Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. https://doi.org/10.1186/s13059-019-1891-0
- [KrakenUniq](https://doi.org/10.1186/s13059-018-1568-0)

> Breitwieser, F. P., Baker, D. N., & Salzberg, S. L. (2018). KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology, 19(1), 198. https://doi.org/10.1186/s13059-018-1568-0
- [MALT](https://doi.org/10.1038/s41559-017-0446-6)

> Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. https://doi.org/10.1038/s41559-017-0446-6
1 change: 1 addition & 0 deletions README.md
@@ -37,6 +37,7 @@
- [DIAMOND](https://doi.org/10.1038/nmeth.3176)
- [Kaiju](https://doi.org/10.1038/ncomms11257)
- [Kraken2](https://doi.org/10.1186/s13059-019-1891-0)
- [KrakenUniq](https://doi.org/10.1186/s13059-018-1568-0)
- [MALT](https://doi.org/10.1038/s41559-017-0446-6)

## Usage
12 changes: 10 additions & 2 deletions conf/test.config
@@ -16,7 +16,7 @@ params {

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_memory = '14.GB'
max_time = '6.h'

// Input data
@@ -26,12 +26,13 @@ params {

dbname = "database"

build_bracken = true
build_diamond = true
build_kaiju = true
build_malt = true
build_centrifuge = true
build_kraken2 = true
build_bracken = true
build_krakenuniq = true

accession2taxid = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/nucl_gb.accession2taxid'
nucl2taxid = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/nucl2tax.map'
@@ -40,3 +41,10 @@ params {
namesdmp = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/names.dmp'
malt_mapdb = 's3://ngi-igenomes/test-data/createtaxdb/taxonomy/megan-nucl-Feb2022.db.zip'
}

process {
withName:'KRAKENUNIQ_BUILD'{
memory = { check_max( 12.GB * task.attempt, 'memory' ) }
ext.args = "--work-on-disk --max-db-size 14 --kmer-len 15 --minimizer-len 13 --jellyfish-bin \$(which jellyfish)"
}
}
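
The memory request in the `KRAKENUNIQ_BUILD` block interacts with the `max_memory = '14.GB'` cap set earlier in this file. A minimal Python sketch of that interaction follows. It assumes `check_max` behaves as in the standard nf-core template, returning the requested value unless it exceeds the configured maximum. The function name `capped_memory_gb` and the use of plain integers for gigabytes are illustrative and not part of the pipeline.

```python
def capped_memory_gb(base_gb: int, attempt: int, max_gb: int) -> int:
    """Mimic `check_max(base.GB * task.attempt, 'memory')` for whole gigabytes.

    Assumption: check_max returns the request unchanged unless it exceeds
    the configured maximum, in which case the maximum is returned.
    """
    requested = base_gb * attempt  # request scales linearly with the retry attempt
    return min(requested, max_gb)  # never exceed the configured ceiling


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        # 12 GB base and 14 GB ceiling, as in conf/test.config
        print(attempt, capped_memory_gb(12, attempt, 14))
```

Under these assumptions the first attempt requests 12 GB and every retry is capped at 14 GB. With the previous 6 GB ceiling, every attempt would have been capped at 6 GB.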
4 changes: 2 additions & 2 deletions conf/test_full.config
@@ -19,11 +19,11 @@ params {
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'

build_bracken = true
build_diamond = true
build_kaiju = true
build_malt = true
build_centrifuge = true
build_kraken2 = true
build_bracken = true

build_krakenuniq = true
}
4 changes: 2 additions & 2 deletions conf/test_nothing.config
@@ -25,11 +25,11 @@ params {

input = 'https://raw.githubusercontent.com/nf-core/test-datasets/createtaxdb/samplesheets/test.csv'

build_bracken = false
build_diamond = false
build_kaiju = false
build_malt = false
build_centrifuge = false
build_kraken2 = false
build_bracken = false

build_krakenuniq = false
}
19 changes: 19 additions & 0 deletions docs/output.md
@@ -19,6 +19,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [DIAMOND](#diamond) - Database files for DIAMOND
- [Kaiju](#kaiju) - Database files for Kaiju
- [Kraken2](#kraken2) - Database files for Kraken2
- [KrakenUniq](#krakenuniq) - Database files for KrakenUniq
- [MALT](#malt) - Database files for MALT

### MultiQC
@@ -139,6 +140,24 @@ The `fmi` file can be given to Kaiju itself with `kaiju -f <your_database>.fmi`

The resulting `<db_name>/` directory can be given to Kraken2 itself with `kraken2 --db <your_database_name>` etc.

### KrakenUniq

[KrakenUniq](https://github.com/fbreitwieser/krakenuniq) is a metagenomics classifier with unique k-mer counting for more specific results.

<details markdown="1">
<summary>Output files</summary>

- `krakenuniq/`
- `<db_name>/`
- `database-build.log`: KrakenUniq build process log
- `database.idx`: KrakenUniq index file
- `database.kdb`: KrakenUniq database file
- `taxDB`: KrakenUniq taxonomy information file

</details>

Note that there may be additional files in this directory; however, the ones listed above are reportedly the required ones.

### MALT

[MALT](https://software-ab.cs.uni-tuebingen.de/download/malt) is a fast replacement for BLASTX, BLASTP and BLASTN, and provides both local and semi-global alignment capabilities.
6 changes: 6 additions & 0 deletions modules.json
@@ -50,6 +50,12 @@
"git_sha": "ca87ad032a62f025f0c373facacef2df0c5411b2",
"installed_by": ["fasta_build_add_kraken2_bracken"]
},
"krakenuniq/build": {
"branch": "master",
"git_sha": "e3857325a14ef6e50e33c104c0a3be0ccaabbeb1",
"installed_by": ["modules"],
"patch": "modules/nf-core/krakenuniq/build/krakenuniq-build.diff"
},
"malt/build": {
"branch": "master",
"git_sha": "7d3bac628092d1aead36960c4b6ae41302a9f797",
7 changes: 7 additions & 0 deletions modules/nf-core/krakenuniq/build/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 14 additions & 0 deletions modules/nf-core/krakenuniq/build/krakenuniq-build.diff

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

37 changes: 37 additions & 0 deletions modules/nf-core/krakenuniq/build/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

48 changes: 48 additions & 0 deletions modules/nf-core/krakenuniq/build/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion nextflow.config
@@ -65,14 +65,15 @@ params {
malt_mapdb = null

// tool specific options
build_bracken = false
build_diamond = false
build_kaiju = false
build_malt = false
malt_sequencetype = "DNA"
build_centrifuge = false
build_kraken2 = false
kraken2_keepintermediate = false
build_bracken = false
build_krakenuniq = false
}

// Load base.config by default for all pipelines
13 changes: 9 additions & 4 deletions nextflow_schema.json
@@ -107,6 +107,12 @@
"description": "",
"default": "",
"properties": {
"build_bracken": {
"type": "boolean",
"fa_icon": "fas fa-save",
"description": "Turn on extending of Kraken2 database to include Bracken files. Requires nucleotide FASTA File input.",
"help_text": "Bracken2 databases are simply just a Kraken2 database with two additional files.\n\nNote however this requires a Kraken2 database _with_ intermediate files still in it, thus can result in large database directories."
},
"build_centrifuge": {
"type": "boolean",
"description": "Turn on building of Centrifuge database. Requires nucleotide FASTA file input.",
@@ -145,11 +151,10 @@
"fa_icon": "fas fa-save",
"description": "Retain intermediate Kraken2 build files for inspection."
},
"build_bracken": {
"build_krakenuniq": {
"type": "boolean",
"fa_icon": "fas fa-save",
"description": "Turn on extending of Kraken2 database to include Bracken files. Requires nucleotide FASTA File input.",
"help_text": "Bracken2 databases are simply just a Kraken2 database with two additional files.\n\nNote however this requires a Kraken2 database _with_ intermediate files still in it, thus can result in large database directories."
"fa_icon": "fas fa-toggle-on",
"description": "Turn on building of KrakenUniq database. Requires nucleotide FASTA file input."
}
},
"fa_icon": "fas fa-database"
5 changes: 4 additions & 1 deletion subworkflows/local/utils_nfcore_createtaxdb_pipeline/main.nf
@@ -199,6 +199,7 @@ def toolCitationText() {
params.build_diamond ? "DIAMOND (Buchfink et al. 2015)," : "",
params.build_kaiju ? "Kaiju (Menzel et al. 2016)," : "",
params.build_kraken2 ? "Kraken2 (Wood et al. 2019)," : "",
params.build_krakenuniq ? "KrakenUniq (Breitwieser et al. 2018)," : "",
params.build_malt ? "MALT (Vågene et al. 2018)," : "",
"and MultiQC (Ewels et al. 2016)",
"."
@@ -217,7 +218,9 @@ def toolBibliographyText() {
params.build_diamond ? "<li>Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1), 59–60. <a href=\"https://doi.org/10.1038/nmeth.3176\">10.1038/nmeth.3176</a></li>" : "",
params.build_kaiju ? "<li>Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications, 7, 11257. <a href=\"https://doi.org/10.1038/ncomms11257\">10.1038/ncomms11257</a></li>" : "",
params.build_kraken2 ? "<li>Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. <a href=\"https://doi.org/10.1186/s13059-019-1891-0\">10.1186/s13059-019-1891-0</a></li>" : "",
params.build_malt ? "<li>Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. <a href=\"https://doi.org/10.1038/s41559-017-0446-6\">10.1038/s41559-017-0446-6</a></li>" : "", "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>"
params.build_krakenuniq ? "<li>Breitwieser, F. P., Baker, D. N., & Salzberg, S. L. (2018). KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology, 19(1), 198. <a href=\"https://doi.org/10.1186/s13059-018-1568-0\">10.1186/s13059-018-1568-0</a></li>" : "",
params.build_malt ? "<li>Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. <a href=\"https://doi.org/10.1038/s41559-017-0446-6\">10.1038/s41559-017-0446-6</a></li>" : "",
"<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>"
].join(' ').trim()

return reference_text
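
The two citation helpers touched in this file build a list of conditional fragments, where disabled tools contribute an empty string, then join them with spaces and trim. A rough Python sketch of that pattern follows. It assumes the citation list is joined the same way as the bibliography list in this hunk (`.join(' ').trim()`). The function name `join_citations` and the dictionary of flags are hypothetical stand-ins for `params`.

```python
def join_citations(enabled: dict) -> str:
    """Join conditional citation fragments, mirroring `[...].join(' ').trim()`.

    `enabled` is a hypothetical stand-in for the pipeline's `params` object.
    """
    fragments = [
        "Kraken2 (Wood et al. 2019)," if enabled.get("build_kraken2") else "",
        "KrakenUniq (Breitwieser et al. 2018)," if enabled.get("build_krakenuniq") else "",
        "MALT (Vågene et al. 2018)," if enabled.get("build_malt") else "",
        "and MultiQC (Ewels et al. 2016)",
        ".",
    ]
    # Empty strings are kept in the list, so a disabled tool leaves an extra space.
    return " ".join(fragments).strip()


if __name__ == "__main__":
    print(join_citations({"build_krakenuniq": True}))
```

Because empty fragments are joined rather than filtered out, a disabled tool leaves a doubled space in the result. This is harmless when the text is rendered as HTML.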
10 changes: 7 additions & 3 deletions tests/test.nf.test
@@ -18,15 +18,19 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(
file("$outputDir/bracken/database/database100mers.kmer_distrib").name,
file("$outputDir/bracken/database/database100mers.kraken").name,
file("$outputDir/bracken/database/database.kraken").name,
path("$outputDir/centrifuge/"),
path("$outputDir/diamond/database.dmnd"),
path("$outputDir/kaiju/database.fmi"),
path("$outputDir/kraken2/database/hash.k2d"),
file("$outputDir/kraken2/database/opts.k2d").name,
path("$outputDir/kraken2/database/taxo.k2d"),
file("$outputDir/bracken/database/database100mers.kmer_distrib").name,
file("$outputDir/bracken/database/database100mers.kraken").name,
file("$outputDir/bracken/database/database.kraken").name,
path("$outputDir/krakenuniq/database/database-build.log").readLines().last().contains('database.idx'),
file("$outputDir/krakenuniq/database/database.idx").name,
file("$outputDir/krakenuniq/database/database.kdb"),
file("$outputDir/krakenuniq/database/taxDB"),
path("$outputDir/malt/malt-build.log").readLines().last().contains('Peak memory'),
path("$outputDir/malt/malt_index/index0.idx"),
path("$outputDir/malt/malt_index/ref.db"),
14 changes: 9 additions & 5 deletions tests/test.nf.test.snap
@@ -1,6 +1,9 @@
{
"test_profile": {
"content": [
"database100mers.kmer_distrib",
"database100mers.kraken",
"database.kraken",
[
"database.1.cf:md5,1481615ab90b5573f6d9e57f97890178",
"database.2.cf:md5,d50fa66e215e80284314ff6521dcd4a4",
@@ -12,9 +15,10 @@
"hash.k2d:md5,01122a04dcef29ceb3baa68a9f6e6ef5",
"opts.k2d",
"taxo.k2d:md5,cd8170a8c5a1b763a9ac1ffa2107cc88",
"database100mers.kmer_distrib",
"database100mers.kraken",
"database.kraken",
true,
"database.idx",
"database.kdb:md5,a24fce43bedbc6c420f6e36d10c112a3",
"taxDB:md5,1aed1afa948daffc236deba1c5d635db",
true,
"index0.idx:md5,876139dc930e68992cd2625e08bba48a",
"ref.db:md5,377073f58a9f9b85acca59fcf21744a9",
@@ -26,8 +30,8 @@
],
"meta": {
"nf-test": "0.8.4",
"nextflow": "24.04.1"
"nextflow": "24.04.2"
},
"timestamp": "2024-05-23T08:15:27.641419595"
"timestamp": "2024-05-30T10:54:40.551963562"
}
}
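
The snapshot above records three kinds of entry for the KrakenUniq outputs: a bare file name (`database.idx`), a name with an MD5 checksum (`database.kdb`, `taxDB`), and a boolean from checking the last line of the build log. A minimal Python sketch of those three checks follows. The helper names are illustrative and this is not how nf-test is implemented. The name-only form is presumably used for files whose bytes are not stable between runs, which is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def name_only(path: Path) -> str:
    """Record just the file name, for outputs whose content may vary between runs."""
    return path.name


def name_with_md5(path: Path) -> str:
    """Record 'name:md5,<hex>' in the same shape as the snapshot entries."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    return f"{path.name}:md5,{digest}"


def last_line_contains(path: Path, needle: str) -> bool:
    """True if the final line of a text log contains `needle`."""
    lines = path.read_text().splitlines()
    return bool(lines) and needle in lines[-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp)
        # Placeholder contents, not real KrakenUniq output.
        (db / "database.kdb").write_bytes(b"placeholder")
        (db / "database-build.log").write_text("step 1\nwrote database.idx\n")
        print(name_only(db / "database.idx"))
        print(name_with_md5(db / "database.kdb"))
        print(last_line_contains(db / "database-build.log", "database.idx"))
```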