Add KrakenUniq Build #36

Merged
merged 9 commits into from
May 30, 2024
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -60,6 +60,10 @@

> Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. https://doi.org/10.1186/s13059-019-1891-0

- [KrakenUniq](https://doi.org/10.1186/s13059-018-1568-0)

> Breitwieser, F. P., Baker, D. N., & Salzberg, S. L. (2018). KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology, 19(1), 198. https://doi.org/10.1186/s13059-018-1568-0

- [MALT](https://doi.org/10.1038/s41559-017-0446-6)

> Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. https://doi.org/10.1038/s41559-017-0446-6
1 change: 1 addition & 0 deletions README.md
@@ -37,6 +37,7 @@
- [DIAMOND](https://doi.org/10.1038/nmeth.3176)
- [Kaiju](https://doi.org/10.1038/ncomms11257)
- [Kraken2](https://doi.org/10.1186/s13059-019-1891-0)
- [KrakenUniq](https://doi.org/10.1186/s13059-018-1568-0)
- [MALT](https://doi.org/10.1038/s41559-017-0446-6)

## Usage
12 changes: 10 additions & 2 deletions conf/test.config
@@ -16,7 +16,7 @@ params {

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_memory = '14.GB'
max_time = '6.h'

// Input data
@@ -26,12 +26,13 @@ params {

dbname = "database"

build_bracken = true
build_diamond = true
build_kaiju = true
build_malt = true
build_centrifuge = true
build_kraken2 = true
build_bracken = true
build_krakenuniq = true

accession2taxid = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/nucl_gb.accession2taxid'
nucl2taxid = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/nucl2tax.map'
@@ -40,3 +41,10 @@ params {
namesdmp = params.pipelines_testdata_base_path + 'createtaxdb/data/taxonomy/names.dmp'
malt_mapdb = 's3://ngi-igenomes/test-data/createtaxdb/taxonomy/megan-nucl-Feb2022.db.zip'
}

process {
withName:'KRAKENUNIQ_BUILD'{
memory = { check_max( 12.GB * task.attempt, 'memory' ) }
ext.args = "--work-on-disk --max-db-size 14 --kmer-len 15 --minimizer-len 13 --jellyfish-bin \$(which jellyfish)"
}
}
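
The retry-scaled memory request in the `KRAKENUNIQ_BUILD` block above can be illustrated with a small Python sketch. It assumes that `check_max` simply caps the requested value at `max_memory` (14 GB in this test profile). The function names are hypothetical and only model that arithmetic, not the actual nf-core implementation.

```python
def cap_at_max(requested_gb: float, max_gb: float) -> float:
    """Simplified stand-in for check_max: never request more than the ceiling."""
    return min(requested_gb, max_gb)


def krakenuniq_build_memory_gb(attempt: int, base_gb: float = 12.0, max_gb: float = 14.0) -> float:
    """Memory requested on a given attempt: base * attempt, capped at max_gb.

    base_gb mirrors `12.GB * task.attempt` and max_gb mirrors `max_memory = '14.GB'`.
    """
    return cap_at_max(base_gb * attempt, max_gb)


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(f"attempt {attempt}: {krakenuniq_build_memory_gb(attempt)} GB")
```

Under these assumptions the first attempt asks for 12 GB, and every retry is held at the 14 GB ceiling, which is presumably why `max_memory` was raised from 6 GB in the same change.
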
4 changes: 2 additions & 2 deletions conf/test_full.config
@@ -19,11 +19,11 @@ params {
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'

build_bracken = true
build_diamond = true
build_kaiju = true
build_malt = true
build_centrifuge = true
build_kraken2 = true
build_bracken = true

build_krakenuniq = true
}
4 changes: 2 additions & 2 deletions conf/test_nothing.config
@@ -25,11 +25,11 @@ params {

input = 'https://raw.githubusercontent.com/nf-core/test-datasets/createtaxdb/samplesheets/test.csv'

build_bracken = false
build_diamond = false
build_kaiju = false
build_malt = false
build_centrifuge = false
build_kraken2 = false
build_bracken = false

build_krakenuniq = false
}
19 changes: 19 additions & 0 deletions docs/output.md
@@ -19,6 +19,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [DIAMOND](#diamond) - Database files for DIAMOND
- [Kaiju](#kaiju) - Database files for Kaiju
- [Kraken2](#kraken2) - Database files for Kraken2
- [KrakenUniq](#krakenuniq) - Database files for KrakenUniq
- [MALT](#malt) - Database files for MALT

### MultiQC
@@ -139,6 +140,24 @@ The `fmi` file can be given to Kaiju itself with `kaiju -f <your_database>.fmi`

The resulting `<db_name>/` directory can be given to Kraken2 itself with `kraken2 --db <your_database_name>` etc.

### KrakenUniq

[KrakenUniq](https://github.com/fbreitwieser/krakenuniq) is a metagenomics classifier that uses unique k-mer counting to give more specific results.

<details markdown="1">
<summary>Output files</summary>

- `krakenuniq/`
  - `<db_name>/`
    - `database-build.log`: KrakenUniq build process log
    - `database.idx`: KrakenUniq index file
    - `database.kdb`: KrakenUniq database file
    - `taxDB`: KrakenUniq taxonomy information file

</details>

Note that there may be additional files in this directory; however, the ones listed above are reportedly the required ones.
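
As a rough sanity check on a built database, the file list above can be verified with a few lines of Python. This is only a sketch: the helper name `missing_krakenuniq_files` is hypothetical, and it assumes that the three database files (not the build log) are what matter at classification time.

```python
from pathlib import Path

# Files listed in the output documentation as part of a KrakenUniq database.
# Assumption: the build log is informational and is not checked here.
REQUIRED_FILES = ("database.idx", "database.kdb", "taxDB")


def missing_krakenuniq_files(db_dir):
    """Return the names of expected database files that are absent from db_dir."""
    db_path = Path(db_dir)
    return [name for name in REQUIRED_FILES if not (db_path / name).is_file()]
```

Calling `missing_krakenuniq_files("results/krakenuniq/database")` (the path is illustrative) returns an empty list when all three files are present.
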

### MALT

[MALT](https://software-ab.cs.uni-tuebingen.de/download/malt) is a fast replacement for BLASTX, BLASTP and BLASTN, and provides both local and semi-global alignment capabilities.
6 changes: 6 additions & 0 deletions modules.json
@@ -50,6 +50,12 @@
"git_sha": "ca87ad032a62f025f0c373facacef2df0c5411b2",
"installed_by": ["fasta_build_add_kraken2_bracken"]
},
"krakenuniq/build": {
"branch": "master",
"git_sha": "e3857325a14ef6e50e33c104c0a3be0ccaabbeb1",
"installed_by": ["modules"],
"patch": "modules/nf-core/krakenuniq/build/krakenuniq-build.diff"
},
"malt/build": {
"branch": "master",
"git_sha": "7d3bac628092d1aead36960c4b6ae41302a9f797",
7 changes: 7 additions & 0 deletions modules/nf-core/krakenuniq/build/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 14 additions & 0 deletions modules/nf-core/krakenuniq/build/krakenuniq-build.diff

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

37 changes: 37 additions & 0 deletions modules/nf-core/krakenuniq/build/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

48 changes: 48 additions & 0 deletions modules/nf-core/krakenuniq/build/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion nextflow.config
@@ -65,14 +65,15 @@ params {
malt_mapdb = null

// tool specific options
build_bracken = false
build_diamond = false
build_kaiju = false
build_malt = false
malt_sequencetype = "DNA"
build_centrifuge = false
build_kraken2 = false
kraken2_keepintermediate = false
build_bracken = false
build_krakenuniq = false
}

// Load base.config by default for all pipelines
13 changes: 9 additions & 4 deletions nextflow_schema.json
@@ -107,6 +107,12 @@
"description": "",
"default": "",
"properties": {
"build_bracken": {
"type": "boolean",
"fa_icon": "fas fa-save",
"description": "Turn on extending of Kraken2 database to include Bracken files. Requires nucleotide FASTA file input.",
"help_text": "Bracken databases are simply a Kraken2 database with two additional files.\n\nNote, however, that this requires a Kraken2 database _with_ intermediate files still in it, and thus can result in large database directories."
},
"build_centrifuge": {
"type": "boolean",
"description": "Turn on building of Centrifuge database. Requires nucleotide FASTA file input.",
@@ -145,11 +151,10 @@
"fa_icon": "fas fa-save",
"description": "Retain intermediate Kraken2 build files for inspection."
},
"build_bracken": {
"build_krakenuniq": {
"type": "boolean",
"fa_icon": "fas fa-save",
"description": "Turn on extending of Kraken2 database to include Bracken files. Requires nucleotide FASTA File input.",
"help_text": "Bracken2 databases are simply just a Kraken2 database with two additional files.\n\nNote however this requires a Kraken2 database _with_ intermediate files still in it, thus can result in large database directories."
"fa_icon": "fas fa-toggle-on",
"description": "Turn on building of KrakenUniq database. Requires nucleotide FASTA file input."
}
},
"fa_icon": "fas fa-database"
5 changes: 4 additions & 1 deletion subworkflows/local/utils_nfcore_createtaxdb_pipeline/main.nf
@@ -199,6 +199,7 @@ def toolCitationText() {
params.build_diamond ? "DIAMOND (Buchfink et al. 2015)," : "",
params.build_kaiju ? "Kaiju (Menzel et al. 2016)," : "",
params.build_kraken2 ? "Kraken2 (Wood et al. 2019)," : "",
params.build_krakenuniq ? "KrakenUniq (Breitwieser et al. 2018)," : "",
params.build_malt ? "MALT (Vågene et al. 2018)," : "",
"and MultiQC (Ewels et al. 2016)",
"."
@@ -217,7 +218,9 @@ def toolBibliographyText() {
params.build_diamond ? "<li>Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1), 59–60. <a href=\"https://doi.org/10.1038/nmeth.3176\">10.1038/nmeth.3176</a></li>" : "",
params.build_kaiju ? "<li>Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications, 7, 11257. <a href=\"https://doi.org/10.1038/ncomms11257\">10.1038/ncomms11257</a></li>" : "",
params.build_kraken2 ? "<li>Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. <a href=\"https://doi.org/10.1186/s13059-019-1891-0\">10.1186/s13059-019-1891-0</a></li>" : "",
params.build_malt ? "<li>Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. <a href=\"https://doi.org/10.1038/s41559-017-0446-6\">10.1038/s41559-017-0446-6</a></li>" : "", "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>"
params.build_krakenuniq ? "<li>Breitwieser, F. P., Baker, D. N., & Salzberg, S. L. (2018). KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biology, 19(1), 198. <a href=\"https://doi.org/10.1186/s13059-018-1568-0\">10.1186/s13059-018-1568-0</a></li>" : "",
params.build_malt ? "<li>Vågene, Å. J., Herbig, A., Campana, M. G., Robles García, N. M., Warinner, C., Sabin, S., Spyrou, M. A., Andrades Valtueña, A., Huson, D., Tuross, N., Bos, K. I., & Krause, J. (2018). Salmonella enterica genomes from victims of a major sixteenth-century epidemic in Mexico. Nature Ecology & Evolution, 2(3), 520–528. <a href=\"https://doi.org/10.1038/s41559-017-0446-6\">10.1038/s41559-017-0446-6</a></li>" : "",
"<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. doi: 10.1093/bioinformatics/btw354</li>"
].join(' ').trim()

return reference_text
10 changes: 7 additions & 3 deletions tests/test.nf.test
@@ -18,15 +18,19 @@ nextflow_pipeline {
assertAll(
{ assert workflow.success },
{ assert snapshot(
file("$outputDir/bracken/database/database100mers.kmer_distrib").name,
file("$outputDir/bracken/database/database100mers.kraken").name,
file("$outputDir/bracken/database/database.kraken").name,
path("$outputDir/centrifuge/"),
path("$outputDir/diamond/database.dmnd"),
path("$outputDir/kaiju/database.fmi"),
path("$outputDir/kraken2/database/hash.k2d"),
file("$outputDir/kraken2/database/opts.k2d").name,
path("$outputDir/kraken2/database/taxo.k2d"),
file("$outputDir/bracken/database/database100mers.kmer_distrib").name,
file("$outputDir/bracken/database/database100mers.kraken").name,
file("$outputDir/bracken/database/database.kraken").name,
path("$outputDir/krakenuniq/database/database-build.log").readLines().last().contains('database.idx'),
file("$outputDir/krakenuniq/database/database.idx").name,
file("$outputDir/krakenuniq/database/database.kdb"),
file("$outputDir/krakenuniq/database/taxDB"),
path("$outputDir/malt/malt-build.log").readLines().last().contains('Peak memory'),
path("$outputDir/malt/malt_index/index0.idx"),
path("$outputDir/malt/malt_index/ref.db"),
14 changes: 9 additions & 5 deletions tests/test.nf.test.snap
@@ -1,6 +1,9 @@
{
"test_profile": {
"content": [
"database100mers.kmer_distrib",
"database100mers.kraken",
"database.kraken",
[
"database.1.cf:md5,1481615ab90b5573f6d9e57f97890178",
"database.2.cf:md5,d50fa66e215e80284314ff6521dcd4a4",
@@ -12,9 +15,10 @@
"hash.k2d:md5,01122a04dcef29ceb3baa68a9f6e6ef5",
"opts.k2d",
"taxo.k2d:md5,cd8170a8c5a1b763a9ac1ffa2107cc88",
"database100mers.kmer_distrib",
"database100mers.kraken",
"database.kraken",
true,
"database.idx",
"database.kdb:md5,a24fce43bedbc6c420f6e36d10c112a3",
"taxDB:md5,1aed1afa948daffc236deba1c5d635db",
true,
"index0.idx:md5,876139dc930e68992cd2625e08bba48a",
"ref.db:md5,377073f58a9f9b85acca59fcf21744a9",
@@ -26,8 +30,8 @@
],
"meta": {
"nf-test": "0.8.4",
"nextflow": "24.04.1"
"nextflow": "24.04.2"
},
"timestamp": "2024-05-23T08:15:27.641419595"
"timestamp": "2024-05-30T10:54:40.551963562"
}
}