
Add first two modules (diamond and kaiju), missing docs and tests #14

Merged
merged 22 commits from input-validation into dev on Jan 5, 2024

Conversation

@jfy133 jfy133 (Member) commented Dec 5, 2023

This is more of a PoC to draft rough structure. Subject to change during development.

Note: input validation was initially not working correctly, as we only need to require one of either fasta_dna or fasta_aa (or both), but if neither was supplied the pipeline just ran with empty lists :( Now working thanks to @mirpedrol :D
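For reference, "require at least one of two parameters" can be expressed in `nextflow_schema.json` with JSON Schema's `anyOf` keyword; a minimal sketch (illustrative only, not necessarily the exact fix applied in this PR):

```json
{
  "anyOf": [
    { "required": ["fasta_dna"] },
    { "required": ["fasta_aa"] }
  ]
}
```

With this constraint, validation fails when neither parameter is supplied, and passes when either or both are given.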

Adds:

  • Input schema sheet draft
  • DIAMOND database building and nf-tests
  • Kaiju database building and nf-tests :)

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/createtaxdb branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

This was linked to issues Dec 9, 2023
github-actions bot commented Dec 14, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit a317cbe

  • ✅ 160 tests passed
  • ❔ 1 test was ignored
  • ❗ 21 tests had warnings

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Specify your pipeline's command line flags
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file
  • pipeline_todos - TODO string in README.md: TODO nf-core:
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Fill in short bullet-pointed list of the default steps in the pipeline
  • pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
  • pipeline_todos - TODO string in README.md: update the following command to include all required parameters for a minimal example
  • pipeline_todos - TODO string in README.md: If applicable, make list of people who have also contributed
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in README.md: Add bibliography of tools and data used in your pipeline
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in usage.md: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website.
  • pipeline_todos - TODO string in output.md: Write this documentation describing your workflow's output
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in WorkflowCreatetaxdb.groovy: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in WorkflowMain.groovy: Add Zenodo DOI for pipeline after first release
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed


Run details

  • nf-core/tools version 2.11.1
  • Run at 2024-01-05 16:36:02

@jfy133 jfy133 requested a review from mashehu December 14, 2023 12:14
@LilyAnderssonLee LilyAnderssonLee self-requested a review December 18, 2023 09:57
@Joon-Klaps Joon-Klaps (Contributor) left a comment

Excited to have this pipeline, so happy to contribute!
Small suggestions you might have missed


![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
The `dmnd` file can be given to one of the DIAMOND alignment commands with `diamond blast<x/p> -d <your_database>.dmnd` etc.
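For context, a typical round trip with DIAMOND looks like the following (a sketch with placeholder file names; `makedb`, `blastx`, and `blastp` are the documented DIAMOND subcommands):

```shell
# Build a DIAMOND protein database (writes my_database.dmnd)
diamond makedb --in proteins.fasta -d my_database

# Align DNA reads against it with blastx (use blastp for protein queries)
diamond blastx -d my_database.dmnd -q reads.fasta -o matches.tsv
```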
Contributor:

Wouldn't it be cool if we could dynamically extract from nf-co.re the nf-core pipelines that require or use the databases built by this module?

Member Author:

Not sure I exactly follow, but we can already see this on the modules page. For example, if you search for DIAMOND here:

https://nf-co.re/modules

you can see (screenshot of the modules search results omitted) that taxprofiler is using the DIAMOND_BLASTX module.

Contributor:

Yes, that's exactly what I mean, but then having it in the README or description of the pipeline, e.g.:
Output of Kraken can be used in: taxprofiler, MAG, viralrecon, ...

@Midnighter Midnighter left a comment

I know this is very early work. I'm fine with merging as is but I had two concerns that I propose to change in the long run:

  1. There are two separate input options for nodesdmp and namesdmp. I would expect a path/tar of a directory with those files (possibly containing more of the dump files).
  2. I would create one subworkflow per tool that the main createtaxdb workflow calls to.

@jfy133 jfy133 (Member Author) commented Jan 2, 2024

I know this is very early work. I'm fine with merging as is but I had two concerns that I propose to change in the long run:

Do you mind giving me an approval so I can merge this in and follow up depending on your feedback on my replies below? Then I can start doing more parallel PRs to add the rest.

  1. There are two separate input options for nodesdmp and namesdmp. I would expect a path/tar of a directory with those files (possibly containing more of the dump files).

Is this expectation based purely on NCBI taxdump files? I had kept them separate because people who want to make custom databases may include customised dump files, in which case I don't see why one would necessarily re-tar them...

That said, it is reasonable to expect someone may want to do that... I may consider adding it as an option (if one gives the taxdump as a tar it'll auto-extract), but I vaguely remember that plucking specific files from a directory is not directly trivial with Nextflow.

But I would make this a separate issue, as it is separate functionality (including maybe auto-downloading taxdump files, but I'm not sure yet how to do this properly, e.g. with the gtdbtk taxdump stuff etc.)

  2. I would create one subworkflow per tool that the main createtaxdb workflow calls to.

Yes, that's my plan for multi-module database construction commands (e.g. kraken2). Or do you have a motivation for doing this for single-build modules too?

@Midnighter

Do you mind if you give me an approval then I can merge this in and follow up depending on feedback of my status below?

I didn't approve due to the open comments. Do you plan to address them or see them as irrelevant?

Is expectation this based purely on ncbi taxdump files? I currently had kept it separate because if people want to make custom databases that may include customised dump files whereby I don't see why one would necessarily re-tar...

That said it is reasonable to expect someone may want to do that... I may consider adding it as an option (if one gives the taxdump as tar it'll auto extract - but I vaguely remember plucking specific files from a directory is not directly trivial with Nxf).

My expectation is based on taxonkit usage, yes. With a custom taxonomy I would just pass a path to a directory and then expect {dir}/nodes.dmp and {dir}/names.dmp to exist.
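The expected layout described above can be sketched like this (hypothetical; the file names follow the standard NCBI taxdump convention, and the files here are empty stand-ins):

```shell
# Simulate the expected single-directory layout for a custom taxonomy
mkdir -p taxdump
touch taxdump/nodes.dmp taxdump/names.dmp

# A pipeline could then resolve both dump files from one input path
test -f taxdump/nodes.dmp && test -f taxdump/names.dmp && echo "taxonomy layout ok"
```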

Yes that's my plan for multi-module database construction commands (e.g. kraken2), or do you have a motivation to do this for single build modules too?

No, I don't see a reason for single modules. I didn't read the file properly and somehow thought the input channel transformation was Kaiju-specific.

@jfy133 jfy133 (Member Author) commented Jan 2, 2024

Do you mind if you give me an approval then I can merge this in and follow up depending on feedback of my status below?

I didn't approve due to the open comments. Do you plan to address them or see them as irrelevant?

Hm, I thought I had addressed them 🤔, but I see now there is no commit. Maybe I didn't push...

Is expectation this based purely on ncbi taxdump files? I currently had kept it separate because if people want to make custom databases that may include customised dump files whereby I don't see why one would necessarily re-tar...

That said it is reasonable to expect someone may want to do that... I may consider adding it as an option (if one gives the taxdump as tar it'll auto extract - but I vaguely remember plucking specific files from a directory is not directly trivial with Nxf).

My expectation is based on taxonkit usage, yes. With a custom taxonomy I would just pass a path to a directory and then expect {dir}/nodes.dmp and {dir}/names.dmp to exist.

Hrm, ok. I'll maybe look in more detail at taxonkit and try to use that as a structure to follow. But I think I would still do that as a follow-up PR (because it should be quite straightforward to just change how those files are picked up and passed to the module).

Yes that's my plan for multi-module database construction commands (e.g. kraken2), or do you have a motivation to do this for single build modules too?

No, I don't see a reason for single modules. I didn't read the file properly and somehow thought the input channel transformation was Kaiju-specific.

Ah ok! Then I leave that bit as is for now at least.

@Midnighter Midnighter left a comment
Provided comments are addressed, this looks like a good start.

@jfy133 jfy133 (Member Author) commented Jan 5, 2024

OK thank you for the reviews @maxulysse @Joon-Klaps @Midnighter !

I've addressed all the suggestions now (except for the larger one from @Midnighter regarding the taxdump, but I'll do that as a follow-up), so I will merge so that others can start to get involved; any other unaddressed changes can be dealt with in a follow-up :)

@jfy133 jfy133 merged commit 467459b into dev Jan 5, 2024
7 checks passed
@jfy133 jfy133 deleted the input-validation branch January 5, 2024 16:38
Development

Successfully merging this pull request may close these issues.

  • Add Kaiju database build
  • Add DIAMOND database build
5 participants