Add first two modules (diamond and kaiju), missing docs and tests #14
Conversation
Excited to have this pipeline, so happy to contribute!
Small suggestions you might have missed
The `dmnd` file can be given to one of the DIAMOND alignment commands with `diamond blast<x/p> -d <your_database>.dmnd` etc.
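For context, a minimal sketch of how such a `.dmnd` database is typically consumed downstream; the process name, file names, and output below are illustrative only and not part of this PR:

```nextflow
// Hypothetical downstream process using a .dmnd database built by this
// pipeline; all names and paths are placeholders.
process DIAMOND_BLASTX_EXAMPLE {
    input:
    path db      // e.g. your_database.dmnd
    path reads   // query sequences (FASTA)

    output:
    path "matches.tsv"

    script:
    """
    diamond blastx -d ${db} -q ${reads} -o matches.tsv
    """
}
```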
Wouldn't it be cool if we could (dynamically) extract the nf-core pipelines from nf-co.re that require or use the databases of this module?
Not sure I exactly follow, but we already see this on the modules page. For example, if you search for DIAMOND there, you see that taxprofiler is using the DIAMOND_BLASTX module.
Yes, that's exactly what I mean, but then have it in the README or description of the pipeline.
Output of Kraken can be used in: taxprofiler, MAG, viralrecon, ...
I know this is very early work. I'm fine with merging as is but I had two concerns that I propose to change in the long run:
- There are two separate input options for `nodesdmp` and `namesdmp`. I would expect a path/tar of a directory with those files (possibly containing more of the dump files); see the sketch after this list.
- I would create one subworkflow per tool that the main createtaxdb workflow calls.
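A minimal sketch of what the single-input idea could look like, assuming a hypothetical `--taxdump` parameter pointing at a directory that contains the standard dump files (parameter and channel names here are illustrative, not part of this PR):

```nextflow
// Hypothetical: one parameter for the whole taxonomy directory instead of
// separate nodesdmp/namesdmp options; all names below are placeholders.
ch_taxdump = Channel.fromPath(params.taxdump, type: 'dir', checkIfExists: true)

ch_taxdump
    .map { dir -> [ file("${dir}/nodes.dmp"), file("${dir}/names.dmp") ] }
    .set { ch_nodes_names }   // channel: [ nodes.dmp, names.dmp ]
```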
Do you mind giving me an approval? Then I can merge this in and follow up depending on feedback on my status below, and can start doing more parallel PRs to add the below.
Is this expectation based purely on NCBI taxdump files? I had currently kept it separate because if people want to make custom databases, those may include customised dump files, in which case I don't see why one would necessarily re-tar them... That said, it is reasonable to expect someone may want to do that, so I may consider adding it as an option (if one gives the taxdump as a tar, it'll auto-extract; but I vaguely remember that plucking specific files from a directory is not directly trivial with Nextflow). But I would make this a separate issue as separate functionality (including maybe auto-downloading taxdump files, though I'm not sure yet how to do this properly, e.g. with the gtdbtk taxdump files etc.).
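To make the tar option concrete, a rough sketch of accepting either a tarball (to be auto-extracted) or a plain directory, again assuming a hypothetical `params.taxdump`; the actual extraction step is not wired up here:

```nextflow
// Hypothetical: branch the input depending on whether a tar archive or a
// directory was supplied; names are placeholders only.
Channel
    .fromPath(params.taxdump, checkIfExists: true)
    .branch {
        archive: it.name.endsWith('.tar.gz') || it.name.endsWith('.tgz')
        dir:     true
    }
    .set { ch_taxdump }

// ch_taxdump.archive would be sent through an extraction step first,
// while ch_taxdump.dir could be used directly.
```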
Yes, that's my plan for multi-module database construction commands (e.g. kraken2). Or do you have a motivation to do this for single-build modules too?
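For illustration, a rough skeleton of such a per-tool subworkflow for a multi-step build; the module names, include paths, and channel shapes below are placeholders rather than existing modules:

```nextflow
// Hypothetical per-tool subworkflow that the main createtaxdb workflow would
// call; the included module names and paths are placeholders.
include { KRAKEN2_ADDTOLIBRARY } from '../modules/local/kraken2_addtolibrary'
include { KRAKEN2_BUILD        } from '../modules/local/kraken2_build'

workflow FASTA_BUILD_KRAKEN2 {
    take:
    ch_fasta    // channel: [ val(meta), path(fasta) ]
    ch_taxonomy // channel: [ path(nodes.dmp), path(names.dmp) ]

    main:
    KRAKEN2_ADDTOLIBRARY ( ch_fasta, ch_taxonomy )
    KRAKEN2_BUILD ( KRAKEN2_ADDTOLIBRARY.out.db )

    emit:
    db = KRAKEN2_BUILD.out.db
}
```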
I didn't approve due to the open comments. Do you plan to address them or see them as irrelevant?
My expectation is based on taxonkit usage, yes. With a custom taxonomy I would just pass a path to a directory and then expect
No, I don't see a reason for single modules. I didn't read the file properly and somehow thought the input channel transformation was Kaiju-specific.
Hm, I thought I had addressed them 🤔, but I see now there is no commit. Maybe I didn't push...
Hrm, ok. I'll maybe look in more detail at taxonkit and try to use that as a structure to follow. But I think I would still do that as a follow-up PR (because it should be quite straightforward to just change how those files are picked up and passed to the module).
Ah ok! Then I'll leave that bit as is for now at least.
Provided comments are addressed, this looks like a good start.
Co-authored-by: Joon Klaps <61584065+Joon-Klaps@users.noreply.github.com>
OK thank you for the reviews @maxulysse @Joon-Klaps @Midnighter! I've addressed all the suggestions now (except for the larger one from @Midnighter regarding taxdump, which I'll do as a follow-up), so I will merge so others can start to get involved, and other unaddressed changes can be dealt with in a follow-up :)
This is more of a PoC to draft a rough structure. Subject to change during development.
Note: input validation was not working correctly, as we only need to require one of either `fasta_dna` or `fasta_aa`, or both, but if neither was supplied it just ran with empty lists :( (now working thanks to @mirpedrol :D). A sketch of the intended check is included after the list below.

Adds:
- database building and nf-tests :)
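For reference, a minimal sketch of the intended check, assuming parameters named `fasta_dna` and `fasta_aa` (this is not necessarily how it was ultimately solved):

```nextflow
// Hypothetical parameter check: require at least one of the two FASTA inputs.
if ( !params.fasta_dna && !params.fasta_aa ) {
    error "Please provide at least one of --fasta_dna or --fasta_aa (or both)"
}
```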
PR checklist
- Make sure your code lints (`nf-core lint`).
- Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).