Add first two modules (diamond and kaiju), missing docs and tests #14

Merged Jan 5, 2024 (22 commits)
8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -39,3 +39,11 @@
- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

- [DIAMOND](https://doi.org/10.1038/nmeth.3176)

> Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1), 59–60. https://doi.org/10.1038/nmeth.3176

- [Kaiju](https://doi.org/10.1038/ncomms11257)

> Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications, 7, 11257. https://doi.org/10.1038/ncomms11257
52 changes: 41 additions & 11 deletions assets/schema_input.json
@@ -7,30 +7,60 @@
"items": {
"type": "object",
"properties": {
"sample": {
"id": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Sample name must be provided and cannot contain spaces"
"unique": true,
"errorMessage": "Sequence reference name must be provided and cannot contain spaces",
"meta": ["id"],
"anyOf": [
{
"dependentRequired": ["fasta_dna"]
},
{
"dependentRequired": ["fasta_aa"]
}
]
},
"fastq_1": {
"type": "string",
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
"taxid": {
"type": "integer",
"unique": true,
"errorMessage": "Please provide a valid taxonomic ID in integer format",
"meta": ["taxid"]
},
"fastq_2": {
"errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
"fasta_dna": {
"anyOf": [
{
"type": "string",
"pattern": "^\\S+\\.f(ast)?q\\.gz$"
"pattern": "^\\S+\\.(fasta|fas|fa|fna)(\\.gz)?$"
},
{
"type": "string",
"maxLength": 0
}
]
],
"unique": true,
"errorMessage": "FASTA file for nucleotide sequence cannot contain spaces and must have a valid FASTA extension (fasta, fas, fa, fna), optionally gzipped",
"exists": true,
"format": "file-path"
},
"fasta_aa": {
"anyOf": [
{
"type": "string",
"pattern": "^\\S+\\.(fasta|fas|fa|faa)(\\.gz)?$"
},
{
"type": "string",
"maxLength": 0
}
],
"unique": true,
"errorMessage": "FASTA file for amino acid reference sequence cannot contain spaces and must have a valid FASTA extension (fasta, fas, fa, faa), optionally gzipped",
"exists": true,
"format": "file-path"
}
},
"required": ["sample", "fastq_1"]
"required": ["id", "taxid"]
}
}
3 changes: 3 additions & 0 deletions assets/test.csv
@@ -0,0 +1,3 @@
id,taxid,fasta_dna,fasta_aa
Severe_acute_respiratory_syndrome_coronavirus_2,2697049,/home/james/Downloads/createtaxdb/sarscov2.fasta,/home/james/Downloads/createtaxdb/sarscov2.faa
Haemophilus_influenzae,727,/home/james/Downloads/createtaxdb/haemophilus_infuenzae.fna.gz,
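As a rough illustration of what the updated `schema_input.json` asks of each samplesheet row, the sketch below re-implements the pattern checks in plain Python. It is not how the pipeline validates its input (that is driven by the JSON schema itself). The function names `validate_row` and `validate_samplesheet` are hypothetical, the rule that at least one of `fasta_dna` or `fasta_aa` must be present is an assumed reading of the `anyOf` block under `id`, and the `unique` and `exists` checks are left out.

```python
import csv
import io
import re

# Regular expressions copied from the schema in this diff.
ID_RE = re.compile(r"^\S+$")
DNA_RE = re.compile(r"^\S+\.(fasta|fas|fa|fna)(\.gz)?$")
AA_RE = re.compile(r"^\S+\.(fasta|fas|fa|faa)(\.gz)?$")
TAXID_RE = re.compile(r"^[0-9]+$")


def validate_row(row):
    """Return a list of problems found in one samplesheet row (a dict of strings)."""
    problems = []
    # Missing columns come back as None from csv.DictReader, so normalise to "".
    sample_id = row.get("id") or ""
    taxid = row.get("taxid") or ""
    dna = row.get("fasta_dna") or ""
    aa = row.get("fasta_aa") or ""

    if not ID_RE.match(sample_id):
        problems.append("id must be provided and cannot contain spaces")
    if not TAXID_RE.match(taxid):
        problems.append("taxid must be an integer")
    # Empty strings are allowed by the schema (maxLength: 0 branch).
    if dna and not DNA_RE.match(dna):
        problems.append("fasta_dna has an invalid path or extension")
    if aa and not AA_RE.match(aa):
        problems.append("fasta_aa has an invalid path or extension")
    # Assumption: the schema intends at least one FASTA per row.
    if not dna and not aa:
        problems.append("one of fasta_dna or fasta_aa is required")
    return problems


def validate_samplesheet(text):
    """Map CSV line numbers (header is line 1) to their list of problems."""
    errors = {}
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        problems = validate_row(row)
        if problems:
            errors[line_no] = problems
    return errors


if __name__ == "__main__":
    sheet = (
        "id,taxid,fasta_dna,fasta_aa\n"
        "Severe_acute_respiratory_syndrome_coronavirus_2,2697049,"
        "/home/james/Downloads/createtaxdb/sarscov2.fasta,"
        "/home/james/Downloads/createtaxdb/sarscov2.faa\n"
        "Haemophilus_influenzae,727,"
        "/home/james/Downloads/createtaxdb/haemophilus_infuenzae.fna.gz,\n"
    )
    print(validate_samplesheet(sheet))  # prints {}
```

The second row leaves `fasta_aa` empty, which passes because the schema accepts a zero-length string for either FASTA column.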
12 changes: 0 additions & 12 deletions conf/modules.config
@@ -18,18 +18,6 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: SAMPLESHEET_CHECK {
publishDir = [
path: { "${params.outdir}/pipeline_info" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: FASTQC {
ext.args = '--quiet'
}

withName: CUSTOM_DUMPSOFTWAREVERSIONS {
publishDir = [
path: { "${params.outdir}/pipeline_info" },
28 changes: 16 additions & 12 deletions docs/output.md
@@ -12,32 +12,36 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [FastQC](#fastqc) - Raw read QC
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### FastQC
### DIAMOND

<details markdown="1">
<summary>Output files</summary>

- `fastqc/`
- `*_fastqc.html`: FastQC report containing quality metrics.
- `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
- `diamond/`
- `<database>.dmnd`: DIAMOND dmnd database file

</details>

[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
[DIAMOND](https://github.com/bbuchfink/diamond) is an accelerated, BLAST-compatible local sequence aligner used particularly for protein alignment.

![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
The `dmnd` file can be given to one of the DIAMOND alignment commands with `diamond blast<x/p> -d <your_database>.dmnd` etc.
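To make the `diamond blast<x/p> -d <your_database>.dmnd` hint concrete, here is a minimal sketch that assembles such a command as an argument list without running it. The helper name `diamond_command` and all file names are hypothetical; `-q` (query) and `-o` (output) are standard DIAMOND options that the text above does not mention.

```python
import shlex


def diamond_command(database, query, out, mode="blastx"):
    """Build (but do not run) a DIAMOND alignment command as an argument list."""
    if mode not in ("blastx", "blastp"):
        raise ValueError("mode must be 'blastx' or 'blastp'")
    # -d: the .dmnd database produced by the pipeline
    # -q: query sequences, -o: output file for the alignments
    return ["diamond", mode, "-d", database, "-q", query, "-o", out]


if __name__ == "__main__":
    cmd = diamond_command("my_database.dmnd", "reads.fastq.gz", "matches.tsv")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` on a machine where DIAMOND is installed.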
Contributor:

Wouldn't it be cool if we could (dynamically) extract the nf-core pipelines from nf-co.re that require or use the databases of this module?

Member Author:

Not sure I exactly follow, but we already see this on the modules page. For example, if you search for DIAMOND here:

https://nf-co.re/modules

you see (image) that taxprofiler is using the DIAMOND_BLASTX module.

Contributor:

Yes, that's exactly what I mean, but then have it in the README or description of the pipeline.
Output of Kraken can be used in: taxprofiler, MAG, viralrecon, ...


![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
### Kaiju

![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
<details markdown="1">
<summary>Output files</summary>

- `kaiju/`
- `<database_name>.fmi`: Kaiju FMI index file

</details>

[Kaiju](https://bioinformatics-centre.github.io/kaiju/) is a fast and sensitive taxonomic classifier for metagenomics that utilises nucleotide-to-protein translations.

:::note
The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
:::
The `fmi` file can be given to Kaiju itself with `kaiju -f <your_database>.fmi` etc.
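As a companion to the `kaiju -f <your_database>.fmi` hint, the sketch below assembles a single-end Kaiju call as an argument list without running it. The helper name `kaiju_command` and all file names are hypothetical. Note that Kaiju also expects the NCBI taxonomy `nodes.dmp` file via `-t`, plus `-i` for the input reads and `-o` for the output; none of these options appear in the text above.

```python
import shlex


def kaiju_command(fmi, nodes_dmp, reads, out):
    """Build (but do not run) a single-end Kaiju classification command."""
    # -t: NCBI taxonomy nodes.dmp, -f: the .fmi index produced by the pipeline
    # -i: input reads, -o: output file with per-read classifications
    return ["kaiju", "-t", nodes_dmp, "-f", fmi, "-i", reads, "-o", out]


if __name__ == "__main__":
    cmd = kaiju_command("my_database.fmi", "nodes.dmp", "reads.fastq", "kaiju.out")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The `nodes.dmp` file is not listed among the pipeline outputs described here, so it would have to be supplied separately.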

### MultiQC

20 changes: 20 additions & 0 deletions modules.json
@@ -5,16 +5,36 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"cat/cat": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"custom/dumpsoftwareversions": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"diamond/makedb": {
"branch": "master",
"git_sha": "b29f6beb86d1d24d680277fb1a3f4de7b8b8a92c",
"installed_by": ["modules"]
},
"fastqc": {
"branch": "master",
"git_sha": "bd8092b67b5103bdd52e300f75889442275c3117",
"installed_by": ["modules"]
},
"kaiju/mkfmi": {
"branch": "master",
"git_sha": "7365564c402cbd01e9407810730efd10039997a3",
"installed_by": ["modules"]
},
"malt/build": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"multiqc": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
7 changes: 7 additions & 0 deletions modules/nf-core/cat/cat/environment.yml
62 changes: 62 additions & 0 deletions modules/nf-core/cat/cat/main.nf
36 changes: 36 additions & 0 deletions modules/nf-core/cat/cat/meta.yml

Some generated files are not rendered by default.