
Commit

Merge pull request #23 from CFIA-NCFAD/dev
3.2.0
peterk87 committed Jun 22, 2023
2 parents 8fdf40e + 5bc18fa commit 415e7ab
Showing 24 changed files with 555 additions and 562 deletions.
5 changes: 2 additions & 3 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -15,11 +15,10 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/CFIA-NCFAD/n

- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
- [ ] If you've added a new tool - ensure that you've added the version info to a `versions.yml` and added it to the `ch_versions` channel in the workflow you added the tool to.
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/CFIA-NCFAD/nf-flu/tree/master/.github/CONTRIBUTING.md)
- [ ] If necessary, also make a PR on [the CFIA-NCFAD/nf-test-datasets repo](https://github.com/CFIA-NCFAD/nf-test-datasets/pull/new)
- [ ] Make sure your code lints (`nf-core lint .`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test_{illumina,nanopore},docker`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
44 changes: 44 additions & 0 deletions .github/workflows/branch.yml
@@ -0,0 +1,44 @@
name: nf-core branch protection
# This workflow is triggered on PRs to master branch on the repository
# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
on:
  pull_request_target:
    branches: [master]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
      - name: Check PRs
        if: github.repository == 'CFIA-NCFAD'
        run: |
          { [[ ${{github.event.pull_request.head.repo.full_name }} == CFIA-NCFAD ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
      # If the above check failed, post a comment on the PR explaining the failure
      # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets
      - name: Post PR comment
        if: failure()
        uses: mshick/add-pr-comment@v1
        with:
          message: |
            ## This PR is against the `master` branch :x:
            * Do not close this PR
            * Click _Edit_ and change the `base` to `dev`
            * This CI test will remain failed until you push a new commit
            ---
            Hi @${{ github.event.pull_request.user.login }},
            It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
            The `master` branch on nf-core repositories should always contain code from the latest release.
            Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
            You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
            Note that even after this, the test will continue to show as failing until you push a new commit.
            Thanks again for your contribution!
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          allow-repeats: false
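
The condition in the `Check PRs` step reduces to a small boolean predicate: a PR to `master` passes if it comes from the upstream repository's `dev` branch, or if its head branch is named exactly `patch`. The following is a minimal Python sketch of that logic, not part of the commit. The function name `pr_to_master_allowed` and the `upstream` parameter are hypothetical. GitHub reports `github.repository` and `head.repo.full_name` in `owner/name` form, so the sketch assumes the intended value is `CFIA-NCFAD/nf-flu`, whereas the workflow compares against the bare owner name `CFIA-NCFAD`.

```python
def pr_to_master_allowed(
    head_repo: str,
    head_ref: str,
    upstream: str = "CFIA-NCFAD/nf-flu",  # assumed full "owner/name" of the upstream repo
) -> bool:
    """Mirror the shell test `{ A && B; } || C` from the workflow step."""
    # A && B: the PR head is the upstream repository's `dev` branch
    from_upstream_dev = head_repo == upstream and head_ref == "dev"
    # C: the head branch is named exactly `patch` (from any repository)
    is_patch_branch = head_ref == "patch"
    return from_upstream_dev or is_patch_branch


if __name__ == "__main__":
    print(pr_to_master_allowed("CFIA-NCFAD/nf-flu", "dev"))
    print(pr_to_master_allowed("someone/nf-flu", "dev"))
```

Because the `patch` test is not tied to the repository, a fork's `patch` branch also passes under this reading of the shell expression.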
10 changes: 7 additions & 3 deletions .github/workflows/ci.yml
@@ -3,14 +3,18 @@ name: CI
on:
  push:
    branches:
      - master
      - dev
  pull_request:
    branches:
      - '*'
  release:
    types: [published]

env:
  NXF_ANSI_LOG: false

concurrency:
  group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}"
  cancel-in-progress: true

jobs:
  test_illumina:
    name: Run Illumina test
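
The `concurrency` group key in the hunk above uses the `||` operator of GitHub Actions expressions, which yields its first truthy operand. For pull requests the key is built from the PR number. For pushes and releases, where no PR number exists, it falls back to the Git ref. A rough Python sketch of how the key resolves follows; the function name and arguments are hypothetical and only stand in for the expression contexts.

```python
from typing import Optional


def concurrency_group(workflow: str, pr_number: Optional[int], ref: str) -> str:
    """Approximate "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}"."""
    # Python's `or` plays the role of `||`: use the PR number when present,
    # otherwise fall back to the ref that triggered the run.
    return f"{workflow}-{pr_number or ref}"


if __name__ == "__main__":
    print(concurrency_group("CI", 23, "refs/pull/23/merge"))
    print(concurrency_group("CI", None, "refs/heads/dev"))
```

With `cancel-in-progress: true`, a new run whose key matches an in-progress run cancels the older one, so repeated pushes to the same PR or branch do not pile up.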
27 changes: 0 additions & 27 deletions .github/workflows/linting_comment.yml

This file was deleted.

15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,21 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[3.2.0](https://github.com/CFIA-NCFAD/nf-flu/releases/tag/3.2.0)] - 2023-06-22

### Added

* Influenza B virus support (#14)
* Polars for faster parsing of BLAST results (#14)

### Fixes

* Irregular Illumina paired-end FASTQ files not producing IRMA assemblies (#20)

### Updates

* Updated README.md to include references and citations

## [[3.1.6](https://github.com/CFIA-NCFAD/nf-flu/releases/tag/3.1.6)] - 2023-05-31

This is a patch release for a minor change to use Biocontainers Docker and Singularity images for Clair3 to avoid hitting limits on pulls from Docker Hub and since Biocontainers images are half the size of [hkubal/clair3](https://hub.docker.com/r/hkubal/clair3/) images.
121 changes: 110 additions & 11 deletions README.md
@@ -1,4 +1,6 @@
# CFIA-NCFAD/nf-flu - Influenza A Virus Genome Assembly Nextflow Workflow
# CFIA-NCFAD/nf-flu - Influenza A and B Virus Genome Assembly Nextflow Workflow

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7011213.svg)](https://doi.org/10.5281/zenodo.7011213)

[![CI](https://github.com/CFIA-NCFAD/nf-flu/actions/workflows/ci.yml/badge.svg)](https://github.com/CFIA-NCFAD/nf-flu/actions/workflows/ci.yml)

@@ -9,19 +11,18 @@

## Introduction

**nf-flu** is a bioinformatics analysis pipeline for assembly and H/N subtyping of Influenza A virus. The pipeline supports both Illumina and Nanopore Platform.
Since Influenza is a special virus with multiple gene segments (8 segments) and there might be a reference or multiple we would want to align against, the pipeline will automatically pull top match references for each segment.
To achieve this task, the pipeline downloads Influenza database from NCBI and user could provide their own reference database. The pipline performs read mapping against each reference segment, variant calling and genome assembly.

The pipeline is implemented in [Nextflow][]
**nf-flu** is a [Nextflow][] bioinformatics analysis pipeline for assembly and H/N subtyping of Influenza A and B viruses from Illumina or Nanopore sequencing data.
Since Influenza has a segmented genome consisting of 8 gene segments, the pipeline will automatically select the top matching reference sequence from NCBI for each gene segment based on [IRMA][] assembly and nucleotide [BLAST][] against all Influenza sequences from NCBI.
Users can also provide their own reference sequences to include in the top reference sequence selection process.
After reference sequence selection, the pipeline performs read mapping to each reference sequence, variant calling and depth-masked consensus sequence generation.
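
The README text does not spell out how depth masking works. As an illustration only, the masking step can be thought of as replacing every consensus position whose read depth falls below a threshold with an ambiguous base. The sketch below assumes a per-position depth list, a hypothetical threshold of 10 and `N` as the mask character. None of these are taken from the pipeline, and applying called variants is left out.

```python
from typing import Sequence


def mask_low_depth(
    sequence: str,
    depths: Sequence[int],
    min_depth: int = 10,   # hypothetical threshold, not the pipeline's default
    mask_char: str = "N",  # assumed mask character
) -> str:
    """Return `sequence` with positions below `min_depth` replaced by `mask_char`."""
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    return "".join(
        base if depth >= min_depth else mask_char
        for base, depth in zip(sequence, depths)
    )


if __name__ == "__main__":
    print(mask_low_depth("ACGT", [12, 3, 10, 0]))
```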

## Pipeline summary

1. Download latest [NCBI Influenza DB][] sequences and metadata (or use user-specified files)
2. Merge reads of re-sequenced samples ([`cat`](http://www.linfo.org/cat.html)) (if needed)
3. Assembly of Influenza gene segments with [IRMA][] using the built-in FLU module
4. Nucleotide [BLAST][] search against [NCBI Influenza DB][]
5. Automatically pull top match references for segments
5. Automatically select top match references for segments
6. H/N subtype prediction and Excel XLSX report generation based on BLAST results
7. Perform Variant calling and genome assembly for all segments.
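
Step 5 of the summary above, selecting a top match per segment, can be sketched as a group-by over BLAST hits. This is a simplified illustration in plain Python rather than the pipeline's code: the record keys (`segment`, `accession`, `bitscore`) and the choice of bitscore as the ranking criterion are assumptions, and the accessions in the usage example are placeholders.

```python
from typing import Dict, Iterable, Mapping


def top_reference_per_segment(hits: Iterable[Mapping]) -> Dict[str, str]:
    """Pick, for each segment, the accession of the highest-scoring hit."""
    best: Dict[str, Mapping] = {}
    for hit in hits:
        segment = hit["segment"]
        # Keep the first hit seen for a segment, or replace it with a better one.
        if segment not in best or hit["bitscore"] > best[segment]["bitscore"]:
            best[segment] = hit
    return {segment: hit["accession"] for segment, hit in best.items()}


if __name__ == "__main__":
    example_hits = [
        {"segment": "4_HA", "accession": "ref_A", "bitscore": 3100.0},
        {"segment": "4_HA", "accession": "ref_B", "bitscore": 3250.0},
        {"segment": "6_NA", "accession": "ref_C", "bitscore": 2500.0},
    ]
    print(top_reference_per_segment(example_hits))
```

Ties are resolved in favour of the hit seen first. A real implementation would likely also weigh alignment coverage and identity.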

@@ -73,11 +74,99 @@ The nf-flu pipeline comes with:
* [Usage](docs/usage.md) and
* [Output](docs/output.md) documentation.

## Resources
## Resources and References

### [BcfTools][] and [Samtools][]

```text
Danecek, P., Bonfield, J.K., Liddle, J., Marshall, J., Ohan, V., Pollard, M.O., Whitwham, A., Keane, T., McCarthy, S.A., Davies, R.M., Li, H., 2021. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008. https://doi.org/10.1093/gigascience/giab008
```

### [BLAST][] Basic Local Alignment Search Tool

```text
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2
```

```text
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T.L., 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421. https://doi.org/10.1186/1471-2105-10-421
```

### [Clair3][]

```text
Zheng, Z., Li, S., Su, J., Leung, A.W.-S., Lam, T.-W., Luo, R., 2022. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat Comput Sci 2, 797–803. https://doi.org/10.1038/s43588-022-00387-x
```

### [IRMA][] Iterative Refinement Meta-Assembler

```text
Shepard, S.S., Meno, S., Bahl, J., Wilson, M.M., Barnes, J., Neuhaus, E., 2016. Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler. BMC Genomics 17, 708. https://doi.org/10.1186/s12864-016-3030-6
```

### [Medaka][]

[Medaka][] is deprecated in favour of [Clair3][] for variant calling of Nanopore data.

### [Minimap2][]

[Minimap2][] is used for rapid and accurate read alignment to reference sequences.

```text
Li, H., 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100. https://doi.org/10.1093/bioinformatics/bty191
```

### [Mosdepth][]

[Mosdepth][] is used for rapid sequencing coverage calculation and summary statistics.

```text
Pedersen, B.S., Quinlan, A.R., 2017. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868. https://doi.org/10.1093/bioinformatics/btx699
```

### [MultiQC][]

[MultiQC][] is used for generation of a single report for multiple tools.

```text
Ewels, P., Magnusson, M., Lundin, S., Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048. https://doi.org/10.1093/bioinformatics/btw354
```

### [NCBI Influenza Virus Resource][]

**nf-flu** relies on publicly available Influenza sequence data from NCBI available at the [NCBI Influenza Virus Resource][], which is downloaded from the [FTP site](https://ftp.ncbi.nih.gov/genomes/INFLUENZA/).

NCBI Influenza Virus Resource:

```text
Bao, Y., Bolotov, P., Dernovoy, D., Kiryutin, B., Zaslavsky, L., Tatusova, T., Ostell, J., Lipman, D., 2008. The influenza virus resource at the National Center for Biotechnology Information. J Virol 82, 596–601. https://doi.org/10.1128/JVI.02005-07
```

NCBI Influenza Virus Sequence Annotation Tool:

```text
Bao, Y., Bolotov, P., Dernovoy, D., Kiryutin, B., Tatusova, T., 2007. FLAN: a web server for influenza virus genome annotation. Nucleic Acids Res 35, W280-284. https://doi.org/10.1093/nar/gkm354
```

### [Nextflow][]

**nf-flu** is implemented in [Nextflow][].

```text
Tommaso, P.D., Chatzou, M., Floden, E.W., Barja, P.P., Palumbo, E., Notredame, C., 2017. Nextflow enables reproducible computational workflows. Nat Biotechnol 35, 316–319. https://doi.org/10.1038/nbt.3820
```

### [nf-core][]

[nf-core][] is a great resource for building robust and reproducible bioinformatics pipelines.

```text
Ewels, P.A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M.U., Di Tommaso, P., Nahnsen, S., 2020. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38, 276–278. https://doi.org/10.1038/s41587-020-0439-x
```

### [seqtk][]
* [NCBI Influenza FTP site](https://ftp.ncbi.nih.gov/genomes/INFLUENZA/)
* [IRMA][] Iterative Refinement Meta-Assembler
* [IRMA Publication](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3030-6)
[seqtk][] is used for rapid manipulation of FASTA/Q files. Available from GitHub at [lh3/seqtk](https://github.com/lh3/seqtk)
## Credits
@@ -94,3 +183,13 @@ The nf-flu pipeline was originally developed by [Peter Kruczkiewicz](https://git
[Nextflow]: https://www.nextflow.io/
[Docker]: https://www.docker.com/
[Singularity]: https://www.sylabs.io/guides/3.0/user-guide/quick_start.html#quick-installation-steps
[NCBI Influenza Virus Resource]: https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=database
[BcfTools]: https://samtools.github.io/bcftools/
[Samtools]: https://www.htslib.org/
[nf-core]: https://nf-co.re/
[Minimap2]: https://github.com/lh3/minimap2/
[Clair3]: https://github.com/HKU-BAL/Clair3
[Medaka]: https://github.com/nanoporetech/medaka
[Mosdepth]: https://github.com/brentp/mosdepth
[seqtk]: https://github.com/lh3/seqtk
[MultiQC]: https://multiqc.info/
