Workflow Quality Control

A post-sequencing QC tool for Oxford Nanopore sequencers

Introduction

This workflow performs quality control (QC) analysis of Oxford Nanopore runs and supports both RNA-Seq and DNA-Seq data.

This QC tool supports output from the Guppy and Dorado basecallers only: the sequencing_summary.txt and sequencing_telemetry.js files. Flow cell and kit versions are retrieved from the telemetry file; if no telemetry file is provided, a single FAST5 file can be used instead. If the sequencing summary file is not available, ToulligQC can also accept FAST5, FASTQ or BAM files, although this significantly increases the running time.

ToulligQC can process barcoded samples when the barcode list is passed as an option. It handles several compressed file formats: gz, tar.gz, bz2 and tar.bz2. The tool produces a set of graphs, a plain-text statistics file and an HTML report.

Compute requirements

Minimum requirements:

CPUs = 1
Memory = 4GB

Approximate run time: about 5 minutes for 10M reads with the minimum requirements.
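To get a feel for how long a run will take before launching it, you can count the reads in your sequencing summary, which has one header line followed by one row per read. The sketch below uses two hypothetical helpers (not part of ToulligQC or the workflow) and assumes that run time scales linearly from the figure quoted above; it ignores Nextflow start-up and container download time.

```shell
# Hypothetical helper (not part of ToulligQC): count reads in a basecaller
# sequencing summary. The file has one header line, then one row per read.
# Plain, gzip (.gz) and bzip2 (.bz2) files are handled.
count_summary_reads() {
    case "$1" in
        *.gz)  gzip -dc "$1" ;;
        *.bz2) bzip2 -dc "$1" ;;
        *)     cat "$1" ;;
    esac | tail -n +2 | wc -l | tr -d ' '
}

# Hypothetical helper: rough run time in whole minutes (rounded up).
# ASSUMPTION: linear scaling from ~5 minutes per 10M reads on 1 CPU / 4 GB.
estimate_minutes() {
    local reads
    reads=$(count_summary_reads "$1")
    # Integer ceiling of reads * 5 / 10,000,000
    echo $(( (reads * 5 + 9999999) / 10000000 ))
}
```

For example, `estimate_minutes demo_data/sequencing_summary.txt` prints the estimate for the demo dataset once it has been downloaded.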

Install and run

These are instructions to install and run the workflow on the command line. You can also access the workflow via the EPI2ME Desktop application.

The workflow uses Nextflow to manage compute and software resources, so Nextflow must be installed before attempting to run the workflow.

The workflow can currently be run using either Docker (https://www.docker.com/products/docker-desktop) or Singularity to provide isolation of the required software. Both methods work out of the box, provided Docker or Singularity is installed. The choice between them is controlled by Nextflow's -profile parameter.

It is not required to clone or download the git repository in order to run the workflow. More information on running EPI2ME workflows can be found on our website.

The following command can be used to obtain the workflow. This will pull the repository into the assets folder of Nextflow and provide a list of all parameters available for the workflow as well as an example command:

nextflow run genomiqueens/wf-toulligqc --help

To update the workflow to the latest version from the command line, use the following command:

nextflow pull genomiqueens/wf-toulligqc

A demo dataset is provided for testing of the workflow. It can be downloaded and unpacked using the following commands:

wget https://github.com/GenomiqueENS/wf-toulligqc/raw/main/demo_data/wf-toulligqc-demo.tar.gz
tar -xzvf wf-toulligqc-demo.tar.gz

The workflow can then be run with the downloaded demo data using:

nextflow run genomiqueens/wf-toulligqc \
    --input_files 'sequencing_summary + telemetry_source' \
    --sequencing_summary_source 'demo_data/sequencing_summary.txt' \
    --telemetry_source 'demo_data/sequencing_telemetry.js'
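The demo command above does not set -profile. The sketch below wraps the same demo run in a hypothetical shell function that passes an explicit profile. It assumes the workflow defines a profile named `singularity`, the usual name in EPI2ME-style workflows; check the workflow's nextflow.config for the profile names it actually provides.

```shell
# Sketch: demo run with an explicit Nextflow profile.
# ASSUMPTION: a "singularity" profile exists in the workflow's configuration.
# The function name is hypothetical; the parameters are those of the demo run.
run_demo_with_profile() {
    local profile="${1:-singularity}"
    nextflow run genomiqueens/wf-toulligqc \
        -profile "$profile" \
        --input_files 'sequencing_summary + telemetry_source' \
        --sequencing_summary_source 'demo_data/sequencing_summary.txt' \
        --telemetry_source 'demo_data/sequencing_telemetry.js'
}
```

Call it as `run_demo_with_profile singularity` from the directory containing demo_data. Note that -profile takes a single dash because it is a Nextflow option, whereas workflow parameters use a double dash.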

For further information about running a workflow on the command line, see https://labs.epi2me.io/wfquickstart/

Related protocols

This workflow is designed to take input sequences that have been produced from Oxford Nanopore Technologies devices.

Find related protocols in the Nanopore community.

Inputs

Input Options

Nextflow parameter name	Type	Description	Help	Default
input_files	string	Select the type or combination of input files to use for the analysis	The workflow can be run with only the Guppy/Dorado basecaller output file sequencing_summary.txt, optionally with the additional sequencing_telemetry.js file. It can also be run with only FASTQ, BAM or FAST5 files.	sequencing_summary.txt only

Sample Options

Nextflow parameter name	Type	Description	Help
sequencing_summary_source	string	Basecaller sequencing summary source, can be compressed with gzip (.gz) or bzip2 (.bz2)
telemetry_source	string	Basecaller telemetry file source, can be compressed with gzip (.gz) or bzip2 (.bz2)
fast5	string	FAST5 file source, can also be a tar.gz/tar.bz2 archive or a directory	Necessary if no telemetry file is provided
fastq	string	FASTQ files to use in the analysis, can also be gzip-compressed (.gz)	Necessary if no sequencing summary file is provided
bam	string	BAM or SAM files to use in the analysis	Necessary if no sequencing summary file is provided
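Because the sequencing summary can be compressed, you can gzip a large summary and pass the .gz file directly. The sketch below is a hypothetical wrapper that does this; it omits --input_files so that the documented default (sequencing_summary.txt only) applies.

```shell
# Sketch: gzip a sequencing summary and run the workflow on the .gz file.
# The function name is hypothetical. --input_files is left at its default.
run_with_gzipped_summary() {
    local summary="$1"
    gzip -kf "$summary"    # -k keeps the original, -f overwrites a stale .gz
    nextflow run genomiqueens/wf-toulligqc \
        --sequencing_summary_source "${summary}.gz"
}
```

For example: `run_with_gzipped_summary demo_data/sequencing_summary.txt`.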

Barcoding Options

Nextflow parameter name	Type	Description	Help	Default
barcoding	boolean	Enable analysis of barcoded samples.		False
barcodes	string	Comma-separated barcode list (e.g. BC05,RB09,NB01,barcode10)	ToulligQC handles the following naming schemes: BCXX, RBXX, NBXX and barcodeXX, where XX is the barcode number
barcoding_summary_pass	string	Basecaller barcoding summary source of passed reads, can be compressed with gzip (.gz) or bzip2 (.bz2).
barcoding_summary_fail	string	Basecaller barcoding summary source of failed reads, can be compressed with gzip (.gz) or bzip2 (.bz2).
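Before launching a barcoded run, it can help to check that the barcodes value follows the naming schemes listed above. The function below is a hypothetical pre-flight check, not part of ToulligQC, and assumes that XX is written with one or more digits (as in BC05 or barcode10).

```shell
# Hypothetical pre-flight check (not part of ToulligQC): verify that every
# entry of a comma-separated barcode list matches BCXX, RBXX, NBXX or
# barcodeXX, where XX is a number of one or more digits (ASSUMPTION).
validate_barcodes() {
    local list="$1" old_ifs="$IFS" bc
    [ -n "$list" ] || return 1
    IFS=,                   # split the list on commas only
    for bc in $list; do     # unquoted on purpose so that IFS splitting applies
        if ! printf '%s\n' "$bc" | grep -Eq '^(BC|RB|NB|barcode)[0-9]+$'; then
            IFS="$old_ifs"
            echo "invalid barcode name: $bc" >&2
            return 1
        fi
    done
    IFS="$old_ifs"
    return 0
}
```

If the check passes, the same string can be handed to the workflow, e.g. `validate_barcodes "BC05,RB09" && nextflow run genomiqueens/wf-toulligqc --barcoding --barcodes "BC05,RB09" ...` with your usual input parameters in place of `...`.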

Advanced Options

Nextflow parameter name	Type	Description	Help	Default
report_name	string	Name to give to report
disable_ping	boolean	Enable to prevent sending a workflow ping.		False
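The advanced options are ordinary workflow parameters. As a sketch, the hypothetical function below repeats the demo run with a custom report name and the workflow ping disabled; the default report name `demo_run` is only illustrative. Passing `--disable_ping` without a value sets it to true.

```shell
# Sketch: demo run with a custom report name and no workflow ping.
# Function name and default report name are illustrative only.
run_demo_named_report() {
    nextflow run genomiqueens/wf-toulligqc \
        --input_files 'sequencing_summary + telemetry_source' \
        --sequencing_summary_source 'demo_data/sequencing_summary.txt' \
        --telemetry_source 'demo_data/sequencing_telemetry.js' \
        --report_name "${1:-demo_run}" \
        --disable_ping
}
```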

Outputs

Title	File path	Description	Per sample or aggregated
workflow report	./wf-template-report.html	Report for all samples.	aggregated

Useful links

toulligqc
nextflow
docker
singularity

See the EPI2ME website for lots of other resources and blog posts.
