MDL/metabolt is a bioinformatics pipeline that ...

- Read QC (FastQC)
- Preprocessing (fastp)
- Assembly (MEGAHIT)
- Alignment (BWA)
  - Indexing: Generates index files from reference genomes to expedite the alignment process.
  - Mapping: Aligns sequencing reads to the indexed reference genome.
- SAMtools (SAMtools): Provides utilities for processing and managing SAM/BAM files.
  - Sorting: Organizes alignments by genomic coordinates to facilitate efficient data retrieval.
  - Indexing: Creates index files for sorted BAM files, enabling rapid access to specific genomic regions.
- Contigs Depth Calculation (`jgi_summarize_bam_contig_depths`)
- Binning (MetaBAT2)
- Present QC for Raw Reads (MultiQC)

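
To make the sequence of steps concrete, the following is a rough sketch of stand-alone commands that correspond to each stage. It is not the pipeline's actual module code: the helper name `print_metabolt_steps`, the file names, the output directories, and the thread count are all assumptions for illustration, and the function only prints the commands (a dry run) rather than executing them.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints one plausible stand-alone command per pipeline step.
# Sample name, file layout, and thread count are illustrative assumptions,
# not values taken from MDL/metabolt itself.

print_metabolt_steps() {
    local sample="${1:-SAMPLE}"   # hypothetical sample name
    local threads="${2:-4}"       # hypothetical thread count
    local r1="${sample}_R1.fastq.gz"
    local r2="${sample}_R2.fastq.gz"
    local t1="${sample}_trimmed_R1.fastq.gz"
    local t2="${sample}_trimmed_R2.fastq.gz"
    local contigs="assembly_${sample}/final.contigs.fa"   # MEGAHIT's default contig file name
    local bam="${sample}.sorted.bam"

    cat <<EOF
fastqc --outdir qc ${r1} ${r2}
fastp -i ${r1} -I ${r2} -o ${t1} -O ${t2}
megahit -1 ${t1} -2 ${t2} -t ${threads} -o assembly_${sample}
bwa index ${contigs}
bwa mem -t ${threads} ${contigs} ${t1} ${t2} | samtools sort -@ ${threads} -o ${bam} -
samtools index ${bam}
jgi_summarize_bam_contig_depths --outputDepth ${sample}_depth.txt ${bam}
metabat2 -i ${contigs} -a ${sample}_depth.txt -o bins_${sample}/bin
multiqc qc --outdir multiqc
EOF
}

# Print the plan for a sample name and thread count given on the command line.
print_metabolt_steps "${1:-SAMPLE}" "${2:-4}"
```

In this sketch the reads are mapped back to the assembled contigs, which is what the depth calculation and binning steps need; the pipeline itself may wire its inputs differently.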
> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.

- Samplesheet Preparation:

  - Prepare a samplesheet with your input data. Each row represents a sample, with columns specifying the sample name and the paths to the FASTQ files.
  - Example `samplesheet.csv` (for paired-end reads):

    ```csv
    sample,fastq_1,fastq_2
    CONTROL,AEG588A1_S1_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
    CONDITION,SRR123_S1_R1_011.fastq.gz,SRR123_S1_R2_011.fastq.gz
    ```

    Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end).

- Run the pipeline:

  ```bash
  nextflow run muneebdev7/metabolt \
     -profile <docker/singularity/conda/institute> \
     --input samplesheet.csv \
     --outdir <OUTDIR>
  ```

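
Before launching, it can help to sanity-check the samplesheet described above. The helper below is a minimal sketch, not part of the pipeline: `check_samplesheet` is a hypothetical name, and the rule that read files must end in `.fastq.gz` or `.fq.gz` is an assumption borrowed from common nf-core conventions rather than something this README states.

```bash
#!/usr/bin/env bash
# Sketch: basic sanity checks on a samplesheet before launching the pipeline.
# check_samplesheet is a hypothetical helper, not part of MDL/metabolt.

check_samplesheet() {
    local sheet="$1"
    awk -F',' '
        BEGIN { bad = 0 }
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 {
            if ($0 != "sample,fastq_1,fastq_2") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        $0 == "" { next }                       # ignore blank lines
        $1 == "" {
            print "line " NR ": missing sample name"
            bad = 1
        }
        # Assumed convention: reads are gzip-compressed FASTQ files.
        $2 !~ /\.(fastq|fq)\.gz$/ {
            print "line " NR ": fastq_1 is not a .fastq.gz/.fq.gz path"
            bad = 1
        }
        # fastq_2 may be empty for single-end samples.
        $3 != "" && $3 !~ /\.(fastq|fq)\.gz$/ {
            print "line " NR ": fastq_2 is not a .fastq.gz/.fq.gz path"
            bad = 1
        }
        END { exit bad }
    ' "$sheet"
}
```

The function returns a non-zero status when any problem is reported, so it can guard the launch, for example `check_samplesheet samplesheet.csv && nextflow run ...`.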
> [!WARNING]
> Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see docs.
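
As one way to follow this advice, the `--input` and `--outdir` values can be kept in a YAML params file and passed with `-params-file`. This is only a sketch: the file name `params.yaml`, the `results` output directory, and the `docker` profile in the commented command are placeholders chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: store pipeline parameters in a params file instead of a custom config.
# "params.yaml" and "results" are placeholder names, not pipeline defaults.

cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
EOF

# The pipeline could then be launched like this (not executed here):
# nextflow run muneebdev7/metabolt -profile docker -params-file params.yaml
```

Keys in the params file correspond to the double-dash pipeline options (`--input`, `--outdir`), while single-dash options such as `-profile` remain on the command line because they belong to Nextflow itself.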

To see the results of an example test run with a full-size dataset, refer to the results directory.
For more details about the output files and reports, please refer to the output documentation.
MDL/metabolt was written by Muhammad Muneeb Nasir at Metagenomics Discovery Lab (MDL) at SINES, NUST.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch by email.
An extensive list of references for the tools used by the pipeline can be found in the `CITATIONS.md` file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.