LaMeta

UNDER CONSTRUCTION

THE README IS OUTDATED!

Overview

This pipeline takes paired-end metagenomic short reads, as generated by Illumina sequencing, as input. From this data, the pipeline aims to assemble high-quality single genomes.

All samples are quality controlled separately to remove Illumina library adaptors, low-quality sequences and sequence ends, and possible host (human) and spike-in (PhiX) contamination. The cleaned reads are used for all subsequent steps.
For each sample, a separate metagenomic assembly is performed using Spades in metagenomic mode. The reads are then mapped back to the resulting scaffolds, which are binned using the MaxBin2 software.
Additionally, a co-assembly with Megahit is performed. A groupfile can be used to split the samples into separate groups for this co-assembly. Again, the resulting contigs serve as the reference for backmapping, followed by two separate binning approaches using MaxBin2 and Metabat2, which in this setting can also incorporate across-sample abundance differences.
Finally, the bins from the single-sample and subgroup co-assembly approaches are dereplicated using the dRep package to obtain single-genome bins of the highest possible quality with low redundancy.
All samples are then mapped to the final bins to estimate bin abundance.

Executing the pipeline

To execute the pipeline with all default settings, do:

nextflow -c nextflow.config run main.nf --folder /path/to/folder

Please note: the output will be written where the pipeline is executed, NOT where the input files are located.
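
Because the output lands in the current working directory, one way to keep runs apart is to launch each run from its own directory. The following is a minimal sketch that only prints the launch command rather than starting the pipeline. `PIPELINE_DIR`, `INPUT_DIR`, `RUN_DIR` and the helper `lameta_cmd` are hypothetical names, and the sketch assumes that `main.nf` and `nextflow.config` sit together in the pipeline checkout.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder paths; adjust to your setup.
PIPELINE_DIR="${PIPELINE_DIR:-/path/to/LaMeta}"
INPUT_DIR="${INPUT_DIR:-/path/to/folder}"
RUN_DIR="${RUN_DIR:-${TMPDIR:-/tmp}/lameta_run}"

# Print the launch command for a given pipeline checkout and input folder.
lameta_cmd() {
  printf 'nextflow -c %s run %s --folder %s\n' \
    "$1/nextflow.config" "$1/main.nf" "$2"
}

# Results are written to the current directory, so change into the run
# directory before launching.
mkdir -p "$RUN_DIR"
cd "$RUN_DIR"
echo "Output will be written to: $(pwd)"

# Dry run: only print the command. Paste it into the shell to start the pipeline.
lameta_cmd "$PIPELINE_DIR" "$INPUT_DIR"
```

Using absolute paths for the pipeline files and the input folder keeps the command valid regardless of which run directory it is started from.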

Optional parameters

The pipeline uses several parameters to fine-tune the various pipeline stages. Some of these can be modified during pipeline execution:

Reading the output

Dependencies and versions

BBMap (v.37.88): QC, Mapping to contigs.
Megahit (v.1.1.2): Groupwise Co-Assemblies.
Spades (v.3.9.0): Single-Sample Assemblies.
Samtools (v.1.5): Conversion of SAM to BAM.
MaxBin2 (v.2.2.4): Binning of Contigs.
Metabat2 (v.2.12.1): Binning of Contigs.
dRep (v.2.0.5): Evaluation of Binned Contigs.
CheckM (v.1.0.11): Used by dRep.

Many of these tools have additional dependencies that are not listed here. If the tool works properly on its own, these are likely satisfied.

To remove human host/lab contamination, the database has to be prepared as described here.

dRep and CheckM use different Python versions (Python 2 and Python 3). Please follow the instructions provided on the dRep website to resolve this using pyenv.

The version numbers are the software versions used in development/testing of the pipeline.
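
Before a first run, it can help to confirm that the tools listed above are reachable. The sketch below checks the `PATH` for each one. The executable names are assumptions based on how each tool is usually installed (for example `bbmap.sh` for BBMap and `run_MaxBin.pl` for MaxBin2) and may differ on your system. The helper `missing_tools` is a hypothetical name, and the check does not verify version numbers.

```shell
#!/bin/sh
# Executable names are assumptions based on each tool's usual installation.
TOOLS="nextflow bbmap.sh megahit spades.py samtools run_MaxBin.pl metabat2 dRep checkm"

# Print the names of the given tools that are not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# $TOOLS is left unquoted on purpose so that it splits into separate names.
missing=$(missing_tools $TOOLS)
if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  echo "$missing"
else
  echo "All listed tools were found on PATH."
fi
```

A tool that is found may still lack its own dependencies, so running each tool once on its own remains the more reliable check.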

Code structure

This pipeline consists of the following components:

main.nf (the actual workflow definition)
nextflow.config (the top-level configuration file with generic options)
config/rzcluster.config (the RZcluster specific configuration options)
README.md (this file)

Credit

This pipeline was developed by M. Rühlemann.
