A Snakemake pipeline to process methylation data obtained through bisulfite and enzymatic methyl sequencing.

poloarol/snakemake-methylseq

Snakemake workflow: snakemake-methylseq

A Snakemake workflow for processing data from methylation sequencing platforms, e.g., bisulfite sequencing and enzymatic methyl sequencing.

Workflow

Pipeline-Workflow

Quality Control

  1. FastQC
  2. Fastp
  3. MultiQC

Alignment

  1. Two aligners are used: bwa-meth and abismal
  2. VerifyBamID2

Methylation Analysis

Two methods are employed:

  1. MethylDackel
    • Integrates well with bwa-meth
  2. DNMTools
    • Integrates well with abismal

Expected Output

Quality Control

<path-to-output-folder>\
|   <project-name>\qc\
|       fastqc\
|           <sample>_<read>_fastqc.html
|           <sample>_<read>_fastqc.zip
|       fastp\
|           <sample>_<read>.trimmed_fastqc.html
|           <sample>_<read>.trimmed_fastqc.zip
|           <sample>_1.trimmed.fastq
|           <sample>_2.trimmed.fastq
|           <sample>_u1.fastq
|           <sample>_u2.fastq
|           <sample>.merged.fastq
|           <sample>.failed.fastq
|           <sample>.html
|           <sample>.json
|       multiqc\
|           trimmed\
|                multiqc_data\
|                multiqc_report.html
|           untrimmed\
|                multiqc_data\
|                multiqc_report.html
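
The layout above can be turned into a quick checklist of expected files. The following is only a minimal sketch: the function name `expected_qc_outputs` is hypothetical, it assumes `<read>` takes the values `1` and `2`, it uses forward slashes as on Ubuntu, and it covers only a subset of the files listed above.

```python
from pathlib import PurePosixPath


def expected_qc_outputs(output_dir, project, sample, reads=("1", "2")):
    """Return a subset of the QC files the tree above lists for one sample.

    Hypothetical helper, not part of the workflow. The read labels "1" and
    "2" are an assumption based on the <sample>_1 / <sample>_2 file names.
    """
    qc = PurePosixPath(output_dir) / project / "qc"
    paths = []
    # FastQC: one HTML report and one ZIP archive per read.
    for read in reads:
        for ext in ("html", "zip"):
            paths.append(qc / "fastqc" / f"{sample}_{read}_fastqc.{ext}")
    # fastp: trimmed reads plus the per-sample HTML and JSON reports.
    for read in reads:
        paths.append(qc / "fastp" / f"{sample}_{read}.trimmed.fastq")
    for name in (f"{sample}.html", f"{sample}.json"):
        paths.append(qc / "fastp" / name)
    # MultiQC: one report for trimmed and one for untrimmed reads.
    for kind in ("trimmed", "untrimmed"):
        paths.append(qc / "multiqc" / kind / "multiqc_report.html")
    return [str(p) for p in paths]


if __name__ == "__main__":
    for path in expected_qc_outputs("results", "my-project", "sampleA"):
        print(path)
```

Comparing this list against the files actually present on disk is one way to spot a sample whose QC step did not finish.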

Alignment

1. bwa-meth + samtools + VerifyBamID2
<path-to-output-folder>\
|   <project-name>\alignment\bwa\
|       unsorted/<sample>.sam
|       sorted/
|            <sample>.bam
|            <sample>.bai
|            <sample>.ancestry
|            <sample>.selfsm
|           /picard
|                <sample>.bam
|                <sample>.bam.bai
|                <sample>.metrics.txt

2. abismal + samtools + VerifyBamID2
<path-to-output-folder>\
|   <project-name>\alignment\abismal
|       <sample>.bam
|       <sample>.sorted.bam
|       <sample>.sorted.bai
|       <sample>.filtered.bam
|       stats\
|           <sample>.metrics.yaml
|           <sample>.filtered.metrics.yaml
|       verify_bam_id\
|           <sample>.ancestry
|           <sample>.selfsm

Methylation Analysis

1. DNMTools
<path-to-output-folder>\
|   <project-name>\methyl\dnmtools\
|       <sample>.bsrate
|       <sample>_single_base.meth
|       <sample>_symmetric.meth
|       <sample>_global.meth
|       <sample>.hmr
|       <sample>.hypermr
|       <sample>.epiread
|       <sample>.entropy.meth
|       <sample>.avg.meth
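
To check whether a DNMTools run produced every file listed above, something like the following could be used. It is a sketch under stated assumptions: `missing_dnmtools_outputs` and `DNMTOOLS_SUFFIXES` are hypothetical names, and the suffixes are copied from the listing above rather than from the workflow's rules.

```python
from pathlib import Path

# File-name suffixes copied from the DNMTools listing above.
DNMTOOLS_SUFFIXES = (
    ".bsrate",
    "_single_base.meth",
    "_symmetric.meth",
    "_global.meth",
    ".hmr",
    ".hypermr",
    ".epiread",
    ".entropy.meth",
    ".avg.meth",
)


def missing_dnmtools_outputs(dnmtools_dir, sample):
    """Return the expected file names that are absent from dnmtools_dir.

    Hypothetical helper, not part of the workflow.
    """
    base = Path(dnmtools_dir)
    return [
        f"{sample}{suffix}"
        for suffix in DNMTOOLS_SUFFIXES
        if not (base / f"{sample}{suffix}").is_file()
    ]


if __name__ == "__main__":
    # Example call; the directory shown here is a placeholder.
    print(missing_dnmtools_outputs("results/my-project/methyl/dnmtools", "sampleA"))
```

An empty list means all nine files exist for that sample. Since the TODO list mentions errors in the entropy and avg_meth_level_region rules, `<sample>.entropy.meth` and `<sample>.avg.meth` may be the ones reported as missing.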

2. MethylDackel - To be completed

Reference files

  • Genome (hg38)
  • Canonical cis-Regulatory elements

Usage

Set up a working conda environment:

    conda create --name <envname> --file requirements.txt

NB:

  • The above step assumes you already have conda installed.
  • This workflow was built on Ubuntu 20.04. Other platforms were not tested.

Launch the pipeline from within the <snakemake-methylseq>\<workflow> directory.

Test

All Rules: bash run_test.sh all

  1. Quality Control: bash run.sh qc

  2. Alignment

    • bwa-meth: bash run.sh alignment bwa
    • abismal: bash run.sh alignment abismal
  3. Methylation Analysis

    • DNMTools: bash run.sh methyl dnmtools
    • MethylDackel: bash run.sh methyl methyldackel

Run workflows

  1. Quality Control: bash run.sh qc - <num_cores>

  2. Alignment

    • bwa-meth: bash run.sh alignment bwa
    • abismal: bash run.sh alignment abismal
  3. Methylation Analysis

    • DNMTools: bash run.sh methyl dnmtools
    • MethylDackel: bash run.sh methyl methyldackel
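
The stage and tool combinations listed above could be validated before `run.sh` is launched from a script. This is a minimal sketch, not part of the repository: `build_command` and `VALID_TARGETS` are hypothetical names, only the argument forms shown above are covered, and the `<num_cores>` argument is left out because its exact syntax is not documented here.

```python
import shlex

# Stage -> allowed tool arguments, as listed above (None means "no tool").
VALID_TARGETS = {
    "qc": (None,),
    "alignment": ("bwa", "abismal"),
    "methyl": ("dnmtools", "methyldackel"),
}


def build_command(stage, tool=None, script="run.sh"):
    """Build the argv list for one documented run.sh invocation.

    Hypothetical wrapper; it only assembles the command, it does not run it.
    """
    if stage not in VALID_TARGETS:
        raise ValueError(f"unknown stage: {stage!r}")
    if tool not in VALID_TARGETS[stage]:
        raise ValueError(f"stage {stage!r} does not accept tool {tool!r}")
    argv = ["bash", script, stage]
    if tool is not None:
        argv.append(tool)
    return argv


if __name__ == "__main__":
    for stage, tools in VALID_TARGETS.items():
        for tool in tools:
            print(shlex.join(build_command(stage, tool)))
```

The returned list can be passed to `subprocess.run(argv, check=True)` with the working directory set to the workflow directory.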

TODO

  • Add script to download and index reference files
  • Add script to download svd_mu for VerifyBamID2
  • Fix the MethylDackel environment to complete the analysis
  • Work on the format conversion between abismal and bwa-meth
  • Fix errors from rules entropy and avg_meth_level_region
  • Provide PC plots for contamination of samples
  • Add notes on updating the config file
  • Add documentation on rules and their purpose
