Cancer Variant Calling Pipeline
This pipeline is designed for the analysis of tumor samples to identify somatic and germline mutations using next-generation sequencing data. It includes the following steps:
- Quality Control: Perform quality control on raw sequencing data using FastQC.
- Trimming: Trim adapter sequences and low-quality bases, and discard reads that become too short, using Trimmomatic.
- Alignment: Map trimmed reads to a reference genome using BWA.
- Sorting and Indexing: Convert aligned reads to BAM format, sort, and index using Samtools.
- Mark Duplicates: Identify and mark duplicate reads using Picard Tools.
- Base Recalibration: Correct systematic errors in base quality scores using GATK BaseRecalibrator and ApplyBQSR (base quality score recalibration, BQSR).
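- Variant Calling: Call germline variants with GATK HaplotypeCaller and somatic variants with GATK Mutect2 (tumor sample, ideally paired with a matched normal).
- Variant Filtering: Filter Mutect2 somatic calls with GATK FilterMutectCalls, and filter HaplotypeCaller germline calls separately (for example, by hard filtering with GATK VariantFiltration).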
- Annotation: Annotate variants with functional and clinical information using SnpEff and ANNOVAR.
- Prediction of Variant Effects: Predict the pathogenicity of missense variants using REVEL scores.
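The steps above can be sketched as a chain of shell commands. The Python sketch below only builds the command strings (it runs nothing) and rests on several assumptions not stated in this pipeline description: a tumor/matched-normal pair of paired-end FASTQ files, a reference FASTA already indexed for BWA and GATK, a known-sites VCF for BQSR, and tools available on PATH as `fastqc`, `trimmomatic`, `bwa`, `samtools`, `picard`, and `gatk` (as in the Bioconda packages). All file names, the `Sample`, `preprocess`, and `call_variants` helpers, the Trimmomatic settings, and the single `QD < 2.0` hard filter are hypothetical and illustrative. Annotation (SnpEff, ANNOVAR, REVEL) is left out.

```python
import shlex
from dataclasses import dataclass

# Hypothetical resource paths; replace with your own files.
REF = "ref/genome.fa"             # reference FASTA, indexed for BWA and GATK
KNOWN_SITES = "ref/dbsnp.vcf.gz"  # known variant sites used by BaseRecalibrator
ADAPTERS = "TruSeq3-PE.fa"        # adapter FASTA distributed with Trimmomatic
THREADS = "4"


@dataclass
class Sample:
    name: str  # used as the read-group sample name (SM) and as the file prefix
    r1: str    # forward-read FASTQ
    r2: str    # reverse-read FASTQ


def preprocess(s: Sample) -> list[str]:
    """Commands for QC, trimming, alignment, sorting/indexing, duplicate marking and BQSR."""
    p = s.name
    trimmed = [f"{p}_R1.paired.fq.gz", f"{p}_R1.unpaired.fq.gz",
               f"{p}_R2.paired.fq.gz", f"{p}_R2.unpaired.fq.gz"]
    read_group = f"@RG\\tID:{p}\\tSM:{p}\\tPL:ILLUMINA"  # bwa expands the literal "\t"
    return [
        shlex.join(["fastqc", "-o", "qc", s.r1, s.r2]),  # the qc/ directory must already exist
        shlex.join(["trimmomatic", "PE", "-threads", THREADS, s.r1, s.r2, *trimmed,
                    f"ILLUMINACLIP:{ADAPTERS}:2:30:10", "LEADING:3", "TRAILING:3",
                    "SLIDINGWINDOW:4:15", "MINLEN:36"]),
        # Align the surviving read pairs and pipe the SAM stream straight into a coordinate sort.
        shlex.join(["bwa", "mem", "-t", THREADS, "-R", read_group, REF, trimmed[0], trimmed[2]])
        + " | " + shlex.join(["samtools", "sort", "-o", f"{p}.sorted.bam", "-"]),
        shlex.join(["samtools", "index", f"{p}.sorted.bam"]),
        shlex.join(["picard", "MarkDuplicates", "-I", f"{p}.sorted.bam",
                    "-O", f"{p}.dedup.bam", "-M", f"{p}.dup_metrics.txt"]),
        shlex.join(["samtools", "index", f"{p}.dedup.bam"]),
        shlex.join(["gatk", "BaseRecalibrator", "-I", f"{p}.dedup.bam", "-R", REF,
                    "--known-sites", KNOWN_SITES, "-O", f"{p}.recal.table"]),
        shlex.join(["gatk", "ApplyBQSR", "-R", REF, "-I", f"{p}.dedup.bam",
                    "--bqsr-recal-file", f"{p}.recal.table", "-O", f"{p}.recal.bam"]),
    ]


def call_variants(tumor: Sample, normal: Sample) -> list[str]:
    """Germline calls on the normal, somatic calls on tumor vs. normal, then filtering."""
    t_bam, n_bam = f"{tumor.name}.recal.bam", f"{normal.name}.recal.bam"
    germ, som = f"{normal.name}.germline", f"{tumor.name}.somatic"
    return [
        shlex.join(["gatk", "HaplotypeCaller", "-R", REF, "-I", n_bam, "-O", f"{germ}.vcf.gz"]),
        # Illustrative single hard filter; GATK's hard-filtering guidance combines several annotations.
        shlex.join(["gatk", "VariantFiltration", "-R", REF, "-V", f"{germ}.vcf.gz",
                    "--filter-expression", "QD < 2.0", "--filter-name", "QD2",
                    "-O", f"{germ}.filtered.vcf.gz"]),
        # -normal takes the matched normal's read-group sample name (SM), not a file path.
        shlex.join(["gatk", "Mutect2", "-R", REF, "-I", t_bam, "-I", n_bam,
                    "-normal", normal.name, "-O", f"{som}.vcf.gz"]),
        shlex.join(["gatk", "FilterMutectCalls", "-R", REF, "-V", f"{som}.vcf.gz",
                    "-O", f"{som}.filtered.vcf.gz"]),
    ]


if __name__ == "__main__":
    tumor = Sample("tumor", "tumor_R1.fastq.gz", "tumor_R2.fastq.gz")
    normal = Sample("normal", "normal_R1.fastq.gz", "normal_R2.fastq.gz")
    for cmd in preprocess(tumor) + preprocess(normal) + call_variants(tumor, normal):
        print(cmd)
```

Keeping each step as a plain command string makes it easy to paste into a shell or into the `shell:` directive of a workflow rule; quoting is handled by `shlex.join` so arguments such as the read-group string and the filter expression survive intact.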
This pipeline integrates various bioinformatics tools and databases to comprehensively analyze tumor samples and identify potentially pathogenic mutations associated with cancer.
- I have added a draft Snakemake workflow (Snakefile) to automate running the pipeline end to end.
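Snakemake builds its job graph by matching each rule's declared input files to other rules' output files, working backward from the requested targets, so steps run in a valid order and only the steps a target needs are scheduled. The toy Python sketch below illustrates that idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+); it is not Snakemake's API, and the rule names, file names, and `plan` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical rule table for one sample: rule -> (input files, output files).
# Listed out of order on purpose: the execution order is derived from file names alone.
RULES = {
    "annotate":   (["sample.filtered.vcf.gz"], ["sample.annotated.vcf"]),
    "filter":     (["sample.vcf.gz"], ["sample.filtered.vcf.gz"]),
    "call":       (["sample.recal.bam"], ["sample.vcf.gz"]),
    "bqsr":       (["sample.dedup.bam"], ["sample.recal.bam"]),
    "mark_dups":  (["sample.sorted.bam"], ["sample.dedup.bam"]),
    "align_sort": (["sample_R1.trim.fq.gz", "sample_R2.trim.fq.gz"], ["sample.sorted.bam"]),
    "trim":       (["sample_R1.fq.gz", "sample_R2.fq.gz"],
                   ["sample_R1.trim.fq.gz", "sample_R2.trim.fq.gz"]),
    "fastqc":     (["sample_R1.fq.gz", "sample_R2.fq.gz"], ["qc/sample_R1_fastqc.html"]),
}


def plan(rules: dict[str, tuple[list[str], list[str]]], targets: list[str]) -> list[str]:
    """Return the rules needed to build `targets`, in an order that respects dependencies."""
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    graph: dict[str, set[str]] = {}  # rule -> rules whose outputs it consumes
    pending = [producer[t] for t in targets]  # KeyError if no rule produces a target
    while pending:
        rule = pending.pop()
        if rule in graph:
            continue
        inputs, _ = rules[rule]
        # Inputs with no producing rule (the raw FASTQ files) must already exist on disk.
        deps = {producer[f] for f in inputs if f in producer}
        graph[rule] = deps
        pending.extend(deps)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(plan(RULES, ["sample.annotated.vcf", "qc/sample_R1_fastqc.html"]))
    print(plan(RULES, ["sample.sorted.bam"]))  # only trim and align_sort are needed
```

In the Snakefile itself, each entry would instead be a `rule` with `input:`, `output:`, and `shell:` directives, typically using a `{sample}` wildcard rather than a fixed `sample` prefix, and `snakemake -n` performs a dry run that lists the planned jobs without executing them.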