smMIPS-Detection-Pipeline

Introduction

smMIPS-Detection-Pipeline is a bioinformatics analysis pipeline that identifies somatic mutations from Illumina next-generation sequencing data for libraries prepared using single-molecule molecular inversion probes (smMIPS). The pipeline is built with Nextflow (version 20.04.1). It trims adapters, merges paired-end reads, maps the reads to the human genome, and produces final BAM files following GATK Best Practices. CoverView reports read-depth coverage and quality metrics. Several variant callers are run, and SomaticSeq applies a machine-learning-based algorithm to their calls to produce a consensus VCF. Platypus and Freebayes are used to call INDELs. FLT3-ITD variants are called with getITD. The variants are then annotated with ANNOVAR and CAVA and formatted using custom scripts.

Pipeline Summary

1. Adaptor Trimming (fastq-mcf)
2. Merge paired-end reads (PEAR)
3. Alignment (bwa mem)
4. SAMtools conversion
5. Generating final BAM files based on GATK Best Practices
    i. RealignerTargetCreator
    ii. BaseRecalibrator
    iii. PrintReads
    iv. SAMtools sort and index on aligned, recalibrated BAM files
6. Variant calling using:
    a. Mutect2
    b. Freebayes
    c. Platypus
    d. VarDict
    e. VarScan
    f. Strelka
    g. LoFreq
    h. SomaticSeq with inputs from:
        i. Mutect2
        ii. VarDict
        iii. VarScan
        iv. LoFreq
        v. Strelka
7. Freebayes and Platypus VCF files are combined (GATK CombineVariants)
8. Annotation of the SomaticSeq and combined (Freebayes + Platypus) VCF files using ANNOVAR
9. getITD
10. CoverView
11. CAVA
12. ANNOVAR-annotated files are formatted using custom Python scripts
13. An Excel sheet is generated for each sample in the Final_Output directory
14. Temporary files for each sample are deleted

Software Dependencies

fastq-mcf = 1.05
PEAR = v0.9.10
Picard = 2.17.1
BWA = 0.7.12
Samtools = 1.7
GATK = 3.8
Freebayes = v1.3.2
LoFreq = 2.1.4
VarDictJava = 1.8
VarScan = v2.3.9
Platypus
ANNOVAR

Usage

Prerequisites-

Install dependencies listed above.
Create a sample sheet in .csv format containing a list of sample IDs (for example, "20NGS1234" for the files 20NGS1234_S4_L001_R1_001.fastq.gz and 20NGS1234_S4_L001_R2_001.fastq.gz).
Create BED files as follows:

a. Create a BED4 coordinate file with the columns chr, start-coordinate, stop-coordinate and gene/exon-name (example: chr9 133738125 133738246 ABL1Exon4_1). Update the path to this BED file in nextflow.config for the "bedfile" parameter.

b. Generate the Platypus regions file as follows:
```
 awk 'BEGIN{OFS=""}{print $1,":",$2,"-",$3}' original_bedfile.bed > original_bedfile_regions.txt
```
Add the path to this regions file in nextflow.config for the "regions" parameter.

c. Sort, compress and index the BED file:
```
 sort -k 1,1 -k 2,2n -k 3,3n original_bedfile.bed | bgzip -c > original_bedfile.bed.gz
 tabix -p bed original_bedfile.bed.gz
```
(Note: Make sure the four files share the same base name, for example bedfile.bed, bedfile_regions.txt, bedfile.bed.gz and bedfile.bed.gz.tbi.)
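
The BED steps above can be sanity-checked with a small sketch like the one below. The function names `bed_to_regions` and `check_bed_set` are hypothetical helpers, not part of the pipeline; the first wraps the same awk program used in step b, and the second only tests that the four files named in the note exist for a given prefix (path without extension).

```shell
# Convert BED intervals (from stdin or file arguments) to chr:start-stop
# regions, using the same awk program as step b.
bed_to_regions() {
    awk 'BEGIN{OFS=""}{print $1,":",$2,"-",$3}' "$@"
}

# Hypothetical helper: report which of the four expected files are missing
# for a prefix (path without extension). Returns non-zero if any is absent.
check_bed_set() {
    prefix=$1
    status=0
    for f in "$prefix.bed" "${prefix}_regions.txt" "$prefix.bed.gz" "$prefix.bed.gz.tbi"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Worked example with the ABL1 interval from step a
printf 'chr9\t133738125\t133738246\tABL1Exon4_1\n' | bed_to_regions
# prints chr9:133738125-133738246
```

Running `check_bed_set /path/to/bedfile` before launching the pipeline lists any missing file on standard error.
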
Upload all fastq.gz files to a directory. Add the path to this directory to the sequences parameter in nextflow.config, or provide it on the command line with the --sequences option.
Update the config file (nextflow.config) with the paths to the input sample sheet, the directory containing the sequences, the BED file and the tools required by the pipeline.
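
As a rough sketch, the sample sheet can be derived from the FASTQ file names instead of being typed by hand. The function `list_sample_ids` is a hypothetical helper, and it assumes Illumina-style names of the form `<sampleID>_S<n>_L<lane>_R1_001.fastq.gz` and a sample sheet with one sample ID per line; adjust both if your data or sheet layout differ.

```shell
# Hypothetical helper: print the unique sample IDs for every R1 FASTQ in a
# directory. Assumes names like <sampleID>_S<n>_L<lane>_R1_001.fastq.gz.
list_sample_ids() {
    for f in "$1"/*_R1_001.fastq.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        basename "$f" | sed -E 's/_S[0-9]+_L[0-9]+_R1_001\.fastq\.gz$//'
    done | sort -u
}
```

For example, `list_sample_ids /path/to/fastq_dir > sample_list.csv` writes one ID per line, so 20NGS1234_S4_L001_R1_001.fastq.gz contributes the line 20NGS1234.
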

To run-

cd /path/to/directory/containing/main.nf/
nextflow run main.nf [ options ] -entry [workflow]

Optional Arguments (not required if updated in nextflow.config file)

--sequences             path to directory containing fastq files.
--input                 path to csv file containing list of sample IDs.
--genome                path to reference genome
--adaptors              path to FASTA file of adaptors for trimming
--bedfile               path to BED4 coordinate BEDfile (without the extension)
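
Before launching, it can help to confirm that the values given for --input and --sequences agree with each other. The sketch below is not part of the pipeline: `check_fastq_pairs` is a hypothetical helper that assumes one sample ID per line in the sample sheet and Illumina-style FASTQ names, and it only checks that an R1 and an R2 file exist for each ID.

```shell
# Hypothetical pre-flight check: every sample ID in the sheet should have an
# R1 and an R2 FASTQ in the sequences directory.
# Assumes one ID per line and names like <sampleID>_S<n>_L<lane>_R1_001.fastq.gz.
check_fastq_pairs() {
    sheet=$1
    seqdir=$2
    missing=0
    while IFS= read -r id || [ -n "$id" ]; do
        id=$(printf '%s' "$id" | tr -d '\r')   # tolerate Windows line endings
        [ -n "$id" ] || continue               # skip blank lines
        for read_tag in R1 R2; do
            # Expand the glob; if nothing matches, $1 stays a literal pattern
            set -- "$seqdir/${id}"_S*_"${read_tag}"_001.fastq.gz
            if [ ! -e "$1" ]; then
                echo "missing $read_tag FASTQ for sample $id" >&2
                missing=1
            fi
        done
    done < "$sheet"
    return "$missing"
}
```

A call such as `check_fastq_pairs sample_list.csv /path/to/fastq_dir` returns non-zero and names the affected samples if any file is absent.
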

Workflow

-entry MIPS             to run the MIPS pipeline
-entry ALL              to run the ALL pipeline
-entry CLL              to run the CLL pipeline
-entry AMPLICON         to run the amplicon-based pipelines
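
A thin wrapper can catch a mistyped workflow name before Nextflow starts. This is a sketch under assumptions: `is_valid_entry` and `run_pipeline` are hypothetical names, the wrapper expects to be run from the directory containing main.nf, and the workflow is passed with Nextflow's single-dash `-entry` run option as in the command under "To run-".

```shell
# Return success only for the four workflow names listed above.
is_valid_entry() {
    case "$1" in
        MIPS|ALL|CLL|AMPLICON) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: the first argument is the workflow; the remaining
# arguments (for example --sequences or --input) are passed through unchanged.
run_pipeline() {
    entry=${1:-}
    if [ "$#" -gt 0 ]; then shift; fi
    if ! is_valid_entry "$entry"; then
        echo "unknown workflow: $entry (expected MIPS, ALL, CLL or AMPLICON)" >&2
        return 2
    fi
    nextflow run main.nf "$@" -entry "$entry"
}
```

For example, `run_pipeline MIPS --sequences /path/to/fastq_dir --input sample_list.csv` launches the MIPS workflow, while `run_pipeline mips` stops with an error because the names are matched case-sensitively.
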
