ahcg_pipeline

Variant calling pipeline for genomic data analysis

Requirements

Reference genome

Reference genomes can be downloaded from Illumina iGenomes

Test data

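Use the following commands to download and prepare a test dataset from NIST sample NA12878. Only the first 100,000 lines of each FASTQ file are kept, which is 25,000 reads at four lines per read.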

wget ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/NIST7035_TAAGGCGA_L001_R1_001.fastq.gz
wget ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/NIST7035_TAAGGCGA_L001_R2_001.fastq.gz
gunzip NIST7035_TAAGGCGA_L001_R1_001.fastq.gz
gunzip NIST7035_TAAGGCGA_L001_R2_001.fastq.gz
head -100000 NIST7035_TAAGGCGA_L001_R1_001.fastq > test_r1.fastq
head -100000 NIST7035_TAAGGCGA_L001_R2_001.fastq > test_r2.fastq
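
As a quick sanity check on the subsets, the read count can be derived from the line count. This is a minimal sketch that assumes uncompressed FASTQ with the standard four-line record layout; the function name count_fastq_reads and the toy file are illustrative and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of reads in an uncompressed FASTQ file.
# Assumes four lines per record and fails if the line count is not a
# multiple of four (a sign of a truncated record).
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1") || return 1
    if [ $((lines % 4)) -ne 0 ]; then
        echo "error: $1 has a line count that is not a multiple of 4" >&2
        return 1
    fi
    echo $((lines / 4))
}

# Toy two-read file so the sketch runs without the real data.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo"
count_fastq_reads "$demo"
```

On the real subsets, count_fastq_reads test_r1.fastq and count_fastq_reads test_r2.fastq should both report 25000, provided each source file has at least 100,000 lines.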

Help

To access help use the following command:

python3 ahcg_pipeline.py -h

Workflow

Download Reference Files

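wget www.prism.gatech.edu/~sravishankar9/resources.tar.gz

The archive contains the hg19 reference and the dbSNP VCF for hg19. Extract it before continuing (for example with tar -xzf resources.tar.gz); the later steps expect the files under resources/.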

Bowtie indexes

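bowtie2-build -f hg19.fa Bowtie_index/hg19

The -f option marks the reference (hg19.fa) as FASTA. bowtie2-build also needs an index basename as its second argument; Bowtie_index/hg19 is used here because the run command below passes -w Bowtie_index/hg19. Create the Bowtie_index directory first.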

Install samtools

sudo apt-get install samtools

Install Java

sudo apt-get install openjdk-8-jre

Create index file using samtools

samtools faidx resources/genome/hg19.fa

Create dict file using Picard

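jre1.8.0_101/bin/java -jar lib/picard.jar CreateSequenceDictionary R=resources/genome/hg19.fa O=resources/genome/hg19.dict

GATK looks for the .dict file next to the reference FASTA, so it is written to resources/genome/.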

Test Files

Prepare test_r1.fastq and test_r2.fastq as described under Test data above. The run command below expects them in the Test/ directory.

Run the script

python3 ahcg_pipeline.py \
  -t lib/Trimmomatic-0.36/trimmomatic-0.36.jar \
  -b lib/bowtie2-2.2.9/bowtie2 \
  -p lib/picard.jar \
  -g lib/GenomeAnalysisTK.jar \
  -i Test/test_r1.fastq Test/test_r2.fastq \
  -w Bowtie_index/hg19 \
  -d resources/dbsnp/dbsnp_138.hg19.vcf \
  -r resources/genome/hg19.fa \
  -a lib/Trimmomatic-0.36/adapters/TruSeq3-SE.fa \
  -o ./
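
The run command takes many paths, and a single wrong one only surfaces once the pipeline reaches that step. A small pre-flight check can catch this earlier. This is a sketch only; check_paths is a hypothetical helper, not something shipped with ahcg_pipeline.py.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report every path that does not exist
# and return non-zero if any is missing.
check_paths() {
    local missing=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration on a temporary file that does exist.
present=$(mktemp)
check_paths "$present" && echo "all inputs found"
```

Before a real run it could be called as check_paths lib/picard.jar lib/GenomeAnalysisTK.jar Test/test_r1.fastq Test/test_r2.fastq resources/genome/hg19.fa. Note that Bowtie_index/hg19 is an index basename rather than a file, so it should not be passed to this check as is.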

Mapping regions of interest for BRCA1

Steps for extracting reads mapping to BRCA1 from the NA12878 HiSeq Exome dataset:

1. Download the NA12878 HiSeq Exome dataset. The BAM files for the sample can be downloaded from the FTP link given on the GIAB GitHub page. There are four runs for this sample; start by downloading one of them.

2. Use samtools to subset the BAM file to the regions corresponding to BRCA1. Given a BED file containing the BRCA1 exonic coordinates:

samtools view -L <bed file> -b -o <output bam file> <input bam file>

Note: -b specifies that the output is written as a BAM file.

3. Use bedtools to convert the BRCA1 BAM file to FASTQ files, extracting the reads that align to the region:

bedtools bamtofastq -i <bam file> -fq <fastq r1> -fq2 <fastq r2>

Note: for paired output (-fq2), bedtools expects the BAM file to be sorted by read name, for example with samtools sort -n.
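
The subsetting and conversion steps above can be chained in one function. This is a minimal sketch, assuming samtools 1.3 or later and bedtools are on the PATH; the function name extract_region_reads and the output file naming are illustrative choices, not part of the original protocol.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the subsetting and conversion steps.
# Usage: extract_region_reads <regions.bed> <input.bam> <output prefix>
extract_region_reads() {
    if [ "$#" -ne 3 ]; then
        echo "usage: extract_region_reads <regions.bed> <input.bam> <prefix>" >&2
        return 2
    fi
    local bed=$1 bam=$2 prefix=$3

    # Keep only alignments overlapping the BED intervals; -b writes BAM.
    samtools view -b -L "$bed" -o "$prefix.bam" "$bam" || return 1

    # bedtools bamtofastq needs mates to be adjacent for paired output,
    # so sort by read name first.
    samtools sort -n -o "$prefix.namesorted.bam" "$prefix.bam" || return 1

    # Write mate 1 and mate 2 to separate FASTQ files.
    bedtools bamtofastq -i "$prefix.namesorted.bam" \
        -fq "$prefix.r1.fastq" -fq2 "$prefix.r2.fastq"
}
```

A call such as extract_region_reads brca1.bed NA12878.bam brca1 (file names are placeholders) would leave brca1.r1.fastq and brca1.r2.fastq in the working directory.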

Variant Quality Score Recalibration (VQSR)

jre1.8.0_101/bin/java -Xmx4g -jar lib/GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R resources/genome/hg19.fa \
-input NA12878_variants.vcf \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \
-resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 resources/dbsnp/dbsnp_138.hg19.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP \
-recalFile output.recal -tranchesFile output.tranches -rscriptFile output.plots.R
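
One thing worth checking before running the command above: two of the resource files carry b37 in their names while the reference is hg19. b37 names chromosomes 1, 2, ... and hg19 uses chr1, chr2, ..., and GATK typically rejects inputs whose contigs do not match the reference. The sketch below only shows how to peek at the contig naming of an uncompressed VCF; first_vcf_contig is a hypothetical helper and the toy files stand in for the real resources.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the contig name of the first record in an
# uncompressed VCF (first column of the first non-header line).
first_vcf_contig() {
    awk '!/^#/ { print $1; exit }' "$1"
}

# Toy files standing in for an hg19-style and a b37-style VCF.
hg19_vcf=$(mktemp)
b37_vcf=$(mktemp)
header='##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
printf "${header}chr1\t100\t.\tA\tG\t50\tPASS\t.\n" > "$hg19_vcf"
printf "${header}1\t100\t.\tA\tG\t50\tPASS\t.\n" > "$b37_vcf"

first_vcf_contig "$hg19_vcf"
first_vcf_contig "$b37_vcf"
```

Running first_vcf_contig on each resource file and comparing the result with the first column of resources/genome/hg19.fa.fai shows whether the naming styles agree.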

Update GitHub

git add .
git commit -m "Message"
git push origin master

DCM

Clinical Report Generation

# Shrink ClinVar to just the DCM genes
bedtools intersect -a clinvar.vcf.gz -b dcm_gene_list.bed -header > clinvar_allfrombed.vcf

# Shrink the patient variants to just the DCM genes
bedtools intersect -a patient1_variants_recal.vcf -b dcm_gene_list.bed -header > patient1_dcm_final.vcf

# Match the patient variants to ClinVar
bedtools intersect -b patient1_dcm_final.vcf -a clinvar_allfrombed.vcf -header > patient1_intersect_clinvar.vcf

# Generate a simple report on the findings
# Note: the previous step writes an uncompressed .vcf; gzip it first
# (e.g. gzip patient1_intersect_clinvar.vcf) if parse_clnsig.py needs the .vcf.gz input shown here
python3 parse_clnsig.py -i patient1_intersect_clinvar.vcf.gz 2>&1 | tee patient1_simple_report.txt
cut -c 24- patient1_simple_report.txt
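
The final cut -c 24- prints each report line from character 24 onward, dropping the first 23 characters. The reason is not stated; presumably the parser output starts with a fixed-width prefix. A toy illustration of the mechanics, with a made-up 23-character prefix:

```shell
#!/usr/bin/env bash
# A 23-character placeholder prefix followed by the text to keep.
# Both parts are invented for illustration.
line='12345678901234567890123payload'
printf '%s\n' "$line" | cut -c 24-
```

Note that cut writes to standard output only, so patient1_simple_report.txt itself still contains the full lines.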

Coverage Calculator

samtools view -b -L "$GENE_LIST" "$BAM_PATH" > new.bam
bedtools genomecov -ibam new.bam -bga > coverage_output.bed
bedtools intersect -loj -split -a "$GENE_LIST" -b coverage_output.bed > cov.bed
awk '{printf("%s\t%s\t%s\t%s\t%s\t%s\n", $1,$2,$3,$8,$4,$6)}' cov.bed > new_cov.bed
python cov.py new_cov.bed new_cov_depth.txt
awk '{print >> ($4)}' new_cov_depth.txt
genes=( "LMNA" "MYBPC3" "MYH6" "MYH7" "SCN5A" "TNNT2" )
for i in "${genes[@]}"
do
	Rscript draw_depth.R "$i" "$i".png
done

Create Report as PDF

convert patient1_simple_report.txt LMNA.png MYBPC3.png MYH6.png MYH7.png SCN5A.png TNNT2.png report.pdf
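
In the coverage calculator above, the awk step that appends to a file named after column 4 is what produces the per-gene files later read by draw_depth.R. The sketch below shows that mechanism on toy data. The column layout of the toy file is an assumption; the only property relied on is that column 4 holds the gene name.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the per-gene files do not clutter the repo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy stand-in for new_cov_depth.txt (layout assumed, gene name in column 4).
printf 'chr1\t100\t200\tLMNA\t35\n'  > new_cov_depth.txt
printf 'chr14\t300\t400\tMYH7\t12\n' >> new_cov_depth.txt
printf 'chr1\t200\t300\tLMNA\t40\n'  >> new_cov_depth.txt

# Append each row to a file named after its fourth column.
# The parentheses and close() keep this portable across awk implementations.
awk '{ print >> ($4); close($4) }' new_cov_depth.txt

# Show how many rows each gene file received.
wc -l LMNA MYH7
```

Because >> appends, rerunning the step on the same input duplicates rows in the per-gene files; removing the old per-gene files first avoids that.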

Repository: Niveda-S/ahcg_pipeline
