The following software and resources can be downloaded or prepared by following the specified instructions.
- MosaicHunter: https://github.com/zzhang526/MosaicHunter
- MosaicForecast: https://github.com/parklab/MosaicForecast
- GATK: https://github.com/broadinstitute/gatk/releases
Resources | Example/Sources/Notes |
---|---|
Reference genome | e.g. hs37d5.fa |
Common variant databases | dbSNP (e.g. b37_dbsnp_138.b37.vcf); gnomAD (e.g. somatic-b37_af-only-gnomad.raw.sites.vcf) |
Repeats | e.g. all_repeats.b37.bed; can be found in the MosaicHunter installation (i.e. $MHDIR/MosaicHunter-master/resources) |
Exome error databases | e.g. WES_Agilent_71M.error_prone.b37.bed; can be found in the MosaicHunter installation (i.e. $MHDIR/MosaicHunter-master/resources) |
Panel of Normals (PON) | Should be prepared from samples that are not part of the analysis. For large cohort analyses, one suggestion is to divide the samples into two cohorts and create two Panels of Normals (PON_A and PON_B). A PON can be prepared with the GATK tool CreateSomaticPanelOfNormals (see https://gatk.broadinstitute.org/hc/en-us/articles/4405451431963-CreateSomaticPanelOfNormals-BETA) |
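
The two-cohort split suggested for the Panel of Normals can be sketched in shell as follows. This is only an illustration: the function name `split_cohorts`, the one-sample-ID-per-line input file, and the alternating assignment are assumptions and not part of MosaiC-All. Each resulting list would then feed its own PON build.

```shell
# Hypothetical helper (not part of MosaiC-All): split a list of sample IDs,
# one per line, into two cohorts by alternating lines.
# $1: sample list, $2: output list for cohort A, $3: output list for cohort B
split_cohorts() {
    # Start from empty output files so that reruns do not append.
    : > "$2"
    : > "$3"
    # Skip blank lines; 1st, 3rd, 5th... sample go to A, the others to B.
    awk -v a="$2" -v b="$3" '
        NF == 0 { next }
        { n++; if (n % 2 == 1) print > a; else print > b }
    ' "$1"
}

# Example call (file names are placeholders):
# split_cohorts all_samples.txt PON_A.samples.txt PON_B.samples.txt
```

Any other partition would serve equally well, provided that the samples used to build a PON are not among the samples analysed against it.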
The paths to the software and resources prepared in Step 1 should be specified in a config file.
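
Before launching the pipeline it can help to confirm that every path in the config file exists. The sketch below assumes, purely for illustration, that the config holds one `KEY=/some/path` entry per line with `#` comments; the real layout of the MosaiC-All config may differ, and `check_config_paths` is a hypothetical name.

```shell
# Hypothetical pre-flight check (not part of MosaiC-All).
# ASSUMPTION: the config file contains KEY=/some/path lines and '#' comments.
# $1: config file. Prints every entry whose path is missing; returns 1 if any.
check_config_paths() {
    missing=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Skip blank lines and comment lines.
        case $key in ''|'#'*) continue ;; esac
        # Drop one pair of surrounding double quotes, if present.
        value=${value#\"}
        value=${value%\"}
        if [ ! -e "$value" ]; then
            printf 'MISSING %s=%s\n' "$key" "$value"
            missing=1
        fi
    done < "$1"
    return "$missing"
}

# Example call:
# check_config_paths MosaiC-All/Mosaic-All.config || echo "fix the paths above"
```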
Variants are detected in singletons and/or trios using three mosaic variant callers (MosaicHunter, MosaicForecast, Mutect2) and GATK HaplotypeCaller (GATK-HC). A wrapper script (MasterScript_MosaiC-All.sh) was developed for this purpose and can be run in the University of Adelaide HPC environment with the following command:
MosaiC-All/MasterScript_MosaiC-All.sh -s $SampleID.list -o $Outputs -c MosaiC-All/Mosaic-All.config
- $SampleID.list: A tab-separated file in the following format, based on the BAM file of each sample (e.g. 001P.realigned.bam)
Directory of Bam files | ProbandID | Gender | MotherID | FatherID |
---|---|---|---|---|
./path | 001P | F | 001M | 001F |
- $Outputs: An output directory to store all final outputs
- $DIR/MosaiC-All/MosaiC-All.config: A config file prepared as described in Steps 1 and 2.
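
A malformed sample list is an easy mistake to make, so a quick format check before submission may be worthwhile. The sketch below only verifies that each line has the five tab-separated columns shown in the table above; the function name `validate_sample_list` is hypothetical, and it assumes that every row, singletons included, carries all five columns.

```shell
# Hypothetical helper (not part of MosaiC-All): check that every line of the
# sample list has exactly five tab-separated fields.
# $1: sample list. Prints one message per offending line; returns 1 if any.
validate_sample_list() {
    awk -F '\t' '
        NF != 5 {
            printf "line %d: expected 5 tab-separated fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example call:
# validate_sample_list "$SampleID.list" && echo "sample list looks well formed"
```

Rows separated by spaces instead of tabs are reported as having a single field, which is the most common way such a list goes wrong.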
Aims:
- To filter the MosaicForecast (MF) calls manually
- To merge the variant calls from each tool
sbatch $SCRIPTDIR/CombineCalls.sh -s $sampleID -d $Outputs
Requirements:
- sampleID (e.g. 001P)
- Outputs (the output directory specified in Step 3)
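
When many probands need combining, the command above can be generated once per sample from the sample list. The following is a minimal dry-run sketch: `print_combine_jobs` is a hypothetical name, it assumes the list has no header row and holds the proband ID in the second column, and it only prints the `sbatch` commands so they can be inspected before being run.

```shell
# Hypothetical helper (not part of MosaiC-All): print one CombineCalls.sh
# submission per proband listed in the sample list.
# ASSUMPTION: no header row; column 2 holds the proband ID.
# $1: sample list, $2: script directory, $3: output directory
print_combine_jobs() {
    tab=$(printf '\t')
    while IFS=$tab read -r bamdir proband rest || [ -n "$bamdir" ]; do
        # Ignore blank or incomplete lines.
        [ -n "$proband" ] || continue
        printf 'sbatch %s/CombineCalls.sh -s %s -d %s\n' "$2" "$proband" "$3"
    done < "$1"
}

# Example: review the commands first, then pipe them to a shell to submit.
# print_combine_jobs "$SampleID.list" "$SCRIPTDIR" "$Outputs"
# print_combine_jobs "$SampleID.list" "$SCRIPTDIR" "$Outputs" | sh
```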
Aims:
- To flag each variant with the number of mosaic variant calling tools (one to three) that detected it in the same sample
- To then filter out variants detected by only one tool
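
The flag-then-filter idea can be illustrated with a short sketch. It is not the pipeline's own implementation: the input layout (one `variant key<TAB>tool` row per call, with a key such as `sample:chr:pos:ref:alt`) and the function name `flag_by_tool_count` are assumptions made for this example.

```shell
# Hypothetical sketch (not the MosaiC-All implementation).
# ASSUMPTION: input rows are "variant_key<TAB>tool", one row per call.
# $1: calls file. Prints "key<TAB>number_of_tools<TAB>tool,tool,..." for
# variants reported by at least two distinct tools.
flag_by_tool_count() {
    awk -F '\t' -v OFS='\t' '
        # Count each (variant, tool) pair once, even if it is listed twice.
        !seen[$1, $2]++ {
            n[$1]++
            tools[$1] = (tools[$1] == "" ? $2 : tools[$1] "," $2)
        }
        END {
            for (k in n)
                if (n[k] >= 2) print k, n[k], tools[k]
        }
    ' "$1" | sort
}

# Example call (file name is a placeholder):
# flag_by_tool_count all_calls.tsv > calls_supported_by_two_or_more_tools.tsv
```

Keeping the tool count as a column, instead of discarding it after filtering, leaves room to apply a stricter cut-off later without recomputing.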
Aims:
- To filter parental mosaic variant calls based on transmission to children
5.1 Prefilter
- Identify inherited variants that are expected to be mosaic based on AAF and GT, using GATK-HC outputs
Command:
sbatch /MosaiC-ALL/postprocessing/pGoM.sh -v /path/to/directory_of_vcf -s FamilyID.txt -o /path/to/output/directory
Requirements:
- Input directory (where the family VCF files are located).
- Sample ID list (one header row followed by the tab-delimited columns $BAMdir, $ProbandID, $Gender, $Mother, $Father and the family VCF file name, as in the following table).
Directory of Bam files | ProbandID | Gender | MotherID | FatherID | FamilyVCF |
---|---|---|---|---|---|
./path | 001P | F | 001M | 001F | Trio001.vcf |
./path | 004P | F | 004M | 004F | 004.family.vcf |
- /path/to/output/directory (A location for the output files).
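
To make the prefilter idea from 5.1 concrete, the sketch below selects heterozygous calls whose allele fraction falls inside a chosen window. It is not the logic of pGoM.sh: it assumes that AAF means the alternate allele fraction, computed as alt depth / (ref depth + alt depth) from the VCF AD field, that the input has already been flattened to `variant<TAB>GT<TAB>AD` rows, and that the AAF bounds are supplied by the user. The function name `filter_by_gt_and_aaf` is hypothetical.

```shell
# Hypothetical sketch (not the pGoM.sh implementation).
# ASSUMPTIONS: stdin rows are "variant<TAB>GT<TAB>AD", AD is "ref,alt"
# (only the first alternate allele is considered), and AAF = alt/(ref+alt).
# $1: minimum AAF, $2: maximum AAF (both placeholders chosen by the user)
filter_by_gt_and_aaf() {
    awk -F '\t' -v OFS='\t' -v lo="$1" -v hi="$2" '
        # Keep heterozygous genotypes only, phased or unphased.
        $2 == "0/1" || $2 == "0|1" || $2 == "1|0" {
            split($3, d, ",")
            total = d[1] + d[2]
            # Sites without read support cannot yield an allele fraction.
            if (total > 0) {
                aaf = d[2] / total
                if (aaf >= lo && aaf <= hi)
                    print $1, $2, sprintf("%.3f", aaf)
            }
        }
    '
}

# Example call (bounds and file name are placeholders, not pipeline defaults):
# filter_by_gt_and_aaf 0.03 0.30 < parent_calls.tsv
```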
5.2 Postfilter
- Mosaic variants are identified among prefiltered pGoM variants using one or more mosaic variant calling tools
- Example script: MosaiC-ALL/postprocessing/pGoMpipeline.R
Requirements:
- Three output files from MosaiC-ALL/postprocessing/M3_CombineCalls.sh
- The pGoM.sh output file
- Amend the working directory and the output_file prefix in the R script
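
The postfilter amounts to intersecting the prefiltered pGoM variants with the calls from the mosaic tools. The shell sketch below shows that intersection only; it does not reproduce pGoMpipeline.R. It assumes that both inputs are tab-separated with a shared variant key in the first column, and `intersect_variants` is a hypothetical name.

```shell
# Hypothetical sketch (not a port of pGoMpipeline.R).
# ASSUMPTION: both files are tab-separated with the variant key in column 1.
# $1: prefiltered pGoM variants, $2: mosaic tool calls
# Prints the pGoM rows whose key also appears among the mosaic calls.
intersect_variants() {
    awk -F '\t' '
        # First file on the command line: remember every mosaic call key.
        FILENAME == ARGV[1] { keep[$1] = 1; next }
        # Second file: print pGoM rows with a remembered key.
        ($1 in keep)
    ' "$2" "$1"
}

# Example call (file names are placeholders):
# intersect_variants pgom_prefiltered.tsv mosaic_calls.tsv > pgom_mosaic.tsv
```

Testing `FILENAME` against `ARGV[1]` keeps the result correct when the mosaic call file is empty, a case in which the common `NR == FNR` idiom would treat the second file as the first.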