
The repository for scientific project "In silico prediction assessment of deep intronic variants in IRD" in Bioinformatics Institute course


EkaterinShitik/IRD_prediction_assessment


IRD_prediction_assessment

Inherited retinal degenerations (IRD) are a heterogeneous group of diseases that are frequently caused by a combination of deep-intronic variants (DIVs). Such DIVs are generally closely associated with splicing disruptions, including the use of cryptic splice sites. In this project, we benchmarked existing splicing prediction tools to evaluate the contribution of various DIVs to the development of IRD.

Data

Overall, 206 DIVs were explored: 103 pathogenic and 103 benign. The minimum distance from the exon-intron junction is 160 bp for pathogenic variants and 196 bp for benign variants. Some of the datasets overlap, so the per-dataset counts of pathogenic variants do not sum to 103. The analysis was performed using the GRCh38 genome assembly.
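The distance from a variant to the nearest exon-intron junction can be sketched as follows. This is only an illustration, not the code used in the project: the function name `distance_to_nearest_exon`, the 1-based inclusive exon coordinates, and the example exon intervals are all assumptions.

```python
def distance_to_nearest_exon(position, exons):
    """Distance in bp from a variant to the closest exon boundary.

    Assumptions (not from the project): `position` and the exon
    (start, end) pairs are 1-based, inclusive, on the same chromosome.
    Returns 0 if the position lies inside an exon.
    """
    if not exons:
        raise ValueError("at least one exon interval is required")
    best = None
    for start, end in exons:
        if start <= position <= end:
            return 0  # exonic position, not intronic
        # Distance to the exon edge that faces the variant
        d = start - position if position < start else position - end
        if best is None or d < best:
            best = d
    return best


# Hypothetical exon intervals, for illustration only
example_exons = [(1000, 1200), (5000, 5300)]
```

With these made-up intervals, a variant at position 1360 lies 160 bp downstream of the first exon, and a variant at position 4804 lies 196 bp upstream of the second.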

Pathogenic

Benign

For a more detailed analysis, the datasets were divided into two sets of 57✧ and 54✦ variants.

Tools

As a preliminary step, the data were annotated with the Ensembl Variant Effect Predictor (VEP).

For splicing variant interpretation, the following tools were used:

The obtained scores were processed with VETA.

Results

According to the obtained metrics, SpliceAI, Pangolin and SPiP were selected.

AUROC for studied tools *

[Figures: AUROC for the 103-variant set and for the 54-variant set]

*PDIVAS is not shown because it produces predictions for only 50% of the data.
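As a rough illustration of the two quantities discussed here, the sketch below computes AUROC from labels and scores by pairwise comparison, and the fraction of variants for which a tool returns a score. It is not the VETA implementation. The function names `auroc` and `coverage`, the label encoding (1 = pathogenic, 0 = benign), and the use of `None` for a missing prediction are assumptions.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random pathogenic variant scores
    higher than a random benign one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def coverage(scores):
    """Fraction of variants that received a prediction (score is not None)."""
    if not scores:
        raise ValueError("empty score list")
    return sum(1 for s in scores if s is not None) / len(scores)
```

A tool with low coverage can only be scored on the variants it predicts, which makes its AUROC hard to compare with tools evaluated on the full set.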

Class distribution for SQUIRLS and SPiP

[Figures: class distribution for SQUIRLS and for SPiP *]

*SpliceAI and Pangolin have a similar distribution.

Selected thresholds

The thresholds were chosen according to the F1 score. Details can be found in Data_analysis.ipynb.

  • Pangolin: 0.05
  • SpliceAI: 0.05
  • SPiP: 0.015
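Choosing a threshold by F1 score can be sketched as below. This is a minimal illustration, not the code from Data_analysis.ipynb. It assumes that a variant is called pathogenic when its score is greater than or equal to the threshold and that the candidate with the highest F1 is picked. The function names and the toy data are hypothetical.

```python
def f1_score(labels, scores, threshold):
    """F1 when a variant is called pathogenic if score >= threshold.

    labels: 1 = pathogenic, 0 = benign (assumed encoding).
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labels, scores, candidates):
    """Return (threshold, f1) with the highest F1; ties go to the lowest threshold."""
    best = max(sorted(candidates), key=lambda t: f1_score(labels, scores, t))
    return best, f1_score(labels, scores, best)


# Toy data, for illustration only (not project data)
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.90, 0.40, 0.06, 0.04, 0.01, 0.20]
toy_best, toy_f1 = best_threshold(toy_labels, toy_scores, [0.01, 0.05, 0.2, 0.5])
```

On this toy data the candidate 0.05 gives the highest F1 (6/7), because it keeps all three pathogenic variants while admitting only one benign variant.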

Annotation

Using the obtained thresholds, DIVs from gnomAD in the USH2A, CRB1 and ABCA4 genes were annotated.

The results are presented in table ABCA4_CRB1_USH2A_annot.csv.

The symbols used:

🟢 Benign

🟡 Pathogenic under one assessment

🟠 Pathogenic under two assessments

🔴 Pathogenic under three assessments
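One way to read this legend is to count how many of the three selected tools score a variant at or above the threshold selected for it. The sketch below follows that reading. It is an assumption, not the annotation code behind ABCA4_CRB1_USH2A_annot.csv. The function names, the `>=` comparison, and the treatment of a missing score as a non-pathogenic call are all hypothetical.

```python
# Thresholds listed under "Selected thresholds"
THRESHOLDS = {"Pangolin": 0.05, "SpliceAI": 0.05, "SPiP": 0.015}

# Symbol by number of tools calling the variant pathogenic
SYMBOLS = {0: "🟢", 1: "🟡", 2: "🟠", 3: "🔴"}


def count_pathogenic_calls(scores, thresholds=THRESHOLDS):
    """Number of tools whose score reaches its threshold.

    scores: dict mapping tool name to score; a missing or None score is
    treated as "not pathogenic" (an assumption).
    """
    return sum(
        1
        for tool, cutoff in thresholds.items()
        if scores.get(tool) is not None and scores[tool] >= cutoff
    )


def symbol_for(scores):
    """Legend symbol for one variant given its per-tool scores."""
    return SYMBOLS[count_pathogenic_calls(scores)]
```

For example, a variant with scores `{"Pangolin": 0.3, "SpliceAI": 0.02, "SPiP": 0.5}` passes two of the three thresholds and would be marked 🟠 under this reading.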

Contacts
