Inherited retinal degenerations (IRDs) are a heterogeneous group of diseases that are frequently caused by deep-intronic variants (DIVs). Such DIVs typically act by disrupting splicing, for example through the activation of cryptic splice sites. In this project, we benchmarked existing splicing prediction tools to evaluate the contribution of various DIVs to the development of IRDs.
In total, 206 DIVs were explored: 103 pathogenic and 103 benign. The minimum distance from the exon-intron junction is 160 bp for pathogenic variants and 196 bp for benign variants. Some of the source datasets overlap, so the per-source counts of pathogenic variants listed below add up to 115 rather than 103. The analysis was performed on the GRCh38 genome assembly.
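The distance criterion can be illustrated with a minimal sketch. It assumes that exon coordinates for a single transcript are available as 1-based, inclusive `(start, end)` pairs; the function name and the toy coordinates are hypothetical and are not taken from our pipeline.

```python
from typing import List, Tuple


def distance_to_nearest_junction(pos: int, exons: List[Tuple[int, int]]) -> int:
    """Distance (bp) from an intronic position to the closest exon boundary.

    Hypothetical helper: `exons` are 1-based, inclusive (start, end) pairs
    on the same chromosome as `pos`. The first intronic base next to an
    exon is at distance 1.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    best = None
    for start, end in exons:
        if start <= pos <= end:
            raise ValueError(f"position {pos} lies inside exon {start}-{end}")
        # Distance to whichever boundary of this exon is closer
        d = min(abs(pos - start), abs(pos - end))
        best = d if best is None else min(best, d)
    return best


# Toy transcript with two exons (made-up coordinates)
exons = [(1000, 1200), (2000, 2150)]
print(distance_to_nearest_junction(1360, exons))  # 160
```

A variant set can then be filtered by comparing this value with a minimum distance.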
Pathogenic
- 57: Roosing et al.✧
- 4: Lu Tian et al.
- 3: Lui et al.✦
- 51: experimental data provided by Marianna Weener (see Contacts)✦
Benign
- 103: gnomAD
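Because the pathogenic sources overlap, counting unique variants requires a union on a normalized variant key. The sketch below assumes each variant is identified by `(chrom, pos, ref, alt)` on GRCh38; the key format, the function name and the toy variants are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# Assumed variant key: (chromosome, GRCh38 position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def merge_sources(sources: Dict[str, Set[Variant]]) -> Tuple[Set[Variant], int]:
    """Return the union of all source sets and the number of extra entries due to overlaps."""
    merged: Set[Variant] = set()
    for variants in sources.values():
        merged |= variants
    total = sum(len(v) for v in sources.values())
    return merged, total - len(merged)


# Toy example: one variant is reported by two sources
sources = {
    "source_A": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "source_B": {("chr1", 200, "C", "T"), ("chr2", 300, "G", "A")},
}
unique, n_dup = merge_sources(sources)
print(len(unique), n_dup)  # 3 1
```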
For a more detailed analysis, the pathogenic datasets were split into two sets, 57 variants (✧) and 54 variants (✦), according to the markers above.
First, the data were annotated with the Ensembl Variant Effect Predictor (VEP).
For splicing variant interpretation, the following tools were used:
The obtained scores were processed with VETA.
Based on the resulting metrics, SpliceAI, Pangolin and SPiP were selected for further analysis.
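For SpliceAI, a single per-variant score is commonly taken as the maximum of its four delta scores (acceptor gain, acceptor loss, donor gain, donor loss). The sketch below assumes the scores are read from the `SpliceAI` INFO field of a VCF in the standard `ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL` layout. Whether our pipeline summarised SpliceAI output exactly this way is an assumption, and the function name and gene symbol are hypothetical.

```python
def max_spliceai_delta(info_value: str) -> float:
    """Largest delta score across all annotations in a SpliceAI INFO value.

    Hypothetical helper. Each comma-separated annotation is expected as
    ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL.
    Missing scores ('.') are skipped.
    """
    best = 0.0
    for annotation in info_value.split(","):
        fields = annotation.split("|")
        # Fields 2-5 hold the four delta scores
        scores = [float(x) for x in fields[2:6] if x not in (".", "")]
        if scores:
            best = max(best, max(scores))
    return best


value = "T|GENE1|0.00|0.00|0.91|0.08|-28|-46|-2|-31"
print(max_spliceai_delta(value))  # 0.91
print(max_spliceai_delta(value) >= 0.05)  # True
```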
Figure: benchmarking results for the 103-variant and 54-variant sets.
*PDIVAS is not shown because it returns predictions for only 50% of the variants.
Figure: score distributions for SQUIRLS and SPiP*.
*SpliceAI and Pangolin show a similar distribution.
The thresholds were chosen to maximize the F1 score; details can be found in Data_analysis.ipynb.
- Pangolin: 0.05
- SpliceAI: 0.05
- SPiP: 0.015
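Choosing a cutoff by F1 amounts to scanning candidate thresholds and keeping the one with the highest F1 = 2TP / (2TP + FP + FN) on the labelled set. A plain-Python sketch follows. It assumes a variant is called pathogenic when its score is greater than or equal to the threshold and that every observed score is a candidate cutoff; the notebook may use a different grid or library, and the toy scores below are made up.

```python
from typing import Sequence, Tuple


def f1_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """F1 when variants with score >= threshold are called pathogenic (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try every observed score as a cutoff; return (threshold, F1) with the highest F1."""
    candidates = sorted(set(scores))
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_at(t, scores, labels)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data: 1 = pathogenic, 0 = benign
scores = [0.01, 0.02, 0.06, 0.30, 0.04, 0.50]
labels = [0, 0, 1, 1, 0, 1]
print(best_threshold(scores, labels))  # (0.06, 1.0)
```

F1 does not use true negatives, so it rewards recovering pathogenic variants while limiting false positives.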
Using these thresholds, DIVs in the USH2A, CRB1 and ABCA4 genes (from gnomAD) were annotated.
The results are presented in ABCA4_CRB1_USH2A_annot.csv.
Symbols used in the table:
🟢 Benign
🟡 Pathogenic under one assessment
🟠 Pathogenic under two assessments
🔴 Pathogenic under three assessments
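Assuming that each "assessment" is one tool's call at the thresholds listed above, the symbol for a variant follows from how many of the three tools reach their cutoff. This is a sketch under that assumption: the dictionary layout and the function name are hypothetical, each tool is assumed to contribute one non-negative score per variant, and a missing score is treated as not pathogenic.

```python
from typing import Dict, Optional

# Thresholds from the list above; ">=" is an assumed comparison
THRESHOLDS = {"Pangolin": 0.05, "SpliceAI": 0.05, "SPiP": 0.015}
SYMBOLS = {0: "🟢", 1: "🟡", 2: "🟠", 3: "🔴"}


def flag_variant(scores: Dict[str, Optional[float]]) -> str:
    """Map the number of tools calling a variant pathogenic to a symbol."""
    n_pathogenic = sum(
        1
        for tool, cutoff in THRESHOLDS.items()
        if scores.get(tool) is not None and scores[tool] >= cutoff
    )
    return SYMBOLS[n_pathogenic]


print(flag_variant({"Pangolin": 0.10, "SpliceAI": 0.01, "SPiP": 0.02}))  # 🟠
print(flag_variant({"Pangolin": 0.00, "SpliceAI": 0.00}))  # 🟢
```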
Contacts
- Supervisor: Marianna Weener, MD, PhD
- Ekaterina Shitik
- Ustin Zolotikov