Merge branch 'development'
julesjacobsen committed Feb 29, 2024
2 parents 48875d2 + d2771f7 commit bc26fd3
Showing 391 changed files with 110,807 additions and 25,677 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -8,7 +8,7 @@ jobs:
docker:
# was circleci/openjdk:8-jdk but something changed and it tests failed on fork
#- image: circleci/openjdk@sha256:3640c4f42886e796e805c23af48b0d7348dc1d3fa8dae9a365e1f023f913c795
- image: circleci/openjdk:11.0.4-jdk-stretch
- image: cimg/openjdk:17.0.7
steps:
- checkout
- run: chmod +x mvnw
8 changes: 4 additions & 4 deletions .github/workflows/maven.yml
@@ -10,9 +10,9 @@ name: Java CI with Maven

on:
push:
branches: [ "master", "develop" ]
branches: [ "master", "development" ]
pull_request:
branches: [ "master", "develop" ]
branches: [ "master", "development" ]

jobs:
build:
@@ -21,10 +21,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up JDK 11
- name: Set up JDK 17
uses: actions/setup-java@v3
with:
java-version: '11'
java-version: '17'
distribution: 'temurin'
cache: maven
- name: Build with Maven
35 changes: 35 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,35 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.12"
# You can also specify other tool versions:
# nodejs: "20"
# rust: "1.70"
# golang: "1.20"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
configuration: docs/conf.py
# You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
# builder: "dirhtml"
# Fail on all warnings to avoid broken references
# fail_on_warning: true

# Optionally build your docs in additional formats such as PDF and ePub
formats:
- pdf
- epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
install:
- requirements: docs/requirements.txt
2 changes: 1 addition & 1 deletion README.md
@@ -56,7 +56,7 @@ For further instructions on installing and running please refer to the [README.m

#### Running it

Please refer to the [manual](http://exomiser.github.io/Exomiser/) for details on how to configure and run the Exomiser.
Please refer to the [manual](https://exomiser.readthedocs.io/en/latest/) for details on how to configure and run the Exomiser.

#### Demo site

29 changes: 25 additions & 4 deletions docs/acmg_assignment.rst
@@ -30,13 +30,24 @@ Computational and Predictive Data
PVS1
----
Variants must have a predicted loss of function effect, be in a gene with known disease associations and have a gene
constraint LOF O/E < 0.7635 (gnomAD 2.1.1) to suggest that a gene is LoF intolerant. Variants not predicted to lead to
constraint LOF O/E < 0.7635 (gnomAD 4.0) to suggest that a gene is LoF intolerant. Variants not predicted to lead to
NMD (those located in the last exon) will have the modifier downgraded to Strong.
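The PVS1 rule above can be read as a small decision function. The following is an illustrative sketch only, not Exomiser's implementation: the function name, parameter names and the `VeryStrong` default label are assumptions, and only the 0.7635 cutoff and the last-exon downgrade come from the text.

```python
from typing import Optional

# Gene constraint LOF O/E cutoff quoted in the text above.
LOF_OE_CUTOFF = 0.7635


def pvs1_strength(is_predicted_lof: bool,
                  gene_has_disease_association: bool,
                  gene_lof_oe: float,
                  in_last_exon: bool) -> Optional[str]:
    """Return a PVS1 modifier, or None when PVS1 does not apply (hypothetical helper)."""
    if not (is_predicted_lof and gene_has_disease_association):
        return None
    if gene_lof_oe >= LOF_OE_CUTOFF:
        # Gene is not considered LoF intolerant.
        return None
    # Variants in the last exon are not predicted to lead to NMD,
    # so the modifier is downgraded to Strong.
    return "Strong" if in_last_exon else "VeryStrong"
```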

PS1
---
Variants with the same amino acid change as previously reported P/LP missense or in-frame indel ClinVar variants will be
assigned `PS1` with a strength of `Strong` for variants >= 2 stars, `Moderate` for variants with 1 star or `Supporting`
for those without a ClinVar star rating.

PM4
---
Stop-loss variants and in-frame insertions or deletions not previously assigned a `PVS1` criterion are assigned `PM4`.

PM5
---
Variants having a novel missense change to an amino acid where a previously reported ClinVar P/LP variant has been seen
will be assigned `PM5` with a strength of `Moderate` for those with >=2 stars or `Supporting` otherwise.
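As a rough sketch of how the ClinVar star rating maps to a strength for `PS1` and `PM5`, the two rules above could be written as follows. The function names and the use of `None` for a missing star rating are assumptions made for illustration, not Exomiser's API.

```python
from typing import Optional


def ps1_strength(clinvar_stars: Optional[int]) -> str:
    """PS1 strength from the ClinVar star rating of the matching P/LP variant."""
    if clinvar_stars is not None and clinvar_stars >= 2:
        return "Strong"
    if clinvar_stars == 1:
        return "Moderate"
    # No star rating (None) or zero stars.
    return "Supporting"


def pm5_strength(clinvar_stars: Optional[int]) -> str:
    """PM5 strength: Moderate for >= 2 stars, Supporting otherwise."""
    if clinvar_stars is not None and clinvar_stars >= 2:
        return "Moderate"
    return "Supporting"
```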

PP3 / BP4
---------
If REVEL is chosen as a pathogenicity predictor for missense variants, `PP3` and `BP4` are assigned using the modifiers
@@ -46,6 +57,16 @@ Note that this suggests the use of modifiers up to Strong in the case of pathoge
Otherwise, an ensemble-based approach will be used for other pathogenicity predictors as per the original 2015 guidelines.
It should be noted we found better performance using the REVEL-based approach when testing against the 100K genomes data.

Functional Data
===============
PM1
---
Missense variants and in-frame indels are assigned `PM1` if the surrounding region of 25 nucleotides either side of the variant
contains at least 4 reported P/LP variants in ClinVar and no B/LB variants. If the number of P/LP variants is greater
than the number of VUS in the region, the strength will be `Moderate`, but regions where P/LP <= VUS
(and no B/LB) will have the strength downgraded to `Supporting`.
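The `PM1` rule can be summarised as a function of the ClinVar variant counts in the 25-nucleotide flanking region. This is a minimal sketch under the assumption that the counts have already been collected; the function and parameter names are hypothetical.

```python
from typing import Optional


def pm1_strength(n_plp: int, n_vus: int, n_blb: int) -> Optional[str]:
    """Return the PM1 strength for a region, or None when PM1 is not assigned.

    n_plp, n_vus and n_blb are the numbers of P/LP, VUS and B/LB ClinVar
    variants within 25 nucleotides either side of the variant.
    """
    if n_plp < 4 or n_blb > 0:
        return None
    return "Moderate" if n_plp > n_vus else "Supporting"
```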


Segregation Data
================
BS4
@@ -158,16 +179,16 @@ conjunction with a disorder, the assigned criteria with any modifiers and the fi
],
"frequencyData": {
"rsId": "rs121918506",
"score": 1
"frequencyScore": 1
},
"pathogenicityData": {
"clinVarData": {
"alleleId": "28333",
"primaryInterpretation": "LIKELY_PATHOGENIC",
"reviewStatus": "criteria provided, single submitter"
},
"score": 0.965,
"predictedPathogenicityScores": [
"pathogenicityScore": 0.965,
"pathogenicityScores": [
{
"source": "REVEL",
"score": 0.965
89 changes: 52 additions & 37 deletions docs/advanced_analysis.rst
@@ -107,12 +107,7 @@ requires anything different, it is possible to manually define the data sources
TOPMED,
UK10K,
ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
EXAC_OTHER,
ESP_AA, ESP_EA, ESP_ALL,
GNOMAD_E_AFR,
GNOMAD_E_AMR,
@@ -208,25 +203,18 @@ Here you can specify which variant frequency databases you want to use. You can
array format as the HPO IDs.

The data sources used are from `1000 genomes <http://www.1000genomes.org>`_ (via DBSNP), `DBSNP <https://www.ncbi.nlm.nih.gov/projects/SNP/>`_,
`ESP <https://evs.gs.washington.edu/EVS/>`_, `ExAC, gnomAD exomes and gnomAD genomes <https://gnomad.broadinstitute.org/about>`_,
`UK10K <https://www.uk10k.org/>`_ (via DBSNP), `TOPMed <https://topmed.nhlbi.nih.gov/>`_ (via DBSNP).
`ESP <https://evs.gs.washington.edu/EVS/>`_, `UK10K <https://www.uk10k.org/>`_ (via DBSNP), `TOPMed <https://topmed.nhlbi.nih.gov/>`_ (via DBSNP).

As of the 2402 data release, the ``ExAC`` source has been removed, as its data is included in
`gnomAD <https://gnomad.broadinstitute.org/about>`_ 2.1 and later.

DBSNP:
``THOUSAND_GENOMES``,
``UK10K``,
``TOPMED``

ESP:
``ESP_AFRICAN_AMERICAN``, ``ESP_EUROPEAN_AMERICAN``, ``ESP_ALL``

ExAC:
``EXAC_AFRICAN_INC_AFRICAN_AMERICAN``,
``EXAC_AMERICAN``,
``EXAC_SOUTH_ASIAN``,
``EXAC_EAST_ASIAN``,
``EXAC_FINNISH``,
``EXAC_NON_FINNISH_EUROPEAN``,
``EXAC_OTHER``
``ESP_AA``, ``ESP_EA``, ``ESP_ALL``

gnomAD exomes:
``GNOMAD_E_AFR``,
@@ -235,21 +223,26 @@ gnomAD exomes:
``GNOMAD_E_EAS``,
``GNOMAD_E_FIN``,
``GNOMAD_E_NFE``,
``GNOMAD_E_MID``,
``GNOMAD_E_OTH``,
``GNOMAD_E_SAS``,

gnomAD genomes:
``GNOMAD_G_AFR``,
``GNOMAD_G_AMR``,
``GNOMAD_G_AMI``,
``GNOMAD_G_ASJ``,
``GNOMAD_G_EAS``,
``GNOMAD_G_FIN``,
``GNOMAD_G_NFE``,
``GNOMAD_G_MID``,
``GNOMAD_G_OTH``,
``GNOMAD_G_SAS``

We recommend using all databases if the proband population background is unknown, although removing the ``GNOMAD_E_ASJ``
and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi population e.g.
We recommend using all databases if the proband population background is unknown, although removing the ``ASJ``, ``AMI``,
``FIN``, ``MID`` and ``OTH`` populations is recommended as these are small/founder populations which are likely to have
artificially high allele frequencies for some relevant variants. These populations will not be included when assessing
the population frequency for the ACMG assignments, even if used in the filtering.

.. code-block:: yaml
@@ -258,29 +251,24 @@ and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi pop
TOPMED,
UK10K,
ESP_AFRICAN_AMERICAN, ESP_EUROPEAN_AMERICAN, ESP_ALL,
EXAC_AFRICAN_INC_AFRICAN_AMERICAN, EXAC_AMERICAN,
EXAC_SOUTH_ASIAN, EXAC_EAST_ASIAN,
EXAC_FINNISH, EXAC_NON_FINNISH_EUROPEAN,
EXAC_OTHER,
ESP_AA, ESP_EA, ESP_ALL,
GNOMAD_E_AFR,
GNOMAD_E_AMR,
# GNOMAD_E_ASJ,
# GNOMAD_E_ASJ,
GNOMAD_E_EAS,
GNOMAD_E_FIN,
# GNOMAD_E_FIN,
GNOMAD_E_NFE,
GNOMAD_E_OTH,
# GNOMAD_E_OTH,
GNOMAD_E_SAS,
GNOMAD_G_AFR,
GNOMAD_G_AMR,
# GNOMAD_G_ASJ,
# GNOMAD_G_ASJ,
GNOMAD_G_EAS,
GNOMAD_G_FIN,
# GNOMAD_G_FIN,
GNOMAD_G_NFE,
GNOMAD_G_OTH,
# GNOMAD_G_OTH,
GNOMAD_G_SAS
]
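For readers who build the ``frequencySources`` list programmatically, the recommendation to drop the ``ASJ``, ``AMI``, ``FIN``, ``MID`` and ``OTH`` gnomAD populations could be sketched as below. This is an illustrative helper, not part of Exomiser; it assumes source names follow the ``GNOMAD_<E|G>_<POP>`` pattern shown in the lists above.

```python
from typing import List

# Small/founder populations the text recommends removing.
EXCLUDED_POPULATIONS = {"ASJ", "AMI", "FIN", "MID", "OTH"}


def recommended_sources(sources: List[str]) -> List[str]:
    """Drop gnomAD sources for the excluded populations, keeping input order."""
    kept = []
    for source in sources:
        population = source.rsplit("_", 1)[-1]
        if source.startswith("GNOMAD_") and population in EXCLUDED_POPULATIONS:
            continue
        kept.append(source)
    return kept
```

Non-gnomAD sources such as ``TOPMED`` or ``ESP_ALL`` are passed through unchanged.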
@@ -289,14 +277,27 @@ and ``GNOMAD_G_ASJ``, unless your proband is known to come from an Ashkenazi pop

pathogenicitySources:
---------------------
Possible pathogenicitySources: ``POLYPHEN``, ``MUTATION_TASTER``, ``SIFT``, ``REVEL``, ``MVP``, ``CADD``, ``REMM``. ``REMM`` is trained on
Possible pathogenicitySources: ``POLYPHEN``, ``MUTATION_TASTER``, ``SIFT``, ``REVEL``, ``MVP``, ``ALPHA_MISSENSE``,
``SPLICE_AI`` (derived from gnomAD 4.0, so only available for hg38), ``CADD``, ``REMM``. ``REMM`` is trained on
non-coding regulatory regions. **WARNING** if you enable ``CADD``, ensure that you have downloaded and installed the CADD
tabix files and updated their location in the ``application.properties`` (see :ref:`cadd-install`). Exomiser will not run
without this.

We recommend using either ``[REVEL, MVP]`` **OR** ``[POLYPHEN, MUTATION_TASTER, SIFT]`` as REVEL and MVP are newer
predictors which have been shown to have better performance and are more nuanced. Mixing them with the Polyphen2,
MutationTaster or SIFT will give worse performance.
MutationTaster or SIFT will give worse performance. Testing on GEL solved cases showed that AlphaMissense slightly
increased performance when combined with MVP. We advise testing on local cohorts to assess local performance.

`REVEL scores are freely available for non-commercial use. For other uses, please contact Weiva Sieh.`

`AlphaMissense Database Copyright (2023) DeepMind Technologies Limited. All predictions are provided for non-commercial
research use only under CC BY-NC-SA license. Researchers interested in predictions not yet provided, and for
non-commercial use, can send an expression of interest to alphamissense@google.com.`

`SpliceAI source code is provided under the GPLv3 license. SpliceAI includes several third party packages provided under
other open source licenses, please see NOTICE for additional details. The trained models used by SpliceAI (located in
this package at spliceai/models) are provided under the CC BY NC 4.0 license for academic and non-commercial use; other
use requires a commercial license from Illumina, Inc.`

.. code-block:: yaml
@@ -319,7 +320,7 @@ Analysis steps are defined in terms of :ref:`variant filters<variantfilters>`, :
operate on genes but also require the variants to have already been filtered. The optimiser will ensure that these are
run at the correct time if they have been incorrectly placed.

Using these it is possible to create artificial exomes, define gene panels or only examine specific regions, for example.
Using these it is possible, for example, to create artificial exomes, define gene panels or only examine specific regions.

.. _variantfilters:

@@ -413,14 +414,20 @@ frequencyFilter:
Frequency cutoff of a variant **in percent**. Frequencies are derived from the databases defined in the :ref:`frequencySources<frequencysources>`
section. We recommend a value below 5.0 % depending on the disease. Variants will be removed/failed if they have a
frequency higher than the stated percentage in any database defined in the :ref:`frequencySources<frequencysources>` section.
_n.b_ Not defining this filter will result in all variants having no frequency data, even if the :ref:`frequencySources<frequencysources>`
contains values.

.. code-block:: yaml
frequencyFilter: {maxFrequency: 1.0}
.. important::

Not defining this filter will result in all variants having no frequency data, even if the :ref:`frequencySources<frequencysources>`
are defined. Failing to include this will result in Exomiser assuming variants are all very rare and subsequently
assigning an artificially inflated score, especially for very common variants. If you want to score all variants and
write failed ones to the output, it is recommended to use `analysisMode: FULL`.
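The pass/fail rule of the frequency filter described above can be sketched as follows. This is a simplified illustration rather than Exomiser's code: the function name and the dictionary of per-source frequencies are assumptions, and all frequencies are taken to be percentages, as in ``maxFrequency``.

```python
from typing import Mapping


def passes_frequency_filter(frequencies: Mapping[str, float],
                            max_frequency: float) -> bool:
    """Return True unless any source reports a frequency above max_frequency.

    frequencies maps a frequency source name to the variant's frequency
    in percent; a variant with no frequency data passes.
    """
    return all(freq <= max_frequency for freq in frequencies.values())
```

With ``maxFrequency: 1.0``, a variant seen at 0.5 % in one source and 0.02 % in another passes, while a variant seen at 1.5 % in any single source fails.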


.. _pathogenicityfilter:

pathogenicityFilter:
@@ -435,6 +442,14 @@ This filter is meant to be quite permissive and we recommend it be set to true.
pathogenicityFilter: {keepNonPathogenic: true}
.. important::

Not defining this filter will result in all variants having no pathogenicity data or ClinVar annotations, even if the
:ref:`pathogenicitySources<pathogenicitysources>` are defined. Failing to include this will result in Exomiser
using default scores based on the assigned variant effect. If you want to score all variants and write failed ones
to the output, it is recommended to use `analysisMode: FULL`.


.. _genefilters:

Gene filters
6 changes: 3 additions & 3 deletions docs/conf.py
@@ -55,11 +55,11 @@
# -- Project information -----------------------------------------------------

project = u'exomiser'
copyright = u'2021, Jules Jacobsen, Damian Smedley, Peter Robinson'
copyright = u'2024, Jules Jacobsen, Damian Smedley, Peter Robinson'
author = u'Jules Jacobsen, Damian Smedley, Peter Robinson'

# The short X.Y version
version = u'13.1.0'
version = u'14.0.0'
# The full version, including alpha/beta/rc tags
release = version

@@ -94,7 +94,7 @@
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = 'en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
9 changes: 9 additions & 0 deletions docs/input_files_and_options.rst
@@ -78,6 +78,15 @@ genome assembly and is enabled in the ``application.properties`` see the :ref:`r
Analysis
========

.. important::

The exome and genome analyses found in the `test-analysis-exome.yml` and `test-analysis-genome.yml` files are
recommended for use in most situations, and removing steps from the analysis is likely to negatively impact
performance. It is *strongly* recommended to test any changes against the standard setup on the example samples and
your own solved cases to check the impact of any changes you might want to make. If you want to score all variants
and write failed ones to the output, it is recommended to use `analysisMode: FULL`.


Analysis files contain all possible options for running an analysis including the ability to specify variant frequency
and pathogenicity data sources and the ability to tweak the order that analysis steps are performed.
