import methods to load ontologies via pronto (#7)
merges #7
dhimmel committed Mar 25, 2021
1 parent 9ee8833 commit d9125a0
Showing 5 changed files with 140 additions and 2 deletions.
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -26,3 +26,5 @@ repos:
    hooks:
      - id: mypy
        args: ["--strict", "--show-error-codes"]
        additional_dependencies:
          - pronto
36 changes: 34 additions & 2 deletions README.md
@@ -18,7 +18,6 @@ Here, we'll use the example [metals ontology](https://jbiomedsem.biomedcentral.c
<!-- use absolute URL instead of media/metals.svg for PyPI long_description -->

Note that `NXOntology` represents the ontology as a [`networkx.DiGraph`](https://networkx.org/documentation/stable/reference/classes/digraph.html), where edge direction goes from superterm to subterm.
Currently, users must create their own `networkx.DiGraph` to use this package.
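
To make the edge direction concrete, here is a minimal standard-library sketch (not the nxontology implementation) that stores edges from superterm to subterm and recovers a term's ancestors by walking edges backwards. The edge list is a small illustrative subset loosely based on the metals example, and the `ancestors` helper is a hypothetical name introduced only for this sketch.

```python
from collections import defaultdict
from typing import Dict, Set

# Edges point from superterm to subterm, mirroring the convention above.
# This edge list is illustrative, not the complete metals ontology.
edges = [
    ("metal", "precious"),
    ("metal", "coinage"),
    ("precious", "gold"),
    ("coinage", "gold"),
    ("coinage", "copper"),
]

# Map each subterm to its direct superterms (its incoming edges).
parents: Dict[str, Set[str]] = defaultdict(set)
for superterm, subterm in edges:
    parents[subterm].add(superterm)


def ancestors(term: str) -> Set[str]:
    """Return every superterm reachable by following edges backwards."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("gold")))  # ['coinage', 'metal', 'precious']
```

Because edges run from general to specific, a root term such as `metal` has no incoming edges and therefore no ancestors.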

Given an `NXOntology` instance, here is how to compute intrinsic similarity metrics.

@@ -82,9 +81,42 @@ Node fill color corresponds to the Sánchez information content, such that darke
The most informative common ancestor (coinage) is outlined with a bold solid line.
Nodes that are not an ancestor of gold or silver have an invisible outline.

### Loading ontologies

Pronto supports reading ontologies from the following file formats:

1. [Open Biomedical Ontologies 1.4](http://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html): `.obo` extension, uses the [fastobo](https://github.com/fastobo/fastobo-py) parser.
2. [OBO Graphs JSON](https://github.com/geneontology/obographs): `.json` extension, uses the fastobo parser.
3. [Web Ontology Language 2 RDF/XML](https://www.w3.org/TR/owl2-overview/): `.owl` extension, uses the pronto `RdfXMLParser`.

The files can be local or at a network location (URL starting with https, http, or ftp).
Pronto detects and handles gzip, bzip2, and xz compression.
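
As a rough illustration of how compression can be detected from content rather than from the file name, the sketch below checks the leading "magic" bytes of a payload using only the standard library. This is an assumption about one reasonable approach, not a description of pronto's internals; `MAGIC_OPENERS` and `open_maybe_compressed` are hypothetical names.

```python
import bz2
import gzip
import io
import lzma
from typing import IO

# Leading "magic" bytes that identify each compression format.
MAGIC_OPENERS = {
    b"\x1f\x8b": gzip.open,  # gzip
    b"BZh": bz2.open,  # bzip2
    b"\xfd7zXZ\x00": lzma.open,  # xz
}


def open_maybe_compressed(raw: bytes) -> IO[bytes]:
    """Return a binary reader that transparently decompresses `raw`."""
    for magic, opener in MAGIC_OPENERS.items():
        if raw.startswith(magic):
            return opener(io.BytesIO(raw), "rb")
    # No known magic bytes: treat the payload as uncompressed.
    return io.BytesIO(raw)


payload = b"format-version: 1.4\n"
for compress in (gzip.compress, bz2.compress, lzma.compress, bytes):
    # `bytes` leaves the payload unchanged, covering the uncompressed case.
    assert open_maybe_compressed(compress(payload)).read() == payload
```

Sniffing content this way means a file keeps working even when its extension does not advertise the compression.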

Here are example operations on the Gene Ontology,
using pronto to load the ontology:

```python
>>> from nxontology.imports import from_file
>>> # versioned URL for the Gene Ontology
>>> url = "http://release.geneontology.org/2021-02-01/ontology/go-basic.json.gz"
>>> nxo = from_file(url)
>>> nxo.n_nodes
44085
>>> # similarity between "myelination" and "neurogenesis"
>>> sim = nxo.similarity("GO:0042552", "GO:0022008")
>>> round(sim.lin, 2)
0.21
>>> import networkx as nx
>>> # Gene Ontology domains are disconnected, expect 3 components
>>> nx.number_weakly_connected_components(nxo.graph)
3
```
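
The component count in the example above can be reasoned about without networkx. The following sketch counts weakly connected components (components found when edge direction is ignored) with a small union-find structure. The node names and the `count_weak_components` helper are hypothetical and form a toy graph, not real Gene Ontology data.

```python
from typing import Dict, Iterable, Tuple


def count_weak_components(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> int:
    """Count components while ignoring edge direction, using union-find."""
    parent: Dict[str, str] = {node: node for node in nodes}

    def find(node: str) -> str:
        # Follow parent pointers to the representative, halving the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        # Merge the two sets; direction of the edge does not matter here.
        parent[find(source)] = find(target)
    return len({find(node) for node in parent})


# Toy graph: three root domains, each with one subterm and no cross links.
nodes = ["process", "process_term", "function", "function_term",
         "component", "component_term"]
edges = [
    ("process", "process_term"),
    ("function", "function_term"),
    ("component", "component_term"),
]
print(count_weak_components(nodes, edges))  # 3
```

Adding a single edge between any two of the domains would reduce the count to 2, which is why three components indicates that the domains share no is_a links.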

Users can also create their own `networkx.DiGraph` to use this package.

## Installation

nxontology can be installed with `pip` from [[PyPI](https://pypi.org/project/nxontology/) like:
nxontology can be installed with `pip` from [PyPI](https://pypi.org/project/nxontology/) like:

```shell
# standard installation
64 changes: 64 additions & 0 deletions nxontology/imports.py
@@ -0,0 +1,64 @@
import logging
from os import PathLike
from typing import AnyStr, BinaryIO, Union

from pronto import Ontology as Prontology # type: ignore [attr-defined]

from nxontology import NXOntology
from nxontology.exceptions import NodeNotFound


def pronto_to_nxontology(onto: Prontology) -> NXOntology:
    """
    Create an `NXOntology` from an input `pronto.Ontology`.
    Obsolete terms are omitted as nodes.
    Only is_a / subClassOf relationships are used for edges.
    """
    nxo = NXOntology()
    nxo.pronto = onto  # type: ignore [attr-defined]
    for term in onto.terms():
        if term.obsolete:
            # obsolete was unreliable in pronto < v2.4.0
            # https://github.com/althonos/pronto/issues/122
            continue
        nxo.add_node(
            term.id,
            identifier=term.id,
            label=term.name,
            namespace=term.namespace,
        )
    for term in onto.terms():
        # add subClassOf / is_a relations
        # https://github.com/althonos/pronto/issues/119
        for child in term.subclasses(distance=1, with_self=False):
            try:
                nxo.add_edge(term.id, child.id)
            except NodeNotFound as e:
                logging.warning(
                    f"Cannot add edge: {term.id} --> {child.id} "
                    f"({term.name} --> {child.name}): {e}"
                )
    return nxo


def from_obo_library(slug: str) -> NXOntology:
    """
    Read ontology from <http://www.obofoundry.org/>.
    Delegates to [`pronto.Ontology.from_obo_library`](https://pronto.readthedocs.io/en/stable/api/pronto.Ontology.html#pronto.Ontology.from_obo_library).
    """
    onto = Prontology.from_obo_library(slug=slug)
    nxo = pronto_to_nxontology(onto)
    nxo.graph.graph["from_obo_library"] = slug
    return nxo


def from_file(handle: Union[BinaryIO, str, "PathLike[AnyStr]"]) -> NXOntology:
    """
    Read ontology in OBO, OWL, or JSON (OBO Graphs) format via pronto.
    Arguments:
        handle: Either the path to a file or a binary file handle
            that contains a serialized version of the ontology.
    """
    onto = Prontology(handle=handle)
    return pronto_to_nxontology(onto)
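
The two-pass conversion in `pronto_to_nxontology` (register non-obsolete terms as nodes, then add superterm-to-subterm edges and warn when an endpoint is missing) can be sketched without pronto or nxontology. The `Term` dataclass and `terms_to_graph` function below are hypothetical stand-ins for illustration only; they are not part of either library.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Term:
    """Hypothetical stand-in for an ontology term, not pronto's Term class."""

    id: str
    name: str
    obsolete: bool = False
    subclass_ids: List[str] = field(default_factory=list)


def terms_to_graph(
    terms: List[Term],
) -> Tuple[Dict[str, str], Set[Tuple[str, str]]]:
    """Return (node labels, superterm-to-subterm edges), skipping obsolete terms."""
    # Pass 1: register every non-obsolete term as a node.
    nodes = {term.id: term.name for term in terms if not term.obsolete}
    # Pass 2: add an edge only when both endpoints are known nodes.
    edges: Set[Tuple[str, str]] = set()
    for term in terms:
        for child_id in term.subclass_ids:
            if term.id in nodes and child_id in nodes:
                edges.add((term.id, child_id))
            else:
                logging.warning("Cannot add edge: %s --> %s", term.id, child_id)
    return nodes, edges


terms = [
    Term("X:1", "root", subclass_ids=["X:2", "X:3"]),
    Term("X:2", "current child"),
    Term("X:3", "retired child", obsolete=True),
]
nodes, edges = terms_to_graph(terms)
print(sorted(nodes))  # ['X:1', 'X:2']
print(sorted(edges))  # [('X:1', 'X:2')]
```

Running the node pass to completion before the edge pass means an edge is rejected only when an endpoint is truly absent (for example, obsolete), not merely because it has not been visited yet.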
39 changes: 39 additions & 0 deletions nxontology/tests/imports_test.py
@@ -0,0 +1,39 @@
import pytest

from nxontology.imports import from_file, from_obo_library


@pytest.mark.parametrize(
    "format",
    [
        "owl",
        "obo",
    ],
)
def test_from_obo_library_taxrank(format: str) -> None:
    """
    http://www.obofoundry.org/ontology/taxrank.html
    """
    slug = f"taxrank.{format}"
    nxo = from_obo_library(slug)
    (root,) = nxo.roots
    assert root == "TAXRANK:0000000"
    cultivar = nxo.node_info("TAXRANK:0000034")
    assert cultivar.identifier == "TAXRANK:0000034"
    assert cultivar.label == "cultivar"
    assert "TAXRANK:0000000" in cultivar.ancestors


def test_from_file_go() -> None:
    url = "http://release.geneontology.org/2021-02-01/ontology/go-basic.json.gz"
    nxo = from_file(url)
    assert nxo.n_nodes > 20_000
    # pronto < 2.4.0 marked GO:0000003 as obsolete
    # https://github.com/althonos/pronto/issues/122
    assert "GO:0000003" in nxo.graph
    info = nxo.node_info("GO:0042552")
    assert info.identifier == "GO:0042552"
    assert info.label == "myelination"
    assert info.data["namespace"] == "biological_process"
    # has edge from "axon ensheathment" to "myelination"
    assert nxo.graph.has_edge("GO:0008366", "GO:0042552")
1 change: 1 addition & 0 deletions setup.cfg
@@ -30,6 +30,7 @@ include_package_data = True
python_requires = >=3.7
install_requires =
    networkx>=2
    pronto>=v2.4.0
    fsspec

[options.extras_require]