Skip to content

Commit

Permalink
new features & fixes
Browse files Browse the repository at this point in the history
  • Loading branch information
klausweinbauer committed Sep 5, 2024
1 parent c8396f3 commit e44c3c6
Show file tree
Hide file tree
Showing 22 changed files with 591 additions and 268 deletions.
13 changes: 9 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
FGUtils is a collection of utility functions for querying functional groups in molecules from their molecular graph representation.
FGUtils is a collection of utility functions for querying functional groups in
molecules from their molecular graph representation.

## Dependencies
- Python (>= 3.11)
Expand All @@ -13,14 +14,18 @@ pip install fgutils
```

## Getting Started
A simple example querying the functional groups for acetylsalicylic acid.
For a comprehensive description of FGUtils features read through the
[documentation](https://klausweinbauer.github.io/FGUtils/). However, querying
the functional groups for a molecule like acetylsalicylic acid is as simple as
running the following:
```
>>> from fgutils import FGQuery
>>>
>>> smiles = "O=C(C)Oc1ccccc1C(=O)O" # acetylsalicylic acid
>>> query = FGQuery(use_smiles=True) # use_smiles requires rdkit to be installed
>>> query = FGQuery()
>>> query.get(smiles)
[('ester', [0, 1, 3]), ('carboxylic_acid', [10, 11, 12])]
```

The output is a list of tuples containing the functional group name and the corresponding atom indices.
The output is a list of tuples containing the functional group name and the
corresponding atom indices.
45 changes: 42 additions & 3 deletions doc/functional_groups.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,49 @@
Functional Groups
=================

FGUtils provides a class :py:class:`~fgutils.query.FGQuery` to query a
molecules functional groups.
FGUtils provides a class :py:class:`~fgutils.query.FGQuery` to retrieve a
molecules functional groups. It can be used directly with the preconfigured
list of functional groups as listed in `Functional Group Tree`_ or by
specifying your own patterns using :py:class:`~fgutils.fgconfig.FGConfig`
and :py:class:`~fgutils.fgconfig.FGConfigProvider`.

Functional group tree
Get functional groups in molecule
=================================

Common functional groups of a molecule can be retrieved by the
:py:class:`~fgutils.query.FGQuery` class. The query can directly use the
molecules SMILES or the molecular graph representation of the compound as
networkx graph. The following example demonstrates how to get the functional
groups from *acetylsalicylic acid*::

>>> from fgutils import FGQuery

>>> smiles = "O=C(C)Oc1ccccc1C(=O)O" # acetylsalicylic acid
>>> query = FGQuery()
>>> query.get(smiles)
[("ester", [0, 1, 3]), ("carboxylic_acid", [10, 11, 12])]


Get changing groups in reaction
===============================

The extended :ref:`graph-syntax` enables the description of reaction
mechanisms by specifying bond changes in ``<>`` brackets. Functional group
patterns can therefor also specify bond changes. Querying bond changes can be
used to look for a changing functional groups in a reaction. The following
example demonstrates how to check for a nucleophilic addition-elimination
reaction on a carbonyl group::

>>> from fgutils.query import FGQuery, FGConfig

>>> smiles = "[C:1][C:2](=[O:3])[O:4][C:5].[O:6]>>[C:1][C:2](=[O:3])[O:6].[O:4][C:5]"
>>> fgconfig = FGConfig(name="carbonyl-AE", pattern="C(=O)(<0,1>R)<1,0>R")
>>> query = FGQuery(config=fgconfig, require_implicit_hydrogen=False)
>>> query.get(smiles)
[("carbonyl-AE", [2, 3, 4, 6])]


Functional Group Tree
=====================

.. code-block::
Expand Down
2 changes: 2 additions & 0 deletions doc/graph_syntax.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
.. _graph-syntax:

============
Graph Syntax
============
Expand Down
12 changes: 12 additions & 0 deletions doc/references.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,18 @@
References
==========

fgconfig
========

.. automodule:: fgutils.fgconfig
:members:

utils
=====

.. automodule:: fgutils.utils
:members:

query
=====

Expand Down
44 changes: 24 additions & 20 deletions fgutils/mapping.py → fgutils/algorithm/subgraph.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,24 +3,26 @@

from fgutils.permutation import PermutationMapper

from fgutils.const import SYMBOL_KEY, BOND_KEY


def _get_neighbors(graph, idx, excluded_nodes=set()):
return [
(nidx, graph.nodes[nidx]["symbol"])
(nidx, graph.nodes[nidx][SYMBOL_KEY])
for nidx in graph.neighbors(idx)
if nidx not in excluded_nodes
]


def _get_symbol(graph, idx):
return graph.nodes[idx]["symbol"]
return graph.nodes[idx][SYMBOL_KEY]


def map_anchored_pattern(
def map_anchored_subgraph(
graph: nx.Graph,
anchor: int,
pattern: nx.Graph,
pattern_anchor: int,
subgraph: nx.Graph,
subgraph_anchor: int,
mapper: PermutationMapper,
):
def _fit(idx, pidx, visited_nodes=set(), visited_pnodes=set(), indent=0):
Expand All @@ -30,7 +32,7 @@ def _fit(idx, pidx, visited_nodes=set(), visited_pnodes=set(), indent=0):
visited_pnodes.add(pidx)

node_neighbors = _get_neighbors(graph, idx, visited_nodes)
pnode_neighbors = _get_neighbors(pattern, pidx, visited_pnodes)
pnode_neighbors = _get_neighbors(subgraph, pidx, visited_pnodes)

nn_syms = [n[1] for n in node_neighbors]
pnn_syms = [n[1] for n in pnode_neighbors]
Expand All @@ -50,9 +52,9 @@ def _fit(idx, pidx, visited_nodes=set(), visited_pnodes=set(), indent=0):
if nn_i == -1:
_vpnodes.add(pnn_idx)
continue
pnn_bond = pattern.edges[pidx, pnn_idx]["bond"]
pnn_bond = subgraph.edges[pidx, pnn_idx][BOND_KEY]
nn_idx = node_neighbors[nn_i][0]
nn_bond = graph.edges[idx, nn_idx]["bond"]
nn_bond = graph.edges[idx, nn_idx][BOND_KEY]
if nn_bond == pnn_bond:
r_fit, r_mapping, r_vnodes = _fit(
nn_idx,
Expand Down Expand Up @@ -83,39 +85,41 @@ def _fit(idx, pidx, visited_nodes=set(), visited_pnodes=set(), indent=0):
fit = False
mapping = []
sym = _get_symbol(graph, anchor)
psym = _get_symbol(pattern, pattern_anchor)
psym = _get_symbol(subgraph, subgraph_anchor)
init_mapping = mapper.permute([psym], [sym])
if init_mapping == [[(0, 0)]]:
fit, mapping, _ = _fit(anchor, pattern_anchor)
fit, mapping, _ = _fit(anchor, subgraph_anchor)

return fit, mapping


def map_pattern(
def map_subgraph(
graph: nx.Graph,
anchor: int,
pattern: nx.Graph,
subgraph: nx.Graph,
mapper: PermutationMapper,
pattern_anchor: None | int = None,
subgraph_anchor: None | int = None,
):
if pattern_anchor is None:
if len(pattern) == 0:
if subgraph_anchor is None:
if len(subgraph) == 0:
return [(True, [])]
results = []
for pidx in pattern.nodes:
result = map_anchored_pattern(graph, anchor, pattern, pidx, mapper)
for pidx in subgraph.nodes:
result = map_anchored_subgraph(graph, anchor, subgraph, pidx, mapper)
results.append(result)
if len(results) > 0:
return results
else:
return [(False, [])]
else:
return [map_anchored_pattern(graph, anchor, pattern, pattern_anchor, mapper)]
return [map_anchored_subgraph(graph, anchor, subgraph, subgraph_anchor, mapper)]


def map_to_entire_graph(graph: nx.Graph, pattern: nx.Graph, mapper: PermutationMapper):
def map_subgraph_to_graph(
graph: nx.Graph, subgraph: nx.Graph, mapper: PermutationMapper
):
for i in range(len(graph)):
mappings = map_pattern(graph, i, pattern, mapper)
mappings = map_subgraph(graph, i, subgraph, mapper)
for r, _ in mappings:
if r is True:
return True
Expand Down
36 changes: 19 additions & 17 deletions fgutils/chem/its.py
Original file line number Diff line number Diff line change
@@ -1,58 +1,60 @@
import collections
import networkx as nx

from fgutils.const import SYMBOL_KEY, AAM_KEY, BOND_KEY
from fgutils.const import SYMBOL_KEY, AAM_KEY, BOND_KEY, IDX_MAP_KEY


def _add_its_nodes(ITS, G, H, eta, symbol_key):
def _add_its_nodes(ITS, G, H, eta):
eta_G, eta_G_inv, eta_H, eta_H_inv = eta[0], eta[1], eta[2], eta[3]
for n, d in G.nodes(data=True):
n_ITS = eta_G[n]
n_H = eta_H_inv[n_ITS]
if n_ITS is not None and n_H is not None:
ITS.add_node(n_ITS, symbol=d[symbol_key], idx_map=(n, n_H))
node_attributes = {SYMBOL_KEY: d[SYMBOL_KEY], IDX_MAP_KEY: (n, n_H)}
ITS.add_node(n_ITS, **node_attributes)
for n, d in H.nodes(data=True):
n_ITS = eta_H[n]
n_G = eta_G_inv[n_ITS]
if n_ITS is not None and n_G is not None and n_ITS not in ITS.nodes:
ITS.add_node(n_ITS, symbol=d[symbol_key], idx_map=(n_G, n))
node_attributes = {SYMBOL_KEY: d[SYMBOL_KEY], IDX_MAP_KEY: (n_G, n)}
ITS.add_node(n_ITS, **node_attributes)


def _add_its_edges(ITS, G, H, eta, bond_key):
def _add_its_edges(ITS, G, H, eta):
eta_G, eta_G_inv, eta_H, eta_H_inv = eta[0], eta[1], eta[2], eta[3]
for n1, n2, d in G.edges(data=True):
if n1 > n2:
continue
e_G = d[bond_key]
e_G = d[BOND_KEY]
n_ITS1 = eta_G[n1]
n_ITS2 = eta_G[n2]
n_H1 = eta_H_inv[n_ITS1]
n_H2 = eta_H_inv[n_ITS2]
e_H = None
e_H = 0
if H.has_edge(n_H1, n_H2):
e_H = H[n_H1][n_H2][bond_key]
e_H = H[n_H1][n_H2][BOND_KEY]
if not ITS.has_edge(n_ITS1, n_ITS2) and n_ITS1 > 0 and n_ITS2 > 0:
ITS.add_edge(n_ITS1, n_ITS2, bond=(e_G, e_H))
edge_attributes = {BOND_KEY: (e_G, e_H)}
ITS.add_edge(n_ITS1, n_ITS2, **edge_attributes)

for n1, n2, d in H.edges(data=True):
if n1 > n2:
continue
e_H = d[bond_key]
e_H = d[BOND_KEY]
n_ITS1 = eta_H[n1]
n_ITS2 = eta_H[n2]
n_G1 = eta_G_inv[n_ITS1]
n_G2 = eta_G_inv[n_ITS2]
if n_G1 is None or n_G2 is None:
continue
if not G.has_edge(n_G1, n_G2) and n_ITS1 > 0 and n_ITS2 > 0:
ITS.add_edge(n_ITS1, n_ITS2, bond=(None, e_H))
edge_attributes = {BOND_KEY: (0, e_H)}
ITS.add_edge(n_ITS1, n_ITS2, **edge_attributes)


def get_its(G: nx.Graph, H: nx.Graph) -> nx.Graph:
"""
Get the ITS graph of reaction G \u2192 H. G and H must be molecular graphs
with node labels 'aam' and 'symbol' and bond label 'bond'.
"""Get the ITS graph of reaction G \u2192 H. G and H must be molecular
graphs with node labels 'aam' and 'symbol' and bond label 'bond'.
:param G: Reactant molecular graph.
:param H: Product molecular graph.
Expand All @@ -79,7 +81,7 @@ def get_its(G: nx.Graph, H: nx.Graph) -> nx.Graph:
eta_H_inv[d[AAM_KEY]] = n

ITS = nx.Graph()
_add_its_nodes(ITS, G, H, eta, SYMBOL_KEY)
_add_its_edges(ITS, G, H, eta, BOND_KEY)
_add_its_nodes(ITS, G, H, eta)
_add_its_edges(ITS, G, H, eta)

return ITS
42 changes: 42 additions & 0 deletions fgutils/chem/ps.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
atomic_sym2num = {
"H": 1,
"He": 2,
"Li": 3,
"Be": 4,
"B": 5,
"C": 6,
"N": 7,
"O": 8,
"F": 9,
"Ne": 10,
"Na": 11,
"Mg": 12,
"Al": 13,
"Si": 14,
"P": 15,
"S": 16,
"Cl": 17,
"Ar": 18,
"K": 19,
"Ca": 20,
"Ga": 31,
"Ge": 32,
"As": 33,
"Se": 34,
"Br": 35,
"Kr": 36,
"Rb": 37,
"Sr": 38,
"In": 49,
"Sn": 50,
"Sb": 51,
"Te": 52,
"I": 53,
"Xe": 54,
}


def get_atomic_number(atomic_symbol) -> int:
if atomic_symbol not in atomic_sym2num.keys():
raise ValueError("Unknown atomic symbol '{}'.".format(atomic_symbol))
return atomic_sym2num[atomic_symbol]
1 change: 1 addition & 0 deletions fgutils/const.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@
BOND_KEY = "bond"
IS_LABELED_KEY = "is_labeled"
LABELS_KEY = "labels"
IDX_MAP_KEY = "idx_map"
Loading

0 comments on commit e44c3c6

Please sign in to comment.