The molecular graph library.
The focus of mogli is finding maximal common fragments. Maximal common fragments are common subgraphs of two molecular graphs in which not only atom pairs are matched but also their neighborhoods. The size of the neighborhood is determined by the shell size k, the maximal distance (number of bonds) from the central atom of the neighborhood. The core atoms of a fragment are the atoms more than k bonds away from the fragment border; all other atoms are shell atoms.
mogli provides both a C++ and a Python API and uses the following awesome third-party libraries:
- lemon graph library
- nauty graph isomorphism solver
- LAD subgraph isomorphism solver
- boost::dynamic_bitset
- msgpack-c
- pybind11
- Catch2
Build the shared library libcmogli.so:
mkdir build
cd build
cmake .. -DBUILD_PYTHON=OFF
cmake --build cmogli
Install the Python package with:
python setup.py install
mogli's unit tests cover all basic functions. Usually it is not necessary to run the tests, unless you changed the code.
Compile the unit tests with:
mkdir build_test
cd build_test
cmake .. -DBUILD_TESTS=ON
cmake --build unit_tests
Then, run all unit tests:
ctest -V
Running only the C++ tests:
ctest -V -L cpp
or the Python tests:
ctest -V -L python
mogli imports and exports molecules from and to the lemon graph format (LGF). The format is table-based and a standard mogli LGF file looks like this:
@nodes
partial_charge label label2 atomType coordX coordY coordZ initColor
-0.048 0 C1 12 -0.765 -0.000 0.000 1
0.016 1 H3 20 -1.164 -0.813 0.619 2
0.016 2 H4 20 -1.164 -0.129 -1.013 3
0.016 3 H5 20 -1.164 0.942 0.394 4
@edges
label
0 1 0
0 2 1
0 3 2
The atomType column holds the IACM atom types used by the ATB database.
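To make the table layout concrete, here is a minimal sketch of a reader for the sample above. It is not mogli's LGF reader: the function name parse_lgf and the "source"/"target" keys are hypothetical, and the sketch assumes plain whitespace-separated values with a non-empty header row in each section (no quoted strings or comments).

```python
def parse_lgf(text):
    """Return (nodes, edges) as lists of dicts keyed by column name."""
    nodes, edges = [], []
    section, header = None, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("@"):
            section, header = fields[0], None  # new section, header row follows
        elif header is None:
            header = fields                    # first row names the columns
        elif section == "@nodes":
            nodes.append(dict(zip(header, fields)))
        elif section == "@edges":
            # The two leading columns are the end nodes of the bond,
            # the remaining columns follow the header row.
            edge = {"source": fields[0], "target": fields[1]}
            edge.update(zip(header, fields[2:]))
            edges.append(edge)
    return nodes, edges


SAMPLE = """\
@nodes
partial_charge label label2 atomType coordX coordY coordZ initColor
-0.048 0 C1 12 -0.765 -0.000 0.000 1
0.016 1 H3 20 -1.164 -0.813 0.619 2
0.016 2 H4 20 -1.164 -0.129 -1.013 3
0.016 3 H5 20 -1.164 0.942 0.394 4
@edges
label
0 1 0
0 2 1
0 3 2
"""

nodes, edges = parse_lgf(SAMPLE)
print([n["label2"] for n in nodes])
print([(e["source"], e["target"]) for e in edges])
```

All values are kept as strings; converting partial_charge or the coordinates to float is left to the caller.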
A simple example of how to work with the molecule class in C++:
#include <fstream>
#include <iostream>
#include <molecule.h>
using namespace mogli;
...
// create an empty molecule
Molecule mol;
// read molecule data from standard lgf
std::ifstream ifs("./data/min_1.lgf", std::ifstream::in);
mol.read_lgf_stream(ifs);
ifs.close();
// iterate over all atoms and print the properties
for (NodeIt n = mol.get_node_iter(); n != lemon::INVALID; ++n) {
    std::cout << mol.get_element(n) << " "
              << mol.get_id(n) << " "
              << std::get<std::string>(mol.get_property(n, "label2")) << " "
              << std::get<double>(mol.get_property(n, "partial_charge")) << std::endl;
}
Check if the molecular graph is connected:
// is the graph connected?
std::cout << mol.is_connected() << std::endl;
Iterating over all neighbors of an atom:
// iterate neighbors w of node v
for (IncEdgeIt e = mol.get_inc_edge_iter(v); e != lemon::INVALID; ++e) {
    Node w = mol.get_opposite_node(v, e);
}
Importing custom LGF files is also supported. For example, this LGF file has an atom ID column "id", an atom type column "element" and bool and int atom property columns.
@nodes
id element bool int
0 0 0 1
1 0 1 2
...
The custom LGF file can be imported with:
LGFIOConfig config("id", "element");
config.add_bool_node_prop("bool").add_int_node_prop("int");
std::ifstream ifs("./data/min_2.lgf", std::ifstream::in);
mol.read_lgf_stream(ifs, config);
ifs.close();
By default, mogli uses the IACM atom types used by the ATB database. The atom types are managed by the PeriodicTable class. There are three ways to get a custom periodic table: create a new one from scratch, modify the default periodic table in place, or copy the default periodic table and modify the copy:
#include <periodictable.h>
...
// copy the default IACM periodic table
PeriodicTable table(PeriodicTable::get_default());
// add a new element "Foo"
table.add_uncolored(132, "Foo");
// make "H" and "HC" equivalent
table.make_equivalent({20, 21});
// create an empty molecule with this custom periodic table
Molecule mol(table);
Maximal common fragments are common subgraphs of two molecular graphs in which not only atom pairs are matched but also their neighborhoods. The size of the neighborhood is determined by the shell size.
Enumerating all maximal common fragments:
#include <mcf.h>
...
FragmentVector fragments;
MatchVector matches1, matches2;
int shell = 1;
int timeout = 10;
// t: false if a timeout occurred, true otherwise.
bool t = maximal_common_fragments(mol1,      // Matched against mol2 to find maximal common fragments.
                                  mol2,      // Matched against mol1 to find maximal common fragments.
                                  fragments, // Vector of resulting fragments.
                                  matches1,  // Resulting mappings from fragment atom IDs to atom IDs in mol1.
                                  matches2,  // Resulting mappings from fragment atom IDs to atom IDs in mol2.
                                  shell,     // Shell size.
                                  timeout);  // Timeout in seconds.
To retrieve only the largest common fragments, set the maximum parameter of maximal_common_fragments to true.
A simple example of how to work with the molecule class in Python:
from mogli import Molecule
# create an empty molecule
mol = Molecule()
# read molecule data from standard lgf
with open('./data/min_1.lgf') as f:
    mol.read_lgf(f.read())
# iterate over all atoms and print the properties
for n in mol.get_node_iter():
    print(mol.get_element(n),
          mol.get_id(n),
          mol.get_property(n, 'label2'),
          mol.get_property(n, 'partial_charge'))
Check if the molecular graph is connected:
# is the graph connected?
print(mol.is_connected())
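For intuition about what such a check involves, the following sketch tests connectivity with a breadth-first search on a plain adjacency dictionary. It does not use mogli and makes no claim about how mogli implements is_connected; the function and the example graph are illustrative only.

```python
from collections import deque


def is_connected(adjacency):
    """Return True if every atom is reachable from an arbitrary start atom."""
    if not adjacency:
        return True  # convention chosen for this sketch: an empty graph counts as connected
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adjacency)


# Bonds of the standard LGF example: atom 0 is bonded to atoms 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(is_connected(star))              # True
print(is_connected({**star, 4: []}))   # False: atom 4 has no bonds
```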
Iterating over all neighbors of an atom:
# iterate neighbors w of atom v
for e in mol.get_inc_edge_iter(v):
    w = mol.get_opposite_node(v, e)
Importing custom LGF files is also supported. For example, this LGF file has an atom ID column "id", an atom type column "element" and bool and int atom property columns.
@nodes
id element bool int
0 0 0 1
1 0 1 2
...
The custom LGF file can be imported with:
from mogli import LGFIOConfig
...
config = LGFIOConfig('id', 'element')
config.add_bool_node_prop('bool').add_int_node_prop('int')
with open('./data/min_2.lgf') as f:
    mol.read_lgf(f.read(), config)
By default, mogli uses the IACM atom types used by the ATB database. The atom types are managed by the PeriodicTable class. There are three ways to get a custom periodic table: create a new one from scratch, modify the default periodic table in place, or copy the default periodic table and modify the copy:
from mogli import PeriodicTable
...
# copy the default IACM periodic table
table = PeriodicTable(PeriodicTable.get_default())
# add a new element "Foo"
table.add_uncolored(132, 'Foo')
# make "H" and "HC" equivalent
table.make_equivalent(20, 21)
# create an empty molecule with this custom periodic table
mol = Molecule(table)
Maximal common fragments are common subgraphs of two molecular graphs in which not only atom pairs are matched but also their neighborhoods. The size of the neighborhood is determined by the shell size.
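The shell size bounds the distance, counted in bonds, between the central atom and the atoms of its neighborhood. The sketch below illustrates this with a depth-limited breadth-first search on a plain adjacency dictionary. It is independent of mogli; the function name neighborhood and the example chain are hypothetical.

```python
from collections import deque


def neighborhood(adjacency, center, k):
    """Return the atoms at most k bonds away from center (center included)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if distance[v] == k:
            continue  # do not expand beyond the shell size
        for w in adjacency[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return set(distance)


# Hypothetical chain of five atoms: 0 - 1 - 2 - 3 - 4
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(neighborhood(chain, 2, 0)))  # [2]
print(sorted(neighborhood(chain, 2, 1)))  # [1, 2, 3]
print(sorted(neighborhood(chain, 0, 2)))  # [0, 1, 2]
```

With shell size 0 the neighborhood is just the atom itself, so larger shell sizes make the matching stricter.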
Enumerating all maximal common fragments:
from mogli import maximal_common_fragments
...
shell, timeout = 1, 10
# t: False if a timeout occurred, True otherwise.
# fragments: Vector of resulting fragments.
# matches1: Resulting mappings from fragment atom IDs to atom IDs in mol1.
# matches2: Resulting mappings from fragment atom IDs to atom IDs in mol2.
t, fragments, matches1, matches2 = maximal_common_fragments(
    mol1,     # Matched against mol2 to find maximal common fragments.
    mol2,     # Matched against mol1 to find maximal common fragments.
    shell,    # Shell size.
    timeout)  # Timeout in seconds.
To retrieve only the largest common fragments, set the maximum parameter of maximal_common_fragments to True.
If you get this error:
fatal error: Python.h: No such file or directory
you need to install the Python development headers, provided by python-devel (Fedora) or python-dev (Ubuntu).
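To check whether the headers are visible to the interpreter you build with, a small standard-library script can help. This is a diagnostic sketch only; the helper name python_header_available is hypothetical, and the build system may search other include directories than the one reported here.

```python
import os
import sysconfig


def python_header_available():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


print(sysconfig.get_paths()["include"])  # directory where Python.h is expected
print(python_header_available())
```

Run it with the same Python interpreter that is used for the build.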