Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomtype parameterization methods for parameterizing a topology in GMSO. #644

Merged
merged 46 commits into from
Jun 15, 2022
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
8e69d6e
Atomtype parameterization methods for parameterizing a topology in GMSO.
CalCraven Apr 14, 2022
9074909
Move module imports for apply into atomtyping to prevent circular imp…
CalCraven Apr 14, 2022
80c1850
Add a quick fix which will do atomtyping if no residue flag is passed
CalCraven Apr 25, 2022
6376da7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 25, 2022
7d0a790
Merge branch 'master' of https://github.com/mosdef-hub/gmso into deve…
daico007 Apr 26, 2022
99c0793
correctly update sites when adding subtop
daico007 Apr 26, 2022
790aacc
Merge branch 'develop' into develop-gmsoff-parameterize
daico007 Apr 28, 2022
2d7a770
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Apr 28, 2022
e6da3f7
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Apr 29, 2022
40f6639
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Apr 29, 2022
e1f3d4e
Changes to doc strings for clarity. Add a subtop_label variable to ge…
CalCraven May 2, 2022
1503578
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 2, 2022
b13bf94
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina May 6, 2022
7be5f57
More modular architecture
umesh-timalsina May 6, 2022
7366452
WIP- Add testing, minor refactoring of the APIs.
umesh-timalsina May 6, 2022
cc46491
WIP- Better error handling
umesh-timalsina May 9, 2022
c58aa41
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina May 25, 2022
0f374b2
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina May 31, 2022
f5633e7
Misc Changes
umesh-timalsina Jun 1, 2022
d7a90b0
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Jun 1, 2022
72137ee
WIP- Skip opls for now, full TrappE tests
umesh-timalsina Jun 1, 2022
263ef78
Avoid accidental overwriting of typemap when using isomorphism
umesh-timalsina Jun 3, 2022
c7ea346
WIP- Remove unused import
umesh-timalsina Jun 3, 2022
d267d2f
Use enumerate for atom index while converting to TopologyGraph
umesh-timalsina Jun 3, 2022
9aab528
Fix argument order
umesh-timalsina Jun 3, 2022
51b1081
WIP- Add test for subtopology parameterization
umesh-timalsina Jun 3, 2022
06df2ba
Make opls/trappe global fixtures, Add tests for isomorphism
umesh-timalsina Jun 6, 2022
d7b08d6
Further testing isomorphism
umesh-timalsina Jun 6, 2022
031e703
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Jun 7, 2022
4a523e6
REVERT - skip OPLS tests
umesh-timalsina Jun 7, 2022
403a36c
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Jun 7, 2022
593c15a
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Jun 7, 2022
341ecc5
Copy scaling factors and combining rules after parametrization
umesh-timalsina Jun 7, 2022
fe352fd
Proper OPLS tests
umesh-timalsina Jun 7, 2022
b3e045f
WIP- Refactor the test module
umesh-timalsina Jun 7, 2022
4834b5f
WIP- Remove unused import
umesh-timalsina Jun 7, 2022
0a4c780
WIP- Add test for parameterization with impropers
umesh-timalsina Jun 9, 2022
5633c52
Merge remote-tracking branch 'upstream/develop' into develop-gmsoff-p…
umesh-timalsina Jun 9, 2022
3711172
WIP- Additional impropers test; Separate module for testing impropers
umesh-timalsina Jun 15, 2022
956977c
Minor refacotors; additional edge cases coverage/tests
umesh-timalsina Jun 15, 2022
f0fd1e5
Docstring minor fix
umesh-timalsina Jun 15, 2022
1e866d7
Remove rel_to_module as is obsolete in forcefield_utilities
umesh-timalsina Jun 15, 2022
da3a23e
Change trappe_ua to trappe-ua for correct loading
umesh-timalsina Jun 15, 2022
b018a6c
fix typo, add note about specific use case
daico007 Jun 15, 2022
4c62c5c
Merge branch 'develop-gmsoff-parameterize' of https://github.com/CalC…
daico007 Jun 15, 2022
318c5ad
pip install forcefield-utilites until new release
daico007 Jun 15, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions gmso/atomtyping/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
"""GMSO functions that generate parameterized Topologies."""
from .parameterize import apply
59 changes: 59 additions & 0 deletions gmso/atomtyping/isomorph.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
"""TopologyGraph Functions that identify molecules from isomorphism."""
from collections import deque

import networkx as nx


def partition_isomorphic_topology_graphs(graph):
"""Return a collection of isomorphic sets of the subgraphs of the Topology Graph..

Parameters
----------
graph: foyer.topology_graph.TopologyGraph
The networkx subclassed TopologyGraph with data identifying the nodes
and edges that make up a topology atom and bonds structure

Returns
-------
isomorphic_elements: dict
The keys are unique disconnected graphs, and the values are all
identical subgraphs in the graph

Notes
-----
See https://github.com/networkx/networkx/blob/main/networkx/algorithms/isomorphism/isomorphvf2.py
from the networkx documentation about identifying isomorphic components
"""

subgraphs_gen = (graph.subgraph(c) for c in nx.connected_components(graph))
subgraphs_list = list(subgraphs_gen)

graph_queue = deque(subgraphs_list)
graph_of_interest = graph_queue.popleft()
isomorphic_elements = {
graph_of_interest: [],
}

last_element_popped = None
count = 0
first_mismatch = None
while graph_queue:
if graph_queue[0] == first_mismatch:
count = 0
graph_of_interest = graph_queue.popleft()
isomorphic_elements[graph_of_interest] = []
if graph_queue:
graph = graph_queue.popleft()
matcher = nx.algorithms.isomorphism.GraphMatcher(
graph, graph_of_interest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add a node_match function (based on the element of the node) IIRC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly would a node_match function do? I'm not entirely familiar, is it something passed to the GraphMatcher to do the node matching?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just tested this offline, I think the current use of GraphMatcher only concern about the shape of the graph but not about the nodes involved. So this could cause issue with molecules with same shape but different element would still be recognized as isomorphic (so lumped into the same group).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ethanol = mb.load("CCO", smiles=True)
ethanol.name = "Ethanol"

eth1 = mb.clone(ethanol)
eth2 = mb.clone(ethanol)

eth1_top = mb_convert(eth1)
eth1_top.sites[0].name = "K"
eth1_top.sites[0].element = element_by_symbol("K")
eth2_top = mb_convert(eth2)

Note below I am using the get_modified_topology_graph from the example notebook.

eth1_grp = get_modified_topology_graph(eth1_top)
eth2_grp = get_modified_topology_graph(eth2_top)
matcher = nx.isomorphism.GraphMatcher(eth1_grp, eth2_grp)

Then
matcher.is_isomorphic() will still return True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a different note, I think we can use this GraphMatcher for the issue with the assumed site order in a molecule (or subtopology) (also this mosdef-hub/foyer#504). Will have to sacrifice some performance for checking but the number of checks performed will still be fewer that going the completely splitting the topology, so should still be a bit faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this makes more sense to me now, thanks @daico007. I was a little hazy on the exact implementation of GraphMatcher. I think having a few different options for matching makes sense, which trade of increasing performance for increasing matching precision. This reminds me a bit of the filters used in PR #649, which can be used to specify how you want to filter unique parameter types. I think it makes sense to at least try a basic one that checks at least for elements in the correct positions, and allow that to be a built in option.

umesh-timalsina marked this conversation as resolved.
Show resolved Hide resolved
)
if matcher.is_isomorphic():
isomorphic_elements[graph_of_interest].append(
(graph, matcher.mapping)
)
else:
if count == 0:
first_mismatch = graph
graph_queue.append(graph)
count += 1
return isomorphic_elements
284 changes: 284 additions & 0 deletions gmso/atomtyping/parameterize.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,284 @@
"""Functions used to atomtype a gmso.Topology."""
from operator import attrgetter

import networkx as nx
from foyer.atomtyper import AtomTypingRulesProvider, find_atomtypes
from foyer.smarts import SMARTS
from foyer.topology_graph import TopologyGraph

from gmso import Atom
from gmso.atomtyping.isomorph import ( # check refs
partition_isomorphic_topology_graphs,
)

__all__ = ["apply"]


def apply(
top,
forcefields,
identify_connected_components=True,
use_residue_info=False,
assert_bond_params=True,
assert_angle_params=True,
assert_dihedral_params=True,
assert_improper_params=False,
):
"""Set Topology parameter types from GMSO Forcefields.

Parameters
----------
top: gmso.core.topology.Topology, required
The GMSO topology on which to apply forcefields
forcefields: dict, required
The keys are labels that match the subtopology label or site residue_name, and the
values are gmso Forcefield objects that gets applied to the specified molecule
identify_connected_components: bool, optional, default=True
A flag to determine whether or not to search the topology for repeated disconnected
structures, otherwise known as molecules and type each molecule only once.
use_residue_info: bool, optional, default=False
A flag to determine whether or not to look at site.residue_name to look parameterize
each molecule only once. Will only be used if identify_connected_components=False
assert_bond_params : bool, optional, default=True
If True, an error is raised if parameters are not found for all system
bonds.
assert_angle_params : bool, optional, default=True
If True, an error is raised if parameters are not found for all system
angles.
assert_dihedral_params : bool, optional, default=True
If True, an error is raised if parameters are not found for all system
proper dihedrals.
assert_improper_params : bool, optional, default=False
If True, an error is raised if parameters are not found for all system
improper dihedrals.
"""
top.identify_connections()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on adding a flag for this operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So only identify connections if those connections were missing? Makes sense to me, do you want it default flagged on or off?

I do start to get worried about the number of flags that are starting to get passed around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need a flag for this operation if we want to keep it here. But, I am thinking, should we ask the user to perform the identify_connection before arriving at this method? Since the identify_connection itself may need some adjustment (like do we want to identify improper dihedrals or not).

umesh-timalsina marked this conversation as resolved.
Show resolved Hide resolved
top_graph = get_modified_topology_graph(top)
if identify_connected_components: # True, (True,False)
# populate molecule information
isomorphic_top_graphs = partition_isomorphic_topology_graphs(top_graph)
apply_atomtypes_from_isomorphic(
top, isomorphic_top_graphs, forcefields
) # what if only one forcefield?, traversing done in this function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on the question posed in this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question was a note to myself to handle the case that only one forcefield is passed, without specifying the molecules it should be attached to. Currently, that will just spit out an error since it expects forcefields to be of type dict. So we should either tell users in this case to use identify_connected_components=False, or just automatically use the single forcefield that was passed.

umesh-timalsina marked this conversation as resolved.
Show resolved Hide resolved
elif not use_residue_info: # False, False
# get entire topgraph
topgraph = get_modified_topology_graph(top)
# get rules provider from gmso_ff
# Assumes forcefields is just one forcefield
ff_of_interest = next(iter(forcefields.values()))
at_rules_provider = get_atomtyping_rules_provider(
gmso_ff=ff_of_interest
)
# create a typemap of that topgraph
typemap = find_atomtypes(topgraph, at_rules_provider)
# apply typemap
traverse_typemap(top, typemap, ff_of_interest)
elif use_residue_map: # False, True
"""Not Implemented
maps = {}
for key, val in forcefields.items():
# generate topgraph of items with "key" residue name
# NOTE CAL: this iter_sites_by method won't work currently, can only iter one at a time.
res_iter = top.iter_sites_by(["residue_name", "residue_number"], [key, 1]) #Need to note that resnumber index 1
# create a top/graph from this set of sites
#subgraph = FUNCTION(res_iter)
# append that typemap to a dict with residue name as key
#maps[key] = find_atomtypes(subgraph, val)
# generate a full typemap by iterating through all sites and copying over map info from `maps`
#typemap = FUNCTION(top, maps)
# apply typemap
traverse_typemap(top, typemap, forcefields)
"""
raise (
GMSOError,
"Using site residue matching to match substructures to a given forcefield is not implemented yet",
)

if assert_bond_params:
apply_connection_types(top, forcefields, "bond")
if assert_angle_params:
apply_connection_types(top, forcefields, "angle")
if assert_dihedral_params:
apply_connection_types(top, forcefields, "dihedral")
if assert_improper_params:
apply_connection_types(top, forcefields, "improper")

return top


def get_modified_topology_graph(gmso_topology):
"""Return a TopologyGraph with relevant attributes from an GMSO topology.

Parameters
----------
gmso_topology: gmso.Topology
The GMSO Topology

Returns
-------
TopologyGraph
The equivalent TopologyGraph of the openFF Topology `openff_topology`
"""
top_graph = TopologyGraph()
for atom in gmso_topology.sites:
if isinstance(atom, Atom):
if atom.name.startswith("_"):
top_graph.add_atom(
name=atom.name,
index=gmso_topology.get_index(atom),
atomic_number=None,
element=atom.name,
subtopology_label=atom.label, # TODO: Change based on PR #638 conventions
)

else:
top_graph.add_atom(
name=atom.name,
index=gmso_topology.get_index(atom),
atomic_number=atom.element.atomic_number,
element=atom.element.symbol,
subtopology_label=atom.label, # TODO: Change based on PR #638 conventions
)

for top_bond in gmso_topology.bonds:
atoms_indices = [
gmso_topology.get_index(atom)
for atom in top_bond.connection_members
]
top_graph.add_bond(atoms_indices[0], atoms_indices[1])

return top_graph


def get_atomtyping_rules_provider(gmso_ff):
"""Return a foyer AtomTypingRulesProvider from a GMSO forcefield.

Parameters
----------
gmso_ff: gmso.core.forcefield.Forcefield
The GMSO forcefield object to extract the rules from
Returns
-------
AtomTypingRulesProvider
The foyer.atomtyper.AtomTypingRulesProvider object used to parse atomtype definitions.
Typically, SMARTS is the ruleset of choice. See https://github.com/mosdef-hub/foyer/issues/63
for curently supported features in Foyer.
"""
atom_type_defs = {}
atom_type_overrides = {}
parser = SMARTS({})
for atom_type_name, atom_type in gmso_ff.atom_types.items():
if atom_type.definition:
atom_type_defs[atom_type_name] = atom_type.definition
if atom_type.overrides:
atom_type_overrides[atom_type_name] = atom_type.overrides

return AtomTypingRulesProvider(
atom_type_defs, atom_type_overrides, {}, parser
)


def apply_atomtypes_from_isomorphic(top, isomorphic_top_graphs, forcefields):
"""Set Topology atomtypes from isomorphic structures matching supplied forcefields.

Parameters
----------
top: gmso.core.topology.Topology, required
The GMSO topology on which to apply atomtypes
isomorphic_top_graphs: dict, required
The dictionary mapping which breaks up molecules into one graph structure, and a list of
the corresponding subgraphs in the TopologyGraph that are isomorphic
forcefields: dict, required
The keys are labels that match the subtopology labels for each isomorphic graph, and the
values are gmso Forcefield objects that gets applied to the specified molecule
"""
for top_graph, identical in isomorphic_top_graphs.items():
sub_top_label = next(
iter(
nx.classes.function.get_node_attributes(
top_graph, "atom_data"
).values()
)
).subtopology_label
ff_of_interest = forcefields[sub_top_label]
at_rules_provider = get_atomtyping_rules_provider(
gmso_ff=ff_of_interest
)
sub_type_map = find_atomtypes(top_graph, at_rules_provider)
traverse_typemap(top, sub_type_map, ff_of_interest)
for graph, mapping in identical:
for node in graph:
mapped = sub_type_map[mapping[node]]["atom_type"]
top._sites[node].atom_type = mapped.clone()


def traverse_typemap(top, type_map, forcefield):
"""Take a typemap and applies those atomtypes to the Topology.

Parameters
----------
top: gmso.core.topology.Topology, required
The GMSO topology on which to apply atomtypes
typemap: dict, required
The dictionary of atom_index with iterated matching to get the atomtype of that site
forcefield: gmso.forcefield.Forcefield, required
The GMSO forcefield object which has information associated with each unique atomtype
"""
for index in type_map:
site_of_interest = top._sites[index]
site_of_interest.atom_type = forcefield.atom_types[
type_map[index]["atomtype"]
].clone()
type_map[index]["atom_type"] = site_of_interest.atom_type


def apply_residue_names(top, more): # TODO
"""Take a topology and applies residue info to all sites.

Parameters
----------
top: gmso.core.topology.Topology, required
The GMSO topology on which to apply atomtypes
more: TBD, required
The information necessary to map residue name and number for each site
"""
return


def apply_connection_types(top, forcefields, parameter):
"""Set Topology connection types from matching atomclass information in forcefield.

Parameters
----------
top: gmso.core.topology.Topology, required
The GMSO topology on which to apply paramter_types
forcefields: dict, required
The keys are labels that match the subtopology labels for each molecule, and the
values are gmso Forcefield objects that gets applied to the specified molecule
parameter: str, required
The connection type to parameterize. Can any of "bond", "angle", "dihedral", "improper"
"""
connect_members = {
"bond": [0, 1],
"angle": [0, 1, 2],
"dihedral": [0, 1, 2, 3],
"improper": [0, 1, 2, 3],
} # TODO validate improper order
molecule_label = "label_" # place to grab info referencing molecule name
for connect in getattr(top, "_" + parameter + "s"):
sub_top_label = connect.connection_members[0].get(molecule_label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sub_top_label = connect.connection_members[0].get(molecule_label)
sub_top_label = connect.connection_members[0].__dict__.get(molecule_label)

or

Suggested change
sub_top_label = connect.connection_members[0].get(molecule_label)
sub_top_label = connect.connection_members[0].molecule_label

if sub_top_label is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if sub_top_label is None:
if not sub_top_label:

I think this would also catch the case where the sub_top_label is empty string.

msg = f"Site {connect.connection_members[0]}in this connection has no molecule_label {molecule_label}.\
This string should correspond to one of the keys associated with the forcefield that has been \
passed. Currently provided name:GMSOForcefields pairs are {forcefields.items()}"
raise ParameterizationError(msg)
ff_of_interest = forcefields[sub_top_label]
type_getter = attrgetter("atom_type.atomclass")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a None-Type check prudent here? I think foyer will raise an error if it cannot find an atomtype right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to use the assert_bond_params flag to see if we should or should not look for all instances of that connection to be typed? This will definitely throw an error if it's missing as of now. Line 97 essentially handles this check as of now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an error message in e1f3d4e.

umesh-timalsina marked this conversation as resolved.
Show resolved Hide resolved
member_types = [
type_getter(getattr(connect, "connection_members")[i])
for i in connect_members[parameter]
]
ctype = ff_of_interest.get_potential(
group=parameter + "_type", key=member_types
).clone()
setattr(connect, parameter + "_type", ctype)
2 changes: 1 addition & 1 deletion gmso/core/topology.py
Original file line number Diff line number Diff line change
Expand Up @@ -624,7 +624,7 @@ def add_subtopology(self, subtop, update=True):
"""
self._subtops.add(subtop)
subtop.parent = self
self._sites.union(subtop.sites)
self._sites = self._sites.union(subtop.sites)
if update:
self.update_topology()

Expand Down