-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atomtype parameterization methods for parameterizing a topology in GMSO. #644
Changes from 12 commits
8e69d6e
9074909
80c1850
6376da7
7d0a790
99c0793
790aacc
2d7a770
e6da3f7
40f6639
e1f3d4e
1503578
b13bf94
7be5f57
7366452
cc46491
c58aa41
0f374b2
f5633e7
d7a90b0
72137ee
263ef78
c7ea346
d267d2f
9aab528
51b1081
06df2ba
d7b08d6
031e703
4a523e6
403a36c
593c15a
341ecc5
fe352fd
b3e045f
4834b5f
0a4c780
5633c52
3711172
956977c
f0fd1e5
1e866d7
da3a23e
b018a6c
4c62c5c
318c5ad
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
"""GMSO functions that generate parameterized Topologies.""" | ||
from .parameterize import apply |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
"""TopologyGraph Functions that identify molecules from isomorphism.""" | ||
from collections import deque | ||
|
||
import networkx as nx | ||
|
||
|
||
def partition_isomorphic_topology_graphs(graph): | ||
"""Return a collection of isomorphic sets of the subgraphs of the Topology Graph.. | ||
|
||
Parameters | ||
---------- | ||
graph: foyer.topology_graph.TopologyGraph | ||
The networkx subclassed TopologyGraph with data identifying the nodes | ||
and edges that make up a topology atom and bonds structure | ||
|
||
Returns | ||
------- | ||
isomorphic_elements: dict | ||
The keys are unique disconnected graphs, and the values are all | ||
identical subgraphs in the graph | ||
|
||
Notes | ||
----- | ||
See https://github.com/networkx/networkx/blob/main/networkx/algorithms/isomorphism/isomorphvf2.py | ||
from the networkx documentation about identifying isomorphic components | ||
""" | ||
|
||
subgraphs_gen = (graph.subgraph(c) for c in nx.connected_components(graph)) | ||
subgraphs_list = list(subgraphs_gen) | ||
|
||
graph_queue = deque(subgraphs_list) | ||
graph_of_interest = graph_queue.popleft() | ||
isomorphic_elements = { | ||
graph_of_interest: [], | ||
} | ||
|
||
last_element_popped = None | ||
count = 0 | ||
first_mismatch = None | ||
while graph_queue: | ||
if graph_queue[0] == first_mismatch: | ||
count = 0 | ||
graph_of_interest = graph_queue.popleft() | ||
isomorphic_elements[graph_of_interest] = [] | ||
if graph_queue: | ||
graph = graph_queue.popleft() | ||
matcher = nx.algorithms.isomorphism.GraphMatcher( | ||
graph, graph_of_interest | ||
umesh-timalsina marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) | ||
if matcher.is_isomorphic(): | ||
isomorphic_elements[graph_of_interest].append( | ||
(graph, matcher.mapping) | ||
) | ||
else: | ||
if count == 0: | ||
first_mismatch = graph | ||
graph_queue.append(graph) | ||
count += 1 | ||
return isomorphic_elements |
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,284 @@ | ||||||||||
"""Functions used to atomtype a gmso.Topology.""" | ||||||||||
from operator import attrgetter | ||||||||||
|
||||||||||
import networkx as nx | ||||||||||
from foyer.atomtyper import AtomTypingRulesProvider, find_atomtypes | ||||||||||
from foyer.smarts import SMARTS | ||||||||||
from foyer.topology_graph import TopologyGraph | ||||||||||
|
||||||||||
from gmso import Atom | ||||||||||
from gmso.atomtyping.isomorph import ( # check refs | ||||||||||
partition_isomorphic_topology_graphs, | ||||||||||
) | ||||||||||
|
||||||||||
__all__ = ["apply"] | ||||||||||
|
||||||||||
|
||||||||||
def apply( | ||||||||||
top, | ||||||||||
forcefields, | ||||||||||
identify_connected_components=True, | ||||||||||
use_residue_info=False, | ||||||||||
assert_bond_params=True, | ||||||||||
assert_angle_params=True, | ||||||||||
assert_dihedral_params=True, | ||||||||||
assert_improper_params=False, | ||||||||||
): | ||||||||||
"""Set Topology parameter types from GMSO Forcefields. | ||||||||||
|
||||||||||
Parameters | ||||||||||
---------- | ||||||||||
top: gmso.core.topology.Topology, required | ||||||||||
The GMSO topology on which to apply forcefields | ||||||||||
forcefields: dict, required | ||||||||||
The keys are labels that match the subtopology label or site residue_name, and the | ||||||||||
values are gmso Forcefield objects that gets applied to the specified molecule | ||||||||||
identify_connected_components: bool, optional, default=True | ||||||||||
A flag to determine whether or not to search the topology for repeated disconnected | ||||||||||
structures, otherwise known as molecules and type each molecule only once. | ||||||||||
use_residue_info: bool, optional, default=False | ||||||||||
A flag to determine whether or not to look at site.residue_name to look parameterize | ||||||||||
each molecule only once. Will only be used if identify_connected_components=False | ||||||||||
assert_bond_params : bool, optional, default=True | ||||||||||
If True, an error is raised if parameters are not found for all system | ||||||||||
bonds. | ||||||||||
assert_angle_params : bool, optional, default=True | ||||||||||
If True, an error is raised if parameters are not found for all system | ||||||||||
angles. | ||||||||||
assert_dihedral_params : bool, optional, default=True | ||||||||||
If True, an error is raised if parameters are not found for all system | ||||||||||
proper dihedrals. | ||||||||||
assert_improper_params : bool, optional, default=False | ||||||||||
If True, an error is raised if parameters are not found for all system | ||||||||||
improper dihedrals. | ||||||||||
""" | ||||||||||
top.identify_connections() | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any thoughts on adding a flag for this operation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So only identify connections if those connections were missing? Makes sense to me, do you want it default flagged on or off? I do start to get worried about the number of flags that are starting to get passed around. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we will need a flag for this operation if we want to keep it here. But, I am thinking, should we ask the user to perform the
umesh-timalsina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
top_graph = get_modified_topology_graph(top) | ||||||||||
if identify_connected_components: # True, (True,False) | ||||||||||
# populate molecule information | ||||||||||
isomorphic_top_graphs = partition_isomorphic_topology_graphs(top_graph) | ||||||||||
apply_atomtypes_from_isomorphic( | ||||||||||
top, isomorphic_top_graphs, forcefields | ||||||||||
) # what if only one forcefield?, traversing done in this function | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you elaborate on the question posed in this comment? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This question was a note to myself to handle the case that only one forcefield is passed, without specifying the molecules it should be attached to. Currently, that will just spit out an error since it expects
umesh-timalsina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
elif not use_residue_info: # False, False | ||||||||||
# get entire topgraph | ||||||||||
topgraph = get_modified_topology_graph(top) | ||||||||||
# get rules provider from gmso_ff | ||||||||||
# Assumes forcefields is just one forcefield | ||||||||||
ff_of_interest = next(iter(forcefields.values())) | ||||||||||
at_rules_provider = get_atomtyping_rules_provider( | ||||||||||
gmso_ff=ff_of_interest | ||||||||||
) | ||||||||||
# create a typemap of that topgraph | ||||||||||
typemap = find_atomtypes(topgraph, at_rules_provider) | ||||||||||
# apply typemap | ||||||||||
traverse_typemap(top, typemap, ff_of_interest) | ||||||||||
elif use_residue_map: # False, True | ||||||||||
"""Not Implemented | ||||||||||
maps = {} | ||||||||||
for key, val in forcefields.items(): | ||||||||||
# generate topgraph of items with "key" residue name | ||||||||||
# NOTE CAL: this iter_sites_by method won't work currently, can only iter one at a time. | ||||||||||
res_iter = top.iter_sites_by(["residue_name", "residue_number"], [key, 1]) #Need to note that resnumber index 1 | ||||||||||
# create a top/graph from this set of sites | ||||||||||
#subgraph = FUNCTION(res_iter) | ||||||||||
# append that typemap to a dict with residue name as key | ||||||||||
#maps[key] = find_atomtypes(subgraph, val) | ||||||||||
# generate a full typemap by iterating through all sites and copying over map info from `maps` | ||||||||||
#typemap = FUNCTION(top, maps) | ||||||||||
# apply typemap | ||||||||||
traverse_typemap(top, typemap, forcefields) | ||||||||||
""" | ||||||||||
raise ( | ||||||||||
GMSOError, | ||||||||||
"Using site residue matching to match substructures to a given forcefield is not implemented yet", | ||||||||||
) | ||||||||||
|
||||||||||
if assert_bond_params: | ||||||||||
apply_connection_types(top, forcefields, "bond") | ||||||||||
if assert_angle_params: | ||||||||||
apply_connection_types(top, forcefields, "angle") | ||||||||||
if assert_dihedral_params: | ||||||||||
apply_connection_types(top, forcefields, "dihedral") | ||||||||||
if assert_improper_params: | ||||||||||
apply_connection_types(top, forcefields, "improper") | ||||||||||
|
||||||||||
return top | ||||||||||
|
||||||||||
|
||||||||||
def get_modified_topology_graph(gmso_topology): | ||||||||||
"""Return a TopologyGraph with relevant attributes from an GMSO topology. | ||||||||||
|
||||||||||
Parameters | ||||||||||
---------- | ||||||||||
gmso_topology: gmso.Topology | ||||||||||
The GMSO Topology | ||||||||||
|
||||||||||
Returns | ||||||||||
------- | ||||||||||
TopologyGraph | ||||||||||
The equivalent TopologyGraph of the openFF Topology `openff_topology` | ||||||||||
""" | ||||||||||
top_graph = TopologyGraph() | ||||||||||
for atom in gmso_topology.sites: | ||||||||||
if isinstance(atom, Atom): | ||||||||||
if atom.name.startswith("_"): | ||||||||||
top_graph.add_atom( | ||||||||||
name=atom.name, | ||||||||||
index=gmso_topology.get_index(atom), | ||||||||||
atomic_number=None, | ||||||||||
element=atom.name, | ||||||||||
subtopology_label=atom.label, # TODO: Change based on PR #638 conventions | ||||||||||
) | ||||||||||
|
||||||||||
else: | ||||||||||
top_graph.add_atom( | ||||||||||
name=atom.name, | ||||||||||
index=gmso_topology.get_index(atom), | ||||||||||
atomic_number=atom.element.atomic_number, | ||||||||||
element=atom.element.symbol, | ||||||||||
subtopology_label=atom.label, # TODO: Change based on PR #638 conventions | ||||||||||
) | ||||||||||
|
||||||||||
for top_bond in gmso_topology.bonds: | ||||||||||
atoms_indices = [ | ||||||||||
gmso_topology.get_index(atom) | ||||||||||
for atom in top_bond.connection_members | ||||||||||
] | ||||||||||
top_graph.add_bond(atoms_indices[0], atoms_indices[1]) | ||||||||||
|
||||||||||
return top_graph | ||||||||||
|
||||||||||
|
||||||||||
def get_atomtyping_rules_provider(gmso_ff): | ||||||||||
"""Return a foyer AtomTypingRulesProvider from a GMSO forcefield. | ||||||||||
|
||||||||||
Parameters | ||||||||||
---------- | ||||||||||
gmso_ff: gmso.core.forcefield.Forcefield | ||||||||||
The GMSO forcefield object to extract the rules from | ||||||||||
Returns | ||||||||||
------- | ||||||||||
AtomTypingRulesProvider | ||||||||||
The foyer.atomtyper.AtomTypingRulesProvider object used to parse atomtype definitions. | ||||||||||
Typically, SMARTS is the ruleset of choice. See https://github.com/mosdef-hub/foyer/issues/63 | ||||||||||
for curently supported features in Foyer. | ||||||||||
""" | ||||||||||
atom_type_defs = {} | ||||||||||
atom_type_overrides = {} | ||||||||||
parser = SMARTS({}) | ||||||||||
for atom_type_name, atom_type in gmso_ff.atom_types.items(): | ||||||||||
if atom_type.definition: | ||||||||||
atom_type_defs[atom_type_name] = atom_type.definition | ||||||||||
if atom_type.overrides: | ||||||||||
atom_type_overrides[atom_type_name] = atom_type.overrides | ||||||||||
|
||||||||||
return AtomTypingRulesProvider( | ||||||||||
atom_type_defs, atom_type_overrides, {}, parser | ||||||||||
) | ||||||||||
|
||||||||||
|
||||||||||
def apply_atomtypes_from_isomorphic(top, isomorphic_top_graphs, forcefields): | ||||||||||
"""Set Topology atomtypes from isomorphic structures matching supplied forcefields. | ||||||||||
|
||||||||||
Parameters | ||||||||||
---------- | ||||||||||
top: gmso.core.topology.Topology, required | ||||||||||
The GMSO topology on which to apply atomtypes | ||||||||||
isomorphic_top_graphs: dict, required | ||||||||||
The dictionary mapping which breaks up molecules into one graph structure, and a list of | ||||||||||
the corresponding subgraphs in the TopologyGraph that are isomorphic | ||||||||||
forcefields: dict, required | ||||||||||
The keys are labels that match the subtopology labels for each isomorphic graph, and the | ||||||||||
values are gmso Forcefield objects that gets applied to the specified molecule | ||||||||||
""" | ||||||||||
for top_graph, identical in isomorphic_top_graphs.items(): | ||||||||||
sub_top_label = next( | ||||||||||
iter( | ||||||||||
nx.classes.function.get_node_attributes( | ||||||||||
top_graph, "atom_data" | ||||||||||
).values() | ||||||||||
) | ||||||||||
).subtopology_label | ||||||||||
ff_of_interest = forcefields[sub_top_label] | ||||||||||
at_rules_provider = get_atomtyping_rules_provider( | ||||||||||
gmso_ff=ff_of_interest | ||||||||||
) | ||||||||||
sub_type_map = find_atomtypes(top_graph, at_rules_provider) | ||||||||||
traverse_typemap(top, sub_type_map, ff_of_interest) | ||||||||||
for graph, mapping in identical: | ||||||||||
for node in graph: | ||||||||||
mapped = sub_type_map[mapping[node]]["atom_type"] | ||||||||||
top._sites[node].atom_type = mapped.clone() | ||||||||||
|
||||||||||
|
||||||||||
def traverse_typemap(top, type_map, forcefield): | ||||||||||
"""Take a typemap and applies those atomtypes to the Topology. | ||||||||||
|
||||||||||
Parameters | ||||||||||
---------- | ||||||||||
top: gmso.core.topology.Topology, required | ||||||||||
The GMSO topology on which to apply atomtypes | ||||||||||
typemap: dict, required | ||||||||||
The dictionary of atom_index with iterated matching to get the atomtype of that site | ||||||||||
forcefield: gmso.forcefield.Forcefield, required | ||||||||||
The GMSO forcefield object which has information associated with each unique atomtype | ||||||||||
""" | ||||||||||
for index in type_map: | ||||||||||
site_of_interest = top._sites[index] | ||||||||||
site_of_interest.atom_type = forcefield.atom_types[ | ||||||||||
type_map[index]["atomtype"] | ||||||||||
].clone() | ||||||||||
type_map[index]["atom_type"] = site_of_interest.atom_type | ||||||||||
|
||||||||||
|
||||||||||
def apply_residue_names(top, more): # TODO | ||||||||||
"""Take a topology and applies residue info to all sites. | ||||||||||
|
||||||||||
Parameters | ||||||||||
---------- | ||||||||||
top: gmso.core.topology.Topology, required | ||||||||||
The GMSO topology on which to apply atomtypes | ||||||||||
more: TBD, required | ||||||||||
The information necessary to map residue name and number for each site | ||||||||||
""" | ||||||||||
return | ||||||||||
|
||||||||||
|
||||||||||
def apply_connection_types(top, forcefields, parameter): | ||||||||||
"""Set Topology connection types from matching atomclass information in forcefield. | ||||||||||
|
||||||||||
Parameters | ||||||||||
---------- | ||||||||||
top: gmso.core.topology.Topology, required | ||||||||||
The GMSO topology on which to apply paramter_types | ||||||||||
forcefields: dict, required | ||||||||||
The keys are labels that match the subtopology labels for each molecule, and the | ||||||||||
values are gmso Forcefield objects that gets applied to the specified molecule | ||||||||||
parameter: str, required | ||||||||||
The connection type to parameterize. Can any of "bond", "angle", "dihedral", "improper" | ||||||||||
""" | ||||||||||
connect_members = { | ||||||||||
"bond": [0, 1], | ||||||||||
"angle": [0, 1, 2], | ||||||||||
"dihedral": [0, 1, 2, 3], | ||||||||||
"improper": [0, 1, 2, 3], | ||||||||||
} # TODO validate improper order | ||||||||||
molecule_label = "label_" # place to grab info referencing molecule name | ||||||||||
for connect in getattr(top, "_" + parameter + "s"): | ||||||||||
sub_top_label = connect.connection_members[0].get(molecule_label) | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
or
Suggested change
|
||||||||||
if sub_top_label is None: | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I think this would also catch the case where the |
||||||||||
msg = f"Site {connect.connection_members[0]}in this connection has no molecule_label {molecule_label}.\ | ||||||||||
This string should correspond to one of the keys associated with the forcefield that has been \ | ||||||||||
passed. Currently provided name:GMSOForcefields pairs are {forcefields.items()}" | ||||||||||
raise ParameterizationError(msg) | ||||||||||
ff_of_interest = forcefields[sub_top_label] | ||||||||||
type_getter = attrgetter("atom_type.atomclass") | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is a None-Type check prudent here? I think foyer will raise an error if it cannot find an atomtype right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would we want to use the assert_bond_params flag to see if we should or should not look for all instances of that connection to be typed? This will definitely throw an error if it's missing as of now. Line 97 essentially handles this check as of now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added an error message in e1f3d4e.
umesh-timalsina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
member_types = [ | ||||||||||
type_getter(getattr(connect, "connection_members")[i]) | ||||||||||
for i in connect_members[parameter] | ||||||||||
] | ||||||||||
ctype = ff_of_interest.get_potential( | ||||||||||
group=parameter + "_type", key=member_types | ||||||||||
).clone() | ||||||||||
setattr(connect, parameter + "_type", ctype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add a node_match function (based on the element of the node) IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly would a node_match function do? I'm not entirely familiar, is it something passed to the GraphMatcher to do the node matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just tested this offline, I think the current use of GraphMatcher only concern about the shape of the graph but not about the nodes involved. So this could cause issue with molecules with same shape but different element would still be recognized as isomorphic (so lumped into the same group).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note below I am using the
get_modified_topology_graph
from the example notebook.Then
matcher.is_isomorphic()
will still returnTrue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a different note, I think we can use this
GraphMatcher
for the issue with the assumed site order in a molecule (or subtopology) (also this mosdef-hub/foyer#504). Will have to sacrifice some performance for checking but the number of checks performed will still be fewer that going the completely splitting the topology, so should still be a bit faster.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this makes more sense to me now, thanks @daico007. I was a little hazy on the exact implementation of GraphMatcher. I think having a few different options for matching makes sense, which trade of increasing performance for increasing matching precision. This reminds me a bit of the filters used in PR #649, which can be used to specify how you want to filter unique parameter types. I think it makes sense to at least try a basic one that checks at least for elements in the correct positions, and allow that to be a built in option.