-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atomtype parameterization methods for parameterizing a topology in GMSO. #644
Atomtype parameterization methods for parameterizing a topology in GMSO. #644
Conversation
This PR will add the atomtyping module to GMSO for passing a GMSO.Forcefield and a GMSO.Topology and match atomtype using foyer as the backend. Then, the corresponding connection types will be found in the Forcefield and applied to the connections in the topology. * Create parameterize.py, which has the apply function which can take a topology, and a gmso forcefield to apply to it. This can use subgraph isomorphism to identify molecular structures in the topology through the bondgraph and bin those into unique molecules that match a specified forcefield. This apply function can also do the standard atomtyping of the entire topology in one step. * Create isomorph.py which uses networkx graphs to identify disconnected components and isomorphism to identify repeated structures.
This pull request introduces 2 alerts when merging 8e69d6e into efeaad2 - view on LGTM.com new alerts:
|
This pull request introduces 2 alerts when merging 9074909 into efeaad2 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is shaping up nicely @CalCraven. I have added some comments by skimming through it.
gmso/atomtyping/isomorph.py
Outdated
matcher = nx.algorithms.isomorphism.GraphMatcher( | ||
graph, graph_of_interest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add a node_match function (based on the element of the node) IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly would a node_match function do? I'm not entirely familiar, is it something passed to the GraphMatcher to do the node matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just tested this offline, I think the current use of GraphMatcher only concern about the shape of the graph but not about the nodes involved. So this could cause issue with molecules with same shape but different element would still be recognized as isomorphic (so lumped into the same group).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ethanol = mb.load("CCO", smiles=True)
ethanol.name = "Ethanol"
eth1 = mb.clone(ethanol)
eth2 = mb.clone(ethanol)
eth1_top = mb_convert(eth1)
eth1_top.sites[0].name = "K"
eth1_top.sites[0].element = element_by_symbol("K")
eth2_top = mb_convert(eth2)
Note below I am using the get_modified_topology_graph
from the example notebook.
eth1_grp = get_modified_topology_graph(eth1_top)
eth2_grp = get_modified_topology_graph(eth2_top)
matcher = nx.isomorphism.GraphMatcher(eth1_grp, eth2_grp)
Then
matcher.is_isomorphic()
will still return True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a different note, I think we can use this GraphMatcher
for the issue with the assumed site order in a molecule (or subtopology) (also this mosdef-hub/foyer#504). Will have to sacrifice some performance for checking but the number of checks performed will still be fewer that going the completely splitting the topology, so should still be a bit faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this makes more sense to me now, thanks @daico007. I was a little hazy on the exact implementation of GraphMatcher. I think having a few different options for matching makes sense, which trade of increasing performance for increasing matching precision. This reminds me a bit of the filters used in PR #649, which can be used to specify how you want to filter unique parameter types. I think it makes sense to at least try a basic one that checks at least for elements in the correct positions, and allow that to be a built in option.
gmso/atomtyping/parameterize.py
Outdated
for connect in getattr(top, "_" + parameter + "s"): | ||
sub_top_label = connect.connection_members[0].label_ | ||
ff_of_interest = forcefields[sub_top_label] | ||
type_getter = attrgetter("atom_type.atomclass") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a None-Type check prudent here? I think foyer will raise an error if it cannot find an atomtype right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want to use the assert_bond_params flag to see if we should or should not look for all instances of that connection to be typed? This will definitely throw an error if it's missing as of now. Line 97 essentially handles this check as of now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an error message in e1f3d4e.
gmso/atomtyping/parameterize.py
Outdated
index=gmso_topology.get_index(atom), | ||
atomic_number=None, | ||
element=atom.name, | ||
subtopology_label=atom.label, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where we want to address this but we need a better identifier than atom.label
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I will think about this, and see if there is something better we can be using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not the atom be the atom number in the topology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index is the atom_number in the topology. The subtopology_label should be the string that we use to apply a given forcefield to the sites with this label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to PR #638, this info will be stored in site.molecule. I think whatever convention we decide to use there can be the reference label we use to identify the molecule name, and therefore the identifier we use for the forcefield
This pull request introduces 2 alerts when merging 6376da7 into f9e7332 - view on LGTM.com new alerts:
|
…lop-gmsoff-parameterize
Codecov Report
@@ Coverage Diff @@
## develop #644 +/- ##
===========================================
+ Coverage 91.10% 91.70% +0.59%
===========================================
Files 57 64 +7
Lines 4758 4964 +206
===========================================
+ Hits 4335 4552 +217
+ Misses 423 412 -11
Continue to review full report at Codecov.
|
This pull request introduces 2 alerts when merging 790aacc into cd68411 - view on LGTM.com new alerts:
|
This pull request introduces 2 alerts when merging 2d7a770 into 1818e46 - view on LGTM.com new alerts:
|
This pull request introduces 2 alerts when merging 40f6639 into f9c40ea - view on LGTM.com new alerts:
|
…neralize where the molecule definition is pulled from
for more information, see https://pre-commit.ci
This pull request introduces 2 alerts when merging 1503578 into f9c40ea - view on LGTM.com new alerts:
|
gmso/atomtyping/parameterize.py
Outdated
} # TODO validate improper order | ||
molecule_label = "label_" # place to grab info referencing molecule name | ||
for connect in getattr(top, "_" + parameter + "s"): | ||
sub_top_label = connect.connection_members[0].get(molecule_label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub_top_label = connect.connection_members[0].get(molecule_label) | |
sub_top_label = connect.connection_members[0].__dict__.get(molecule_label) |
or
sub_top_label = connect.connection_members[0].get(molecule_label) | |
sub_top_label = connect.connection_members[0].molecule_label |
gmso/atomtyping/parameterize.py
Outdated
molecule_label = "label_" # place to grab info referencing molecule name | ||
for connect in getattr(top, "_" + parameter + "s"): | ||
sub_top_label = connect.connection_members[0].get(molecule_label) | ||
if sub_top_label is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if sub_top_label is None: | |
if not sub_top_label: |
I think this would also catch the case where the sub_top_label
is empty string.
Rename module to to parameterization, not atomtyping |
This pull request introduces 1 alert when merging 7be5f57 into 2605c65 - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging 7366452 into 2605c65 - view on LGTM.com new alerts:
|
* Include ImproperType reading in "from_parmed" This PR is branched from PR #644. The parmed conversion to GMSO topology was not able to read in various Improper connections to GMSO. It would aggregate them with the Dihedral connections, from when GMSO did not have explicit treatment of impropers, similar to ParmEd. This PR aims to check a ParmEd object for structure.impropers or structure.dihedrals with the dihedral.impropers flag set to True and convert those to `gmso.Impropers`. Likewise, it looks to read in periodic or harmonic impropers and generate the correct `gmso.ImproperType` parametric potential for that type, if the flag `refer_types` is set to True in the `from_parmed` call. * Add pmd_improper_types_map function to convert_parmed * Generate a parametric potential with harmonic type expression from pmd.impropers * Generate a parametric potential with periodic type expression from pmd.dihedrals and dihedral.improper=True * Validate testing with foyer opls_validation when using functions in PR #644 Harmonic type expression for impropers is taken from https://manual.gromacs.org/current/reference-manual/functions/bonded-interactions.html#improper-dihedrals-harmonic-type This uses Xi as the independent variable. If someone testing this PR can validate that this is the formalism we want to use when reading in ImproperTypes from ParmEd improper_types, that would be a good sanity check. The periodic type expression is taken from https://manual.gromacs.org/current/reference-manual/functions/bonded-interactions.html#equation-eqnperiodicpropdihedral with phi as the independent variable. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add test for number of impropers from NNdimethylformamide molecule * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP- include gmso in impropertype testing * WIP- Add support for NoneTypes in site charge/mass * Address Review comments and create impropertype expression from HarmonicImproperPotential in library * WIP- Add testing for impropers * Modification for improper potential forms from the PotentialTemplateLibrary * WIP- Use id instead of object for storing mapping; impropertypes test * WIP- combined test for dihedral/impropertypes * Fix test case and id mapping in conversion * Add tests for harmonic impropers * Add proper properties for improper_types * WIP-Additional test cases Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Umesh Timalsina <umesh.timalsina@vanderbilt.edu>
One edge case popped up when @bc118 was testing on this PR: when a topology with no subtopology (so only two layers topology and sites) is being typed, the apply can only take one single forcefield XML file, and not in the form of a dictionary ( During that discussion, another question raised, do all forcefield provided in the forcefield XML dictionary has to be used or do we allow users to supercharge and provide excess forcefields. One idea is to not allow it by default but only when |
…raven/gmso into develop-gmsoff-parameterize
* Modify scaling factors to a numpy array * Misc Changes 1. Split property into two parts. global_scaling_factors and per_molecule_scaling_factors 2. Handle Nan/NoneType cases, non float cases 3. Add additional functions for handling scaling factors 4. Change formats/json.py to incorporate new properties 5. Change function name to prefix `get_`, formats/mcf.py change * WIP- Refactor global_scaling_factors to scaling_factors * Additional tests; Simplificaiton of get_scaling_factors method * Atomtype parameterization methods for parameterizing a topology in GMSO. (#644) * Atomtype parameterization methods for parameterizing a topology in GMSO. This PR will add the atomtyping module to GMSO for passing a GMSO.Forcefield and a GMSO.Topology and match atomtype using foyer as the backend. Then, the corresponding connection types will be found in the Forcefield and applied to the connections in the topology. * Create parameterize.py, which has the apply function which can take a topology, and a gmso forcefield to apply to it. This can use subgraph isomorphism to identify molecular structures in the topology through the bondgraph and bin those into unique molecules that match a specified forcefield. This apply function can also do the standard atomtyping of the entire topology in one step. * Create isomorph.py which uses networkx graphs to identify disconnected components and isomorphism to identify repeated structures. * Move module imports for apply into atomtyping to prevent circular imports * Add a quick fix which will do atomtyping if no residue flag is passed * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * correctly update sites when adding subtop * Changes to doc strings for clarity. Add a subtop_label variable to generalize where the molecule definition is pulled from * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More modular architecture * WIP- Add testing, minor refactoring of the APIs. * WIP- Better error handling * Misc Changes 1. Update Topology after parametrization 2. Add dependency for forcefield_utilities in env files 3. Add tests for trappe forcefield 4. Patch parmed (should be moved to a cal's PR #658 * WIP- Skip opls for now, full TrappE tests * Avoid accidental overwriting of typemap when using isomorphism * WIP- Remove unused import * Use enumerate for atom index while converting to TopologyGraph * Fix argument order * WIP- Add test for subtopology parameterization * Make opls/trappe global fixtures, Add tests for isomorphism * Further testing isomorphism * REVERT - skip OPLS tests * Copy scaling factors and combining rules after parametrization * Proper OPLS tests * WIP- Refactor the test module * WIP- Remove unused import * WIP- Add test for parameterization with impropers * WIP- Additional impropers test; Separate module for testing impropers * Minor refacotors; additional edge cases coverage/tests * Docstring minor fix * Remove rel_to_module as is obsolete in forcefield_utilities * Change trappe_ua to trappe-ua for correct loading * fix typo, add note about specific use case * pip install forcefield-utilites until new release Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Co Quach <daico007@gmail.com> Co-authored-by: Co Quach <43968221+daico007@users.noreply.github.com> Co-authored-by: Umesh Timalsina <umesh.timalsina@vanderbilt.edu> * Properly apply different scaling factors while parameterizing Co-authored-by: CalCraven <54594941+CalCraven@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Co Quach <daico007@gmail.com> Co-authored-by: Co Quach <43968221+daico007@users.noreply.github.com>
This PR will add the atomtyping module to GMSO for passing a GMSO.Forcefield
and a GMSO.Topology and match atomtype using foyer as the backend. Then,
the corresponding connection types will be found in the Forcefield and
applied to the connections in the topology.
topology, and a gmso forcefield to apply to it. This can use subgraph
isomorphism to identify molecular structures in the topology through
the bondgraph and bin those into unique molecules that match a specified
forcefield. This apply function can also do the standard atomtyping of
the entire topology in one step.
components and isomorphism to identify repeated structures.
Checklist:- [x] Use site residues to type repeated structures- [ ] Write a function that will add residue information to specified sites- [ ] Evaluate docstring clarity- [ ] Write unit tests- [x] Match pre-commit styleUpdated Checklist: