Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomtype parameterization methods for parameterizing a topology in GMSO. #644

Merged
merged 46 commits into from
Jun 15, 2022

Conversation

CalCraven
Copy link
Contributor

@CalCraven CalCraven commented Apr 14, 2022

This PR will add the atomtyping module to GMSO for passing a GMSO.Forcefield
and a GMSO.Topology and match atomtype using foyer as the backend. Then,
the corresponding connection types will be found in the Forcefield and
applied to the connections in the topology.

  • Create parameterize.py, which has the apply function which can take a
    topology, and a gmso forcefield to apply to it. This can use subgraph
    isomorphism to identify molecular structures in the topology through
    the bondgraph and bin those into unique molecules that match a specified
    forcefield. This apply function can also do the standard atomtyping of
    the entire topology in one step.
  • Create isomorph.py which uses networkx graphs to identify disconnected
    components and isomorphism to identify repeated structures.

Checklist:
- [x] Use site residues to type repeated structures
- [ ] Write a function that will add residue information to specified sites
- [ ] Evaluate docstring clarity
- [ ] Write unit tests
- [x] Match pre-commit style

Updated Checklist:

  • Feature Parity with Parmed
  • Multi--Forcefield application
  • Isomorphic Test

This PR will add the atomtyping module to GMSO for passing a GMSO.Forcefield
and a GMSO.Topology and match atomtype using foyer as the backend. Then,
the corresponding connection types will be found in the Forcefield and
applied to the connections in the topology.

* Create parameterize.py, which has the apply function which can take a
topology, and a gmso forcefield to apply to it. This can use subgraph
isomorphism to identify molecular structures in the topology through
the bondgraph and bin those into unique molecules that match a specified
forcefield. This apply function can also do the standard atomtyping of
the entire topology in one step.
* Create isomorph.py which uses networkx graphs to identify disconnected
components and isomorphism to identify repeated structures.
@lgtm-com
Copy link
Contributor

lgtm-com bot commented Apr 14, 2022

This pull request introduces 2 alerts when merging 8e69d6e into efeaad2 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Illegal raise

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Apr 14, 2022

This pull request introduces 2 alerts when merging 9074909 into efeaad2 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Illegal raise

Copy link
Member

@umesh-timalsina umesh-timalsina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is shaping up nicely @CalCraven. I have added some comments by skimming through it.

gmso/atomtyping/isomorph.py Outdated Show resolved Hide resolved
Comment on lines 47 to 48
matcher = nx.algorithms.isomorphism.GraphMatcher(
graph, graph_of_interest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add a node_match function (based on the element of the node) IIRC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly would a node_match function do? I'm not entirely familiar, is it something passed to the GraphMatcher to do the node matching?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just tested this offline, I think the current use of GraphMatcher only concern about the shape of the graph but not about the nodes involved. So this could cause issue with molecules with same shape but different element would still be recognized as isomorphic (so lumped into the same group).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ethanol = mb.load("CCO", smiles=True)
ethanol.name = "Ethanol"

eth1 = mb.clone(ethanol)
eth2 = mb.clone(ethanol)

eth1_top = mb_convert(eth1)
eth1_top.sites[0].name = "K"
eth1_top.sites[0].element = element_by_symbol("K")
eth2_top = mb_convert(eth2)

Note below I am using the get_modified_topology_graph from the example notebook.

eth1_grp = get_modified_topology_graph(eth1_top)
eth2_grp = get_modified_topology_graph(eth2_top)
matcher = nx.isomorphism.GraphMatcher(eth1_grp, eth2_grp)

Then
matcher.is_isomorphic() will still return True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a different note, I think we can use this GraphMatcher for the issue with the assumed site order in a molecule (or subtopology) (also this mosdef-hub/foyer#504). Will have to sacrifice some performance for checking but the number of checks performed will still be fewer that going the completely splitting the topology, so should still be a bit faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this makes more sense to me now, thanks @daico007. I was a little hazy on the exact implementation of GraphMatcher. I think having a few different options for matching makes sense, which trade of increasing performance for increasing matching precision. This reminds me a bit of the filters used in PR #649, which can be used to specify how you want to filter unique parameter types. I think it makes sense to at least try a basic one that checks at least for elements in the correct positions, and allow that to be a built in option.

gmso/atomtyping/isomorph.py Outdated Show resolved Hide resolved
gmso/atomtyping/parameterize.py Outdated Show resolved Hide resolved
gmso/atomtyping/parameterize.py Outdated Show resolved Hide resolved
gmso/atomtyping/parameterize.py Outdated Show resolved Hide resolved
for connect in getattr(top, "_" + parameter + "s"):
sub_top_label = connect.connection_members[0].label_
ff_of_interest = forcefields[sub_top_label]
type_getter = attrgetter("atom_type.atomclass")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a None-Type check prudent here? I think foyer will raise an error if it cannot find an atomtype right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to use the assert_bond_params flag to see if we should or should not look for all instances of that connection to be typed? This will definitely throw an error if it's missing as of now. Line 97 essentially handles this check as of now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an error message in e1f3d4e.

gmso/atomtyping/parameterize.py Outdated Show resolved Hide resolved
index=gmso_topology.get_index(atom),
atomic_number=None,
element=atom.name,
subtopology_label=atom.label,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where we want to address this but we need a better identifier than atom.label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I will think about this, and see if there is something better we can be using.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not the atom be the atom number in the topology?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index is the atom_number in the topology. The subtopology_label should be the string that we use to apply a given forcefield to the sites with this label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to PR #638, this info will be stored in site.molecule. I think whatever convention we decide to use there can be the reference label we use to identify the molecule name, and therefore the identifier we use for the forcefield

gmso/atomtyping/parameterize.py Outdated Show resolved Hide resolved
@lgtm-com
Copy link
Contributor

lgtm-com bot commented Apr 25, 2022

This pull request introduces 2 alerts when merging 6376da7 into f9e7332 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Illegal raise

@codecov
Copy link

codecov bot commented Apr 26, 2022

Codecov Report

Merging #644 (318c5ad) into develop (4436e35) will increase coverage by 0.59%.
The diff coverage is 99.53%.

@@             Coverage Diff             @@
##           develop     #644      +/-   ##
===========================================
+ Coverage    91.10%   91.70%   +0.59%     
===========================================
  Files           57       64       +7     
  Lines         4758     4964     +206     
===========================================
+ Hits          4335     4552     +217     
+ Misses         423      412      -11     
Impacted Files Coverage Δ
gmso/parameterization/topology_parameterizer.py 99.15% <99.15%> (ø)
gmso/core/topology.py 96.90% <100.00%> (ø)
gmso/external/convert_parmed.py 95.59% <100.00%> (+0.33%) ⬆️
gmso/parameterization/__init__.py 100.00% <100.00%> (ø)
gmso/parameterization/foyer_utils.py 100.00% <100.00%> (ø)
gmso/parameterization/isomorph.py 100.00% <100.00%> (ø)
gmso/parameterization/parameterize.py 100.00% <100.00%> (ø)
gmso/parameterization/subtopology_utils.py 100.00% <100.00%> (ø)
gmso/parameterization/utils.py 100.00% <100.00%> (ø)
gmso/core/angle.py 96.00% <0.00%> (-0.30%) ⬇️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4436e35...318c5ad. Read the comment docs.

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Apr 28, 2022

This pull request introduces 2 alerts when merging 790aacc into cd68411 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Illegal raise

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Apr 28, 2022

This pull request introduces 2 alerts when merging 2d7a770 into 1818e46 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Illegal raise

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Apr 29, 2022

This pull request introduces 2 alerts when merging 40f6639 into f9c40ea - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Illegal raise

@lgtm-com
Copy link
Contributor

lgtm-com bot commented May 2, 2022

This pull request introduces 2 alerts when merging 1503578 into f9c40ea - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Illegal raise

} # TODO validate improper order
molecule_label = "label_" # place to grab info referencing molecule name
for connect in getattr(top, "_" + parameter + "s"):
sub_top_label = connect.connection_members[0].get(molecule_label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sub_top_label = connect.connection_members[0].get(molecule_label)
sub_top_label = connect.connection_members[0].__dict__.get(molecule_label)

or

Suggested change
sub_top_label = connect.connection_members[0].get(molecule_label)
sub_top_label = connect.connection_members[0].molecule_label

molecule_label = "label_" # place to grab info referencing molecule name
for connect in getattr(top, "_" + parameter + "s"):
sub_top_label = connect.connection_members[0].get(molecule_label)
if sub_top_label is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if sub_top_label is None:
if not sub_top_label:

I think this would also catch the case where the sub_top_label is empty string.

@CalCraven
Copy link
Contributor Author

Rename module to to parameterization, not atomtyping

@lgtm-com
Copy link
Contributor

lgtm-com bot commented May 6, 2022

This pull request introduces 1 alert when merging 7be5f57 into 2605c65 - view on LGTM.com

new alerts:

  • 1 for Illegal raise

@lgtm-com
Copy link
Contributor

lgtm-com bot commented May 6, 2022

This pull request introduces 1 alert when merging 7366452 into 2605c65 - view on LGTM.com

new alerts:

  • 1 for Unused import

umesh-timalsina added a commit that referenced this pull request Jun 7, 2022
* Include ImproperType reading in "from_parmed"
This PR is branched from PR #644. The parmed conversion to GMSO
topology was not able to read in various Improper connections to
GMSO. It would aggregate them with the Dihedral connections, from
when GMSO did not have explicit treatment of impropers, similar to
ParmEd. This PR aims to check a ParmEd object for structure.impropers
or structure.dihedrals with the dihedral.impropers flag set to
True and convert those to `gmso.Impropers`. Likewise, it looks to
read in periodic or harmonic impropers and generate the correct
`gmso.ImproperType` parametric potential for that type, if the flag
`refer_types` is set to True in the `from_parmed` call.

* Add pmd_improper_types_map function to convert_parmed
* Generate a parametric potential with harmonic type
expression from pmd.impropers
* Generate a parametric potential with periodic type
expression from pmd.dihedrals and dihedral.improper=True
* Validate testing with foyer opls_validation when using
functions in PR #644

Harmonic type expression for impropers is taken from
https://manual.gromacs.org/current/reference-manual/functions/bonded-interactions.html#improper-dihedrals-harmonic-type
This uses Xi as the independent variable. If someone testing this PR can
validate that this is the formalism we want to use when reading in
ImproperTypes from ParmEd improper_types, that would be a good sanity
check.
The periodic type expression is taken from
https://manual.gromacs.org/current/reference-manual/functions/bonded-interactions.html#equation-eqnperiodicpropdihedral
with phi as the independent variable.

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* Add test for number of impropers from NNdimethylformamide molecule

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* WIP- include gmso in impropertype testing

* WIP- Add support for NoneTypes in site charge/mass

* Address Review comments and create impropertype expression from HarmonicImproperPotential in library

* WIP- Add testing for impropers

* Modification for improper potential forms from the PotentialTemplateLibrary

* WIP- Use id instead of object for storing mapping; impropertypes test

* WIP- combined test for dihedral/impropertypes

* Fix test case and id mapping in conversion

* Add tests for harmonic impropers

* Add proper properties for improper_types

* WIP-Additional test cases

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Umesh Timalsina <umesh.timalsina@vanderbilt.edu>
@daico007
Copy link
Member

One edge case popped up when @bc118 was testing on this PR: when a topology with no subtopology (so only two layers topology and sites) is being typed, the apply can only take one single forcefield XML file, and not in the form of a dictionary ({molecule_name: forcefield_file}). Handling this issue include erroring out and asking user for the appropriate input or consider the topology as a subtop (so matching top.name to the key in the XML dictionary provided).

During that discussion, another question raised, do all forcefield provided in the forcefield XML dictionary has to be used or do we allow users to supercharge and provide excess forcefields. One idea is to not allow it by default but only when expert_mode=True.

@daico007 daico007 merged commit 7fe2dc5 into mosdef-hub:develop Jun 15, 2022
daico007 added a commit that referenced this pull request Jun 28, 2022
* Modify scaling factors to a numpy array

* Misc Changes

1. Split property into two parts. global_scaling_factors and per_molecule_scaling_factors
2. Handle Nan/NoneType cases, non float cases
3. Add additional functions for handling scaling factors
4. Change formats/json.py to incorporate new properties
5. Change function name to prefix `get_`, formats/mcf.py change

* WIP- Refactor global_scaling_factors to scaling_factors

* Additional tests; Simplificaiton of get_scaling_factors method

* Atomtype parameterization methods for parameterizing a topology in GMSO. (#644)

* Atomtype parameterization methods for parameterizing a topology in GMSO.

This PR will add the atomtyping module to GMSO for passing a GMSO.Forcefield
and a GMSO.Topology and match atomtype using foyer as the backend. Then,
the corresponding connection types will be found in the Forcefield and
applied to the connections in the topology.

* Create parameterize.py, which has the apply function which can take a
topology, and a gmso forcefield to apply to it. This can use subgraph
isomorphism to identify molecular structures in the topology through
the bondgraph and bin those into unique molecules that match a specified
forcefield. This apply function can also do the standard atomtyping of
the entire topology in one step.
* Create isomorph.py which uses networkx graphs to identify disconnected
components and isomorphism to identify repeated structures.

* Move module imports for apply into atomtyping to prevent circular imports

* Add a quick fix which will do atomtyping if no residue flag is passed

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* correctly update sites when adding subtop

* Changes to doc strings for clarity. Add a subtop_label variable to generalize where the molecule definition is pulled from

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* More modular architecture

* WIP- Add testing, minor refactoring of the APIs.

* WIP- Better error handling

* Misc Changes

1. Update Topology after parametrization
2. Add dependency for forcefield_utilities in env files
3. Add tests for trappe forcefield
4. Patch parmed (should be moved to a cal's PR #658

* WIP- Skip opls for now, full TrappE tests

* Avoid accidental overwriting of typemap when using isomorphism

* WIP- Remove unused import

* Use enumerate for atom index while converting to TopologyGraph

* Fix argument order

* WIP- Add test for subtopology parameterization

* Make opls/trappe global fixtures, Add tests for isomorphism

* Further testing isomorphism

* REVERT - skip OPLS tests

* Copy scaling factors and combining rules after parametrization

* Proper OPLS tests

* WIP- Refactor the test module

* WIP- Remove unused import

* WIP- Add test for parameterization with impropers

* WIP- Additional impropers test; Separate module for testing impropers

* Minor refacotors; additional edge cases coverage/tests

* Docstring minor fix

* Remove rel_to_module as is obsolete in forcefield_utilities

* Change trappe_ua to trappe-ua for correct loading

* fix typo, add note about specific use case

* pip install forcefield-utilites until new release

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Co Quach <daico007@gmail.com>
Co-authored-by: Co Quach <43968221+daico007@users.noreply.github.com>
Co-authored-by: Umesh Timalsina <umesh.timalsina@vanderbilt.edu>

* Properly apply different scaling factors while parameterizing

Co-authored-by: CalCraven <54594941+CalCraven@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Co Quach <daico007@gmail.com>
Co-authored-by: Co Quach <43968221+daico007@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants