Plugin system for grammar styles #377
Comments
I've been working on something that overlaps with this issue for the past few days. Because of some future goals I have with FF in foyer, I wanted to be able to use a broader set of SMARTS features than is currently implemented. I played with generalizing the GRAMMAR, but didn't think myself up to making those changes and all the changes elsewhere in the code that would be needed.
Instead, focusing on atom types for chemical elements (not non-element '_' atoms), I just use a boolean switch in the call to FF.apply to select some new functionality and pass slightly modified SMARTS strings from the forcefield directly to rdkit to use its SMARTS substructure matching. The experimental code I have so far seems to work and can type all of the test OPLSAA molecules. The benefit is immediate access to almost the entire SMARTS grammar.
For example, can become: This works fine to type benzene. Definitions based on other atom types, for example in the aromatic H on carbon: The current defs in oplsaa.xml continue to work when passed through the new system. As evidenced in the above aromatic H definition ([H] --> [#1]), I had to do some ad hoc modifications to get the non-SMARTS-standard (or at least non-rdkit-SMARTS-standard) use of explicit H's to play nice with the rdkit implementation.
The current code is experimental, non-optimized, kludgy, without proper Exceptions or much validation, etc., but it seems to work just fine. Apart from handling the boolean that turns this feature on or off, the only code changes are in atomtyper.py, and those are mostly additions. Of course, one must turn off FF validation if new definitions are to be used. The SMARTSGraph is replaced with a simple object that just holds the smarts_string, typemap, etc. and has a simple find_matches method that builds an rdkit molecule and calls rdkit substructure matching. I have an idea about how this might be extended to non-element atoms, but that is neither implemented nor tested.
It seems that something like this approach could be a valuable expansion of foyer capabilities without the work involved in an expanded GRAMMAR. Let me know if this is of interest to the developers.
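To make the `[H] --> [#1]` rewrite mentioned above concrete, here is a minimal sketch of what such a preprocessing step might look like. It is not the code from the experimental branch: the function name `rewrite_explicit_h` and the regular expression are assumptions for illustration, and the rule only covers a bracket atom whose first primitive is a bare `H`.

```python
import re

# Hypothetical helper, not part of foyer or rdkit.
# Matches the opening of a bracket atom whose first primitive is a bare "H",
# e.g. "[H]" or "[H;X1]". The lookahead requires "]", ";", "," or "&" after
# the H, so element symbols such as "[Hg]" are left alone, and hydrogen
# counts such as "[C;H1]" never match because they do not start with "[H".
_EXPLICIT_H = re.compile(r"\[H(?=[\];,&])")


def rewrite_explicit_h(smarts: str) -> str:
    """Replace a leading explicit H in bracket atoms with the atomic-number form #1."""
    return _EXPLICIT_H.sub("[#1", smarts)


if __name__ == "__main__":
    # Illustrative strings only; they are not taken from oplsaa.xml.
    for pattern in ["[H]", "[H;X1][C;X3]", "[Hg]", "[C;H1]"]:
        print(pattern, "->", rewrite_explicit_h(pattern))
```

The rewritten string could then be handed to the substructure matcher. Charged or isotope-labelled hydrogens (for example `[H+]`) are deliberately left untouched in this sketch and would need their own rules.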
Describe the behavior you would like added to Foyer
After discussions with @umesh-timalsina and @daico007, it occurred to me that we might be able to include various grammars as plug-ins for foyer to use when graph matching.
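As a rough illustration of the plug-in idea, one possible shape is a small registry that maps a grammar name to a parser callable. Everything in this sketch is hypothetical (`register_grammar`, `get_grammar`, and the toy `toy_brackets` parser are made-up names, not foyer APIs), and the toy parser stands in for what would presumably be a real grammar-backed parser.

```python
import re
from typing import Callable, Dict, List

# Hypothetical registry: grammar name -> callable that parses a pattern string.
Parser = Callable[[str], List[str]]
_GRAMMARS: Dict[str, Parser] = {}


def register_grammar(name: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser under a grammar name."""
    def decorator(func: Parser) -> Parser:
        if name in _GRAMMARS:
            raise ValueError(f"grammar {name!r} is already registered")
        _GRAMMARS[name] = func
        return func
    return decorator


def get_grammar(name: str) -> Parser:
    """Look up a registered parser; raises KeyError for unknown names."""
    try:
        return _GRAMMARS[name]
    except KeyError:
        raise KeyError(f"no grammar plug-in named {name!r}") from None


@register_grammar("toy_brackets")
def parse_toy_brackets(pattern: str) -> List[str]:
    # Toy stand-in for a real parser: returns the contents of each bracket atom.
    return re.findall(r"\[([^\]]+)\]", pattern)


if __name__ == "__main__":
    parser = get_grammar("toy_brackets")
    print(parser("[C;X4][#1]"))
```

With a registry like this, the atom typer would only need the grammar name to select a matcher, and new grammars could be added without touching the core code.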
Describe the solution you'd like
Assuming the various chemical perception grammars could be represented as some type of compatible grammar for the lark-parser, this could enable different types of graph matching like SMIRKS, SMARTS, etc.
Describe alternatives you've considered
N/A
Additional context
This would be a tremendous amount of work and will not be ready for actual implementation until much later, after #358. It will also require expert-level domain knowledge of grammar parsing and development.