Molecular Graph Reduction algorithm for fragSMILES notation
Introduction
Installation
How to use
Reference
Welcome to the chemicalgof
repository! Graph of Frag (GoF) tool is designed to provide the graph reduction process to convert molecules atom-based to the fragment-based one !
Our tool allows to set a custom-user rule for fragment molecules. By default, our fragmentation rule lead to the so called fragSMILES notation.
To do this, you need python interpreter ... and of course your molecules :)
chemicalgof
package reequires few python dependencies. It has been tested through Python>=3.10. You can find requirements.txt for dependencies packages.
The following instructions can be followed to install chemicalgof
package from our repository:
-
Clone the repository in a comfortable path, for instance
~/
:git clone https://github.com/f48r1/chemicalgof.git
-
(optional but suggested) create your own environment. If you are on a linux OS, python traditional software can be employed. Anaconda is suggested for oher OS like Windows or Mac. For sake of clarity,
gof
is just a customizable name in the following pieces of codes for your virtual environment, therefore you can choose a name by yourself.-
python user create your own virtual environment for python (and then activate it):
python -m venv .gof
source .gof/bin/activate
-
conda user create your own environment for conda (and then activate it):
conda create --name gof python=3.11
conda activate gof
-
-
Navigate to the cloned directory:
cd chemicalgof/
-
Install the package :
python -m setup.py install
Enjoy :)
from chemicalgof import Smiles2GoF
## Example SMILES string of a molecule
smiles = 'C[C@@](O)(Cl)C(=O)NC[C@@H]1CC[C@H](C(=O)O)O1' ## molecule provides chirality information
## Convert SMILES to directed graph !
DiG = Smiles2GoF(smiles)
Now we need a tokenizer traversing graph to tokenize nodes and edges
from chemicalgof import GoF2Tokens
T=GoF2Tokens(DiG)
if you prefer, canonizalied graph can be obtained
from chemicalgof import CanonicalGoF2Tokens
T=CanonicalGoF2Tokens(DiG)
### then get sequence of tokens for fragments and bonds
print(*T.getSequence())
## or simply each fragment and its bonds splitted by dots
print(*T.getString())
GoF tool was first presented on the preprint paper here The resulting fragSMILES notation was used for de novo drug design approaches and compared with traditional notations such as SMILES and SELFIES.
If you think that GoF can be usefull for your project, please cite us :)
@article{Mastrolorito_2024,
title={fragSMILES: a Chemical String Notation for Advanced Fragment and Chirality Representation},
url={http://dx.doi.org/10.26434/chemrxiv-2024-tm7n6},
DOI={10.26434/chemrxiv-2024-tm7n6},
publisher={American Chemical Society (ACS)},
author={Mastrolorito, Fabrizio and Ciriaco, Fulvio and Togo, Maria Vittoria and Gambacorta, Nicola and Trisciuzzi, Daniela and Altomare, Cosimo Damiano and Amoroso, Nicola and Grisoni, Francesca and Nicolotti, Orazio},
year={2024},
month=jul
}