Basic cheminformatics in Bioclipse is mainly handled by the Chemistry Development Kit (CDK) [1,2,3], whose functionality is made available through the cdk manager.
The cdk manager has many features. One of them is validating CAS registry numbers, the identifiers assigned by the Chemical Abstracts Service:
cdk.isValidCAS("50-00-0")
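A CAS registry number ends in a check digit, which is what makes this kind of validation possible. The following plain Python sketch is independent of Bioclipse and shows the commonly documented checksum. It assumes the usual layout of 2 to 7 digits, a hyphen, 2 digits, a hyphen and one check digit. The function name `is_valid_cas` is our own, and we do not claim that `cdk.isValidCAS` is implemented this way.

```python
import re

# Assumed CAS layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` has the CAS layout and a matching check digit."""
    match = CAS_PATTERN.match(cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight each digit by its position counted from the right (1, 2, 3, ...)
    # and compare the weighted sum modulo 10 with the check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas("50-00-0"))    # formaldehyde
print(is_valid_cas("7732-18-5"))  # water
print(is_valid_cas("7732-18-4"))  # same number with a wrong check digit
```

For 50-00-0, the weighted sum is 0·1 + 0·2 + 0·3 + 5·4 = 20. This ends in 0 and so matches the check digit.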
But let's move on to the more interesting functionality around chemical graphs. For example, this is how we create a molecular structure from a SMILES string:
Script code/FromSMILES.groovy
mol = cdk.fromSMILES("COC")
Normally, structure diagrams are generated without explicit hydrogens. But we can easily add them:
cdk.addExplicitHydrogens(mol)
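Making hydrogens explicit requires knowing how many hydrogens each heavy atom carries implicitly. The minimal Python sketch below shows that bookkeeping. It rests on a hypothetical default-valence table for uncharged C, N and O and ignores charges and aromaticity. Toolkits such as the CDK use much richer atom typing for this, and the names here are illustrative only.

```python
# Hypothetical, simplified default valences for uncharged atoms.
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(elements, bonds):
    """Count the hydrogens each atom needs to fill its default valence.

    elements: element symbols, indexed by atom number.
    bonds: (atom_i, atom_j, bond_order) tuples.
    """
    used = [0] * len(elements)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [max(DEFAULT_VALENCE[e] - u, 0) for e, u in zip(elements, used)]


# Dimethyl ether, the "COC" molecule from above: C-O-C with single bonds.
elements = ["C", "O", "C"]
bonds = [(0, 1, 1), (1, 2, 1)]
counts = implicit_hydrogens(elements, bonds)
print(counts)       # [3, 0, 3]
print(sum(counts))  # 6 hydrogens to add as explicit atoms
```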
We can then calculate a number of properties, including the molecular mass, total formal charge, and molecular formula:
cdk.calculateMass(mol)
cdk.totalFormalCharge(mol)
cdk.molecularFormula(mol)
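All three properties can be derived from per-atom information. The following Python sketch does this for dimethyl ether with explicit hydrogens. It writes the formula in the Hill system (carbon first, then hydrogen, then the remaining elements alphabetically), which is a common convention, and uses approximate average atomic weights. Whether `cdk.molecularFormula` uses exactly this ordering, and which mass definition `cdk.calculateMass` uses (average or monoisotopic), are assumptions we do not verify here.

```python
from collections import Counter

# Approximate average atomic weights (assumed values, for illustration only).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def hill_formula(elements):
    """Format element counts in Hill order: C, then H, then the rest A-Z."""
    counts = Counter(elements)
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        # Without carbon, all elements (including H) are sorted alphabetically.
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def molecular_mass(elements):
    """Sum the average atomic weights over all atoms, hydrogens included."""
    return sum(ATOMIC_WEIGHT[e] for e in elements)


def total_formal_charge(charges):
    """The total formal charge is the sum of the per-atom formal charges."""
    return sum(charges)


# Dimethyl ether with explicit hydrogens: 2 C, 1 O, 6 H, all uncharged.
atoms = ["C", "O", "C"] + ["H"] * 6
print(hill_formula(atoms))                    # C2H6O
print(round(molecular_mass(atoms), 3))        # 46.069
print(total_formal_charge([0] * len(atoms)))  # 0
```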
We can also inspect some of the information present in the model, such as whether it has 2D or 3D coordinates and whether all its atoms form a single connected structure:
cdk.has2d(mol)
cdk.has3d(mol)
cdk.isConnected(mol)
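Checking connectivity is a classic graph problem: a molecule is connected if every atom can be reached from any one atom by following bonds. Below is a plain Python breadth-first search over a hypothetical list of bonds given as atom index pairs. It illustrates the concept only and is not the CDK's implementation.

```python
from collections import deque


def is_connected(n_atoms, bonds):
    """Return True if every atom can be reached from atom 0 via bonds."""
    if n_atoms == 0:
        return True  # design choice: treat an empty model as connected
    neighbours = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    seen = {0}
    queue = deque([0])
    while queue:
        atom = queue.popleft()
        for nb in neighbours[atom]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == n_atoms


print(is_connected(3, [(0, 1), (1, 2)]))  # C-O-C, one fragment: True
print(is_connected(3, [(0, 1)]))          # like "CC.O", two fragments: False
```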
The cdk manager is also central to file support. Before loading a file, we may want to check its format:
cdk.determineFormat(
"/ACS Drug Disclosures/AZD5423.cml"
)
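Content-based format detection generally works by looking for telltale markers inside the file. The rough Python sketch below applies this idea to two formats only. The markers (a `<molecule` or `<cml` element for CML, an `M  END` line for MDL molfiles) are our own choice, not the rules that Bioclipse or the CDK actually apply.

```python
def guess_format(text):
    """Very rough, content-based guess of a chemical file format."""
    stripped = text.lstrip()
    # CML is XML, so look for an XML document with a CML-like element.
    if stripped.startswith("<") and ("<molecule" in text or "<cml" in text):
        return "CML"
    # MDL molfiles end their connection table with an "M  END" line.
    if any(line.startswith("M  END") for line in text.splitlines()):
        return "MDL molfile"
    return "unknown"


cml_text = ('<?xml version="1.0"?>\n'
            '<molecule xmlns="http://www.xml-cml.org/schema" id="m1"/>')
molfile_text = ("\n  Hypothetical\n\n"
                "  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n")
print(guess_format(cml_text))      # CML
print(guess_format(molfile_text))  # MDL molfile
print(guess_format("CCO"))         # unknown
```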
However, this information is not needed when loading files:
mol = cdk.loadMolecule(
"/ACS Drug Disclosures/AZD5423.cml"
)
Saving is quite similar, and there are two methods for the two main formats:
cdk.saveCML(mol, "/Test/mol.cml")
cdk.saveMDLMolfile(mol, "/Test/mol.mol")
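To show what a CML file contains, here is a Python sketch that writes a minimal CML document with the standard library's `xml.etree.ElementTree`. It uses the CML namespace `http://www.xml-cml.org/schema` with `atomArray`/`atom` and `bondArray`/`bond` elements. The function `to_cml` and its input format are hypothetical, and `cdk.saveCML` may well write more detail, such as coordinates or charges.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
# Serialize the CML namespace as the default namespace (no prefix).
ET.register_namespace("", CML_NS)


def to_cml(elements, bonds):
    """Build a minimal CML string from element symbols and (i, j, order) bonds."""
    molecule = ET.Element(f"{{{CML_NS}}}molecule", id="m1")
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for idx, symbol in enumerate(elements, start=1):
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom",
                      id=f"a{idx}", elementType=symbol)
    bond_array = ET.SubElement(molecule, f"{{{CML_NS}}}bondArray")
    for i, j, order in bonds:
        # Atom ids are 1-based, while the bond tuples use 0-based indices.
        ET.SubElement(bond_array, f"{{{CML_NS}}}bond",
                      atomRefs2=f"a{i + 1} a{j + 1}", order=str(order))
    return ET.tostring(molecule, encoding="unicode")


cml = to_cml(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
print(cml)
```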
The cdx manager is also based on the CDK, but exposes functionality aimed more at CDK developers. For example, we can create a String representation of the full data model for debugging purposes:
cdx.debug(mol)
Or we can see the details of the differences between two data models:
cdx.diff(
cdk.fromSMILES("CC"),
cdk.fromSMILES("CCC")
)
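`cdx.diff` reports differences at the level of the full data model. The Python sketch below is a much cruder comparison that looks only at element counts, and we include it only to make the idea of diffing two models concrete. The atom lists are written out by hand, with hydrogens explicit, as a hypothetical input format.

```python
from collections import Counter


def formula_diff(a, b):
    """Per-element count differences (b minus a), omitting unchanged elements."""
    ca, cb = Counter(a), Counter(b)
    return {e: cb[e] - ca[e]
            for e in sorted(set(ca) | set(cb)) if cb[e] != ca[e]}


ethane = ["C"] * 2 + ["H"] * 6   # "CC" with hydrogens made explicit
propane = ["C"] * 3 + ["H"] * 8  # "CCC" with hydrogens made explicit
print(formula_diff(ethane, propane))  # {'C': 1, 'H': 2}
```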
And we can list the exact atom types for the atoms in a molecule:
Script code/PerceiveCDKAtomTypes.groovy
mol = cdk.fromSMILES("CCO")
cdx.perceiveCDKAtomTypes(mol)
For ethanol, this lists:
1:C.sp3
2:C.sp3
3:O.sp3
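Labels such as C.sp3 combine an element with a hybridization state. The Python sketch below produces labels in the same style using a deliberately naive rule based only on bond orders: no multiple bonds gives sp3, one double bond gives sp2, and a triple bond or two double bonds gives sp. This rule is our simplification, not the CDK's atom typing, which also considers charges, aromaticity and more. For ethanol it happens to reproduce the listing above.

```python
def simple_atom_types(elements, bonds):
    """Assign 'index:Element.hybridization' labels from bond orders alone."""
    doubles = [0] * len(elements)
    triples = [0] * len(elements)
    for i, j, order in bonds:
        for atom in (i, j):
            if order == 2:
                doubles[atom] += 1
            elif order == 3:
                triples[atom] += 1
    types = []
    for idx, symbol in enumerate(elements):
        if triples[idx] or doubles[idx] >= 2:
            hybridization = "sp"
        elif doubles[idx] == 1:
            hybridization = "sp2"
        else:
            hybridization = "sp3"
        types.append(f"{idx + 1}:{symbol}.{hybridization}")
    return types


# Ethanol ("CCO"), heavy atoms only, all single bonds.
for label in simple_atom_types(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]):
    print(label)
```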
The inchi manager makes functionality from the InChI standard available [4,5].
The InChI library is not available as a Java library. Instead, it is a native (C) library, included as a binary for a selection of platforms and operating systems. This means that we cannot assume the InChI functionality is always available in Bioclipse. Furthermore, the library must be loaded before it can be used:
Script code/LoadInChI.groovy
inchi.load()
inchi.isLoaded()
Once loading has succeeded, we can start minting InChIs:
Script code/InChIGenerate.groovy
anInChI = inchi.generate(
opsin.parseIUPACName("methane")
)
Which returns:
InChI=1S/CH4/h1H4
The returned value is an object of the InChI class, from which we can get both the full InChI and the InChIKey:
Script code/InChIKeyGenerate.groovy
fullInChI = anInChI.getValue()
InChIKey = anInChI.getKey()
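The two values have a different structure. A standard InChI starts with `InChI=1S`, where `1S` marks a standard version 1 InChI. Slash-separated layers follow, such as the formula (`CH4`) and the hydrogen layer (`h1H4`). An InChIKey is a fixed-length, hashed form of 27 characters. The Python sketch below splits simple InChIs into layers and checks only the shape of a key, not its hash. The helper names are hypothetical and not part of the inchi manager.

```python
import re

# Shape of a standard InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def inchi_layers(inchi):
    """Split a simple InChI into (version, formula, remaining layers)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: " + inchi)
    version, formula, *layers = inchi[len(prefix):].split("/")
    return version, formula, layers


def looks_like_inchikey(key):
    """Check only the layout of an InChIKey; the hash itself is not verified."""
    return INCHIKEY_PATTERN.match(key) is not None


print(inchi_layers("InChI=1S/CH4/h1H4"))  # ('1S', 'CH4', ['h1H4'])
print(looks_like_inchikey("VNWKTOKETHGBQD-UHFFFAOYSA-N"))  # methane: True
print(looks_like_inchikey("InChI=1S/CH4/h1H4"))            # False
```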
The opsin manager makes functionality from OPSIN available [6]: converting IUPAC names into chemical structures.
Script code/ParseIUPACName.py
mol = opsin.parseIUPACName(
"Ethyl [(1R,3aR,4aR,6R,8aR,9S,9aS)-9-" +
"{(E)-2-[5-(3-fluorophenyl)-2-pyridinyl]vinyl}-" +
"1-methyl-3-oxododecahydronaphtho[2,3-c]furan-" +
"6-yl]carbamate"
)
- Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, et al. The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. JCICS. 2003 Feb 11;43(2):493–500. doi:10.1021/CI025584Y
- Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, et al. Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr Pharm Des [Internet]. 2006 Jun 1;12(17):2111–20. Available from: https://cdk.github.io/cdk-paper-2/ doi:10.2174/138161206777585274
- Willighagen E, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, et al. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform. 2017 Jun 6;9(1). doi:10.1186/S13321-017-0220-4
- Heller SR, McNaught A, Stein S, Tchekhovskoi D, Pletnev I. InChI - the worldwide chemical structure identifier standard. J Cheminform. 2013;5(1):7. doi:10.1186/1758-2946-5-7
- Spjuth O, Berg A, Adams SE, Willighagen E. Applications of the InChI in cheminformatics with the CDK and Bioclipse. J Cheminform. 2013;5(1):14. doi:10.1186/1758-2946-5-14
- Lowe DM, Corbett PT, Murray-Rust P, Glen RC. Chemical Name to Structure: OPSIN, an Open Source Solution. JCIM. 2011 Mar 28;51(3):739–53. doi:10.1021/CI100384D