Skip to content

Latest commit

 

History

History
246 lines (193 loc) · 14.3 KB

IUPAC_SMILES+_appendix1.asciidoc

File metadata and controls

246 lines (193 loc) · 14.3 KB

IUPAC SMILES+ Specification: Appendix 1 - Proposed and Known SMILES Extensions [working draft]

IUPAC SMILES+ Contributors: Vincent F. Scalfani (Chair), Evan Bolton, Helen Cooke, Chris Grulke, John Irwin, Oliver Koepler, Gregory Landrum, José L. Medina-Franco, Miguel Quirós Olozábal, Susan Richardson, and Issaku Yamada.

v0.1,2019-04-15: Working Draft
IUPAC SMILES+ Project No. 2019-002-024
Copyright © 2020, IUPAC
Content is available under GNU Free Documentation License 1.2

This IUPAC SMILES+ Specification Appendix [working draft] document is a modified derivative of the OpenSMILES Specification. We have endeavored to maintain all prior author names, contributor names, copyright notices, and revision history.

OpenSMILES Specification
Craig A. James
v1.0,2016-05-15: Current specification

Copyright © 2007-2016, Craig A. James
Content is available under GNU Free Documentation License 1.2

OpenSMILES Contributors: Richard Apodaca, Noel O’Boyle, Andrew Dalke, John van Drie, Peter Ertl, Geoff Hutchison, Craig A. James, Greg Landrum, Chris Morley, Egon Willighagen, Hans De Winter, Tim Vandermeersch, John May

Known Extensions

coming soon…​

Proposed Extensions

External R-Groups

Daylight proposed, and OpenEye actually implemented, an extension that specifies bonds to external R-groups. An external R-group is specified using ampersand '&' followed by a ring-closure specification (either a digit, or '%' and two digits). However, unlike ring-closures, the bond is to an external, unspecified R-group. Example: n1c(&1)c(&2)cccc1 - 2,3-substituted pyridine.

Polymers and Crystals

Daylight (Weininger) proposed, but never implemented, an extension for crystals and polymers. Daylight also used the ampersand '&' character, (which may conflict with the R-group proposal, above), but with the added rule that if a number appears more than once, it creates a repeating unit.

SMILES Name

c1ccccc1C&1&1

polystyrene

C&1&1&1&1

diamond

c&1&1&1

graphite

Atom-based Double Bond Configuration

The directional '/' and '\' marks for cis/trans bonds seem simple on the surface but are problematic for complex systems. The issue is that in conjugated systems one directional bond can be used in defining the configuration of two double bonds. When assigning the directional bonds the existing labels must be considered or rewritten. In a long series of conjugated double bonds, changing the configuration of one bond can require rewriting dozens of bond symbols.

More importantly, there is a theoretical flaw with the use of '/' and '\'. It is possible to write valid SMILES for the molecule cyclooctatetraene by alternating directional assignments for the cis configurations. However, as shown below attempting to change one configuration is not possible. Reassigning the directional labels for adjacent double bonds will not work as it reassignment propagates around the ring and the conflict is not resolved.

Including directional labels to explicit hydrogen atoms is a possible resolution but does not follow standard-form and complicates the assignment procedure.

Depiction SMILES Comment

cyclooctatetraene

C/1=C/C=C\C=C/C=C\1

cyclooctatetraene

Todo

C/1=C/C=C/C=C/C=C\1

one bond changes two configurations

The proposed syntax for double bond configurations uses the '@' and '@@' atom-based specification. For example:

Depiction SMILES Name

trans difluoroethene

F[C@@H]=[C@H]F

trans-difluoroethene

F[C@H]=[C@@H]F

cis difluoroethene

F[C@H]=[C@H]F

cis-difluoroethene

F[C@@H]=[C@@H]F

Interpretation of '@' and '@@' follows the tetrahedral convention: The atoms, as encountered in the SMILES string, are either in anticlockwise '@' or clockwise '@@' order as viewed on the page. Since cis/trans configurations are planar, they can also be "viewed from underneath the page", which results in the two valid SMILES shown for each compound, above.

As with the other atom-bases specifications one must consider the relative position of implicit atoms. It is not always true that a trans form has opposite "clock-ness" ('@‘,’@@' or '@@‘,’@‘), and the cis form has the same "clock-ness" (’@‘,’@' or '@@‘,’@@').

Depiction SMILES Name

trans difluoroethene

F[C@@H]=[C@H]F

trans-difluoroethene

[C@H](F)=[C@H]F

cis difluoroethene

F[C@H]=[C@H]F

cis-difluoroethene

[C@@H](F)=[C@H]F

Atom-based '@' and '@@' for the stereo-specification of double bonds does not suffer from the theoretical flaw illustrated with cyclooctatetraene. The assignments are not-shared and adjacent configurations do not need to be considered. This is more flexible and and simplifies generation of canonical SMILES.

Depiction SMILES Name

cyclooctatetraene

[C@H]1=[C@@H][C@@H]=[C@@H][C@@H]=[C@@H][C@@H]=[C@@H]1

cyclooctatetraene

Note that the first stereo-specification carbon must be represented as '@' since the '1' follows the H, whereas the rest of the carbons use '@@' to characterize the cis configuration of each bond. Since this is a specification on the atom, rather than the single bond, no conflict arises at the ring-closure bond.

Radical

This section needs considerable work. The following text is courtesy Chris Morley, who commented: "I guess the last paragraph doesn’t look too good in a formal specification. There are two reasons for the frailty: lack of proof that the radical and aromatic uses can always be unambigous (I doubt anybody has tried); and a known deficiency in the parser." However, it is a good starting point…​

A single lowercase symbol is interpreted as a radical center. CCc is an alternative to CC[CH2] and is the 1-propyl radical; CcC or C[CH]C is the 2-propyl radical, Co is the methoxy radical. An odd number of adjacent lowercase symbols is a delocalized conjugated radical. So Cccccc is CC=CC=C[CH2] or CC=C[CH]C=C or C[CH]C=CC=C Lowercase 'c' or 'n' can be used in a ring: C1cCCCC1 is the cyclohexyl radical.

The use of the non-aromatic lowercase symbol is a shorted form with improved intelligibility that allows the use of implicit hydrogen in radicals. However it is intended only for simple unambiguous molecules and is not reliable when combined with aromatic atoms.

Twisted SMILES

An interesting extension that specifies conformational information via bond dihedral angles and bond lengths was proposed by McLeod and Peters:

Extended Hueckel’s Rule

THIS SECTION IS UNDER MAJOR REVISION, AND AT THIS POINT IS ONLY FOR DISCUSSION PURPOSES.

This proposed section is an attempt to simplify the rule-based system by enumerating all atom/bond configurations that are known to participate in aromatic systems.

A single, isolated ring that meets the following criteria is aromatic:

Each element that can participate in an aromatic ring is defined to have the following number of π electrons:

Configuration π Electrons Example Comment

BX3v3n

0

BX3v3n ex1

OpenSMILES extension

BX2v3n

1

BX2v3n ex1

OpenSMILES extension

CX3v3m

2

CX3v3m ex1

CX3v4o

0

CX3v4o ex1

CX3v3p

0

CX3v3p ex1

CX2v3m

1

CX2v3m ex1

CX3v4

1

CX3v4 ex1

CX2v3p

1

CX2v3p ex1

NX2v2

2

NX2v2 ex1

NX3v3

2

NX3v3 ex1

NX2v3

1

NX2v3 ex1

NX3v4

1

NX3v4 ex1

NX3v5

1

NX3v5 ex1

Non-oxide contributes 2 in Daylight toolkit

PX2v2

2

PX2v2 ex1

PX3v3

2

PX3v3 ex1

PX2v3

1

PX2v3 ex1

PX3v4

1

PX3v4 ex1

PX3v5

1

PX3v5 ex1

Non-oxide contributes 2 in Daylight toolkit

AsX3v3

2

AsX3v3 ex1

AsX2v3

1

AsX2v3 ex1

OpenSMILES extension

AsX3v4

1

AsX3v4 ex1

OpenSMILES extension

OX2v2

2

OX2v2 ex1

OX2v3

1

OX2v3 ex1

SX2v2

2

SX2v2 ex1

SX2v3

1

SX2v3 ex1

SX3v4

2

SX3v4 ex1

Possibly chiral

SX3v3p

2

SX3v3p ex1

Possibly chiral, OpenSMILES extension

SeX2v2

2

SeX2v2 ex1

SeX2v3

1

SeX2v3 ex1

SeX3v4

2

SeX3v4 ex1

Possibly chiral

SeX3v3p

2

SeX3v3p ex1

Possibly chiral, OpenSMILES extension

Revision History

IUPAC SMILES+ Specification: Appendix 1 - Proposed and Known SMILES Extensions

Revision Date Description Name

1.0

2020-09-24

Transfer proposed extensions to this appendix

Vincent F. Scalfani

1.1

2023-10-30

Transfer Extended Hueckel’s Rule to this appendix

Vincent F. Scalfani