Complete enumerations of amons with up to 7 heavy atoms for GDB17 and ZINC
For how-to on amons generation, go to https://github.com/binghuang2018/aqml
- A wide coverage of organic chemistry, including H, B, C, N, O, F, Si, S, P, Cl, Br, I, Sn
- AGZ7 could be easily extended by inclusion of more accurate properties and/or all conformers, thanks to the small molecular size of AMON (no more than 7 non-hydrogen atoms)
- AGZ7 can be used as a dictionary for the look-up of the most pertinent amons for a query, or the whole as a single training set for QML models (in particular deep neural network) for quantum chemistry problems involving organic molecules
- quantum properties prediction
- geometry optimization
- molecular dynamics
- ...
- Files containing SMILES string (canonicalized by OEChem)
gdb17-zinc-comm.can
: SMILES strings of amons that are shared by gdb17 & zincgdb17.can
: SMILES strings of amons that are unique to gdb17zinc.can
: SMILES strings of amons that are unique to ZINC
- Folders containing 3d geometries of amons
gdb17/
covers the complete set of gdb17 amons (i.e.,gdb17-zinc-comm.can
+gdb17.can
) that survived geometry relaxation.zinc/
covers only a subset of SMILES strings inzinc.can
, that is, amons that are unique to gdb17 and survived geometry relaxation.
- Compressed tar files
- Files in say,
zinc/ni5.tar.gz
indicates the subset of all ZINC amons containing only 5 heavy atoms - Geometry and prperties of each AMON are stored as
key:value
pair in ajson
file (seezinc/README.md
for content details), except for AMONs that didn't survive geometry optimization (i.e., dissociated), for which onlyxyz
file is provided in subfolderdiss/
. - SMILES string (canonicalized by OEChem) & bond order matrix are both available in json file for all but AMONs that underwent graph changes, including dissociated AMONs and undissociated ones (in subfolder
newg/
or compressed filenewg.tar.gz
). - The name of a json file is of format
[[ID]].json
, withID
being a unique identity assigned to the associated molecular graph
- Files in say,
- Each amon is accompanied with only one conformer
- Each conformer corresponds to the coarse "global minima", determined through MMFF94 in RDKit
- Reported final geometry (see sdf files) were calculated at the level of theory DF-B3LYP by Molpro
- default geometry convergence thresholds were used
- basis: cc-pVTZ-PP for Sn/I, cc-pVTZ for all others
- df basis: def2-QZVPP/jkfit for Sn/I, cc-pVQZ/jkfit for all others
@misc{agz7,
title={Dictionary of 140k GDB and ZINC derived AMONs},
author={Bing Huang and O. Anatole von Lilienfeld},
year={2020},
eprint={2008.05260},
archivePrefix={arXiv},
primaryClass={physics.chem-ph}
}