Chemical Substances from The University of Alabama Dissertations and Theses
Important
This repository contains machine-readable, registered (converted to a machine format and checked for uniqueness), non-standardized chemical substances from The University of Alabama Dissertations and Theses (hereafter, theses). The chemical data in this repository are temporary working files: non-standardized chemical structures and related data that we created during the project. As such, we do not recommend using the data in this repository. Instead, we recommend downloading the University of Alabama Libraries chemical structure data from PubChem, which handles standardization of the chemical structures (within the Compound database): The University of Alabama Libraries PubChem Data Source. This repository is useful for understanding our workflow and for reading our notes on copyright and reuse.
There are currently ~3000 chemical substances registered across 73 theses.
Chemical structure data include the name (or ID), SMILES, and InChI of the chemical substances synthesized within each thesis, along with a permalink to the thesis full text or a catalog link (if the thesis is not yet available online). An SDfile containing the connection table, name (or ID), SMILES, InChI, citation, permalink, and local structure registry ID is also included.
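As a rough illustration of how an SDfile of this kind can be read, the following Python sketch splits a file into records and collects each record's connection table and data fields using only the standard library. It relies on the general SDfile layout (a connection table ending in `M  END`, data items introduced by `> <FIELD>` header lines and closed by a blank line, records separated by `$$$$`). The field names `NAME`, `SMILES`, and `PERMALINK`, the ethanol record, and the `example.org` link are made up for the example and are assumptions, not the actual tags or data used in this repository.

```python
from typing import Dict, List


def parse_record(lines: List[str]) -> dict:
    """Split one SDfile record into its connection table and data fields."""
    # The connection table (molblock) runs up to and including "M  END".
    mol_lines: List[str] = []
    index = 0
    while index < len(lines):
        mol_lines.append(lines[index])
        index += 1
        if mol_lines[-1].startswith("M  END"):
            break

    # Data items: a "> <FIELD>" header, value lines, then a blank line.
    fields: Dict[str, List[str]] = {}
    name = None
    for line in lines[index:]:
        if name is None:
            if line.startswith(">"):
                start = line.find("<")
                end = line.find(">", start + 1)
                if start != -1 and end != -1:
                    name = line[start + 1:end]
                    fields[name] = []
        elif line.strip() == "":
            name = None  # a blank line closes the current data item
        else:
            fields[name].append(line)

    return {
        "molblock": "\n".join(mol_lines),
        "fields": {key: "\n".join(value) for key, value in fields.items()},
    }


def parse_sdf(sdf_text: str) -> List[dict]:
    """Return one parsed record per "$$$$"-terminated entry in an SDfile."""
    records: List[dict] = []
    current: List[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    # Tolerate a final record that lacks the "$$$$" terminator.
    if any(line.strip() for line in current):
        records.append(parse_record(current))
    return records


# Made-up single-record SDfile; field names and values are hypothetical.
SAMPLE_SDF = """\
ethanol
  example

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
M  END
> <NAME>
ethanol

> <SMILES>
CCO

> <PERMALINK>
https://example.org/thesis/0000

$$$$
"""

for record in parse_sdf(SAMPLE_SDF):
    for field_name, value in record["fields"].items():
        print(f"{field_name}: {value}")
```

This sketch only reads the text layout and does not validate chemistry. For real work with structures, a cheminformatics toolkit is a better fit, and the standardized data in PubChem remain the recommended source.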
Much of our inspiration for this project came from the following similar project:
Andrews, D. M.; Broad, L. M.; Edwards, P. J.; Fox, D. N. A.; Gallagher, T.; Garland, S. L.; Kidd, R.; Sweeney, J. B. The Creation and Characterisation of a National Compound Collection: The Royal Society of Chemistry Pilot. Chem. Sci. 2016, 7 (6), 3869–3878. DOI:10.1039/C6SC00264A
Vincent F. Scalfani (Chemical Registration), Barbara Dahlbach (Digitization of full text Theses), and Jacob Robertson (Institutional Repository Records).
VFS thanks The University of Alabama and The University of Alabama Libraries for approving research sabbatical leave for this project. We are grateful to ChemAxon for providing the MarvinSketch academic license and Bio-Rad for providing the KnowItAll academic license.
Disclaimer: Not legal advice, just our own personal (non-lawyer) thoughtful notes.
The purpose of The University of Alabama Dissertation and Thesis Substance Registration project is to allow greater discovery, use, and credit of the original authors' theses, not to claim any ownership of the written thesis content. The thesis authors hold the copyright to their own thesis.
For all substances and associated data extracted and registered, no judgment is made on the appropriateness of the synthetic method reported, the safety precautions required, or the accuracy of the characterization data. Readers need to make their own assessment of the authors' claims, procedures, and necessary safety precautions.
During extraction, registration, and processing of the chemical substances and related data, inaccuracies may have been introduced through human or software error. We attempted to minimize inaccuracies and to share chemical substances and related data with fidelity to the original thesis. However, we cannot make any guarantees on the accuracy of the chemical structure data, chemical names, and other associated data from the theses. You should always check the original thesis reference to verify the data, as well as check other relevant information sources.
We have only extracted and shared scientific facts (i.e., the chemical substances) and bibliographic information from the theses. Such scientific facts and bibliographic data are not subject to U.S. copyright protection: Compendium of U.S. Copyright Office Practices. See specifically section 313.3(A), where examples are listed that are excluded from copyright protection, one of which includes chemical substances:
..."DNA sequences and other genetic, biological, or chemical substances or compounds, regardless of whether they are man-made or produced by nature..."
We have endeavored to credit each thesis author respectfully by including a citation reference and permalink (where possible) on all shared chemical structure data, including the data files in this Git repository, The University of Alabama Institutional Repository, and PubChem Substance pages.
All chemical structure and associated data in this repository are licensed under CC-BY 4.0. Please give the original authors of the theses credit by citing their work and following standard scholarly practice for reuse of the scientific literature, particularly if the data have led you to useful content within their thesis (as noted above, each thesis author holds their own copyright). If you are reusing a large corpus of the structures as a dataset, we would appreciate a citation of our work, as we put the effort into the data compilation. The data within the data_analysis section were collected from PubChem and are credited to NCBI: https://www.ncbi.nlm.nih.gov/home/about/policies/
Code in this repository is licensed under the BSD-2-Clause license. Some portions are designed to work with proprietary software, such as MathWorks MATLAB and ChemAxon Marvin, which is not included under this license. Users must have valid licenses for any required proprietary software to run these portions of the code.