Data and scripts to generate figures for the perspective "Data sharing in chemistry: lessons learned and a case for mandating structured reaction data" by R Mercado, SM Kearnes, and CW Coley.
You can use a conda environment to run the plotting scripts in this repo. To set up the environment, run:
conda create -n data-sharing-perspective seaborn -c anaconda
conda activate data-sharing-perspective
To create the plots for Figures 1 and 2 in the manuscript, run:
python plot-entries.py
python plot-contributors.py
The first script plots data entries over time and the second plots contributors/sources over time, for the following databases: CSD, PDB, PubChem, and ChEMBL.
Files will be created in plots/.
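For readers who want to adapt the plots, here is a minimal sketch of the general idea: turn per-year counts into a cumulative series and draw it as a line plot. It is not the code in plot-entries.py or plot-contributors.py. The yearly counts, the function names, and the output path are all hypothetical, and it assumes the input is a count of new entries per year (if your data is already cumulative, skip the accumulation step).

```python
from itertools import accumulate

# Hypothetical yearly counts of new entries; these numbers are NOT from data/.
yearly_new_entries = {2019: 120, 2020: 150, 2021: 90}


def to_cumulative(counts_by_year):
    """Return (years, running totals), with years in ascending order."""
    years = sorted(counts_by_year)
    totals = list(accumulate(counts_by_year[y] for y in years))
    return years, totals


def plot_cumulative(years, totals, out_path):
    """Draw the running totals as a line plot and save it to out_path."""
    # Imported lazily so to_cumulative() works without the plotting libraries.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.lineplot(x=years, y=totals, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Entries (cumulative)")
    plt.savefig(out_path, bbox_inches="tight")
    plt.close()


years, totals = to_cumulative(yearly_new_entries)
```

Calling plot_cumulative(years, totals, "plots/example.png") inside the conda environment above would then write the figure, provided the plots/ directory already exists.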
The Adobe Illustrator files used to make the figures shown in the paper are available in illustrator/.
The raw data for the above plots is available in data/. For individual sources, see below:
Structures available in the CSD (cumulative): Data collected from:
Depositors for data in the PDB (cumulative): Entries available in the PDB (cumulative): Data collected from:
Sources for data in PubChem (cumulative): Data entries in PubChem (cumulative): Data collected from:
- PubChem publications page, where I got the data counts for PubChem Compounds, BioAssays, and Substances:
* Accessed Dec 25, 2022.
Sources (documents) for data in ChEMBL (cumulative): Compound entries in ChEMBL (cumulative): Data collected from:
- Breakdown of all data sources in the release notes
Flaticon images linked here (used freely with attribution): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11