update to the Joy of FAIR text
stuchalk committed Mar 15, 2024
1 parent 38bd3d1 commit 2f01029
Showing 2 changed files with 38 additions and 3 deletions.
28 changes: 28 additions & 0 deletions book/cooking.md
@@ -4,3 +4,31 @@ Working with data in terms of FAIR and in a digital environment means working wi
different activities, different steps to handle the data. This section will provide a brief background on
machine-readable data and the FAIR data principles in the context of chemistry, what you can do with machine-readable
chemical data and the importance of preparing data to be FAIR and discoverable in domain repositories.

FAIR-enabled data are structured and described to facilitate automated processing from machine to machine and system to
system, and thus can be used for AI/ML and other digital applications. “Fully AI-Ready” data structures are
systematically organized and consistently formatted so that algorithms can parse and operate on them. Processable data
structures that adhere to discipline-specific data standards and align with the FAIR data principles support the quality
and accuracy of reuse. FAIR data are accessible through programmatic interfaces and can be called directly from user
code for automated exchange and analysis.
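
As a minimal sketch of what "called directly from user code" can look like, the Python below builds a request URL for
PubChem's PUG REST interface and parses the JSON it returns. PubChem is used only as an illustration (it appears in the
reference list); the function names `property_url`, `parse_properties` and `fetch_properties` are hypothetical and not
part of any library.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the PubChem PUG REST service (illustrative choice of repository).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties=("MolecularFormula", "InChIKey")):
    """Build a PUG REST URL that requests the given properties for a compound name."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"


def parse_properties(payload):
    """Return the list of property records from a PUG REST JSON response body."""
    return json.loads(payload)["PropertyTable"]["Properties"]


def fetch_properties(name):
    """Retrieve and parse properties over HTTP (requires network access)."""
    with urlopen(property_url(name), timeout=30) as response:
        return parse_properties(response.read().decode("utf-8"))
```

With network access, `fetch_properties("aspirin")` would return a list of dictionaries, one per matching compound.
Separating URL construction and parsing from the network call keeps each step testable offline.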

For programmers and data scientists, the concepts framed by the FAIR principles manifest in practice as concrete
technical attributes. For those unfamiliar with an informatics or data-structure perspective, they can remain
abstractions. To make sense of how the FAIR data principles apply to machine-readability, it is important to see how
use cases for chemical data map to the technical attributes that enable data to be reused through automated means.

There are a number of scenarios where it is useful to navigate across distributed data resources using programmatic
methods, for example a global search for specific chemicals, cross-exchange of chemical information between data
repositories, validation of converted or predicted chemical representations, or integration of distributed data for
compiled meta-analysis [WF3.3]. A machine processing workflow for reusing data might proceed through stages of
discovery, retrieval, validation, curation, compilation, visualization, analysis and derivation. Each of these stages
depends on some combination of consistently structured data, granular metadata description and reproducible protocols.
Aligning these with FAIR community practices makes the workflow repeatable, reliable and scalable for automated data
reuse.
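
The validation and compilation stages of such a workflow can be sketched as follows. This is an illustration under
stated assumptions, not a prescribed method: records are assumed to be dictionaries carrying an `inchikey` field, the
source names are invented, and validation checks only the character layout of an InChIKey, not its chemical
correctness.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: blocks of 14, 10 and 1 uppercase letters joined by hyphens.
# This tests the layout only; it does not verify that the key matches a structure.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_valid(record):
    """Validation stage: accept a record only if its identifier is well formed."""
    return bool(INCHIKEY_PATTERN.fullmatch(record.get("inchikey", "")))


def compile_records(sources):
    """Compilation stage: group valid records from every source by InChIKey.

    `sources` maps a (hypothetical) source name to a list of record dictionaries.
    Returns the compiled mapping and a list of (source, record) pairs that were rejected.
    """
    compiled = defaultdict(dict)
    rejected = []
    for source, records in sources.items():
        for record in records:
            if is_valid(record):
                compiled[record["inchikey"]][source] = record
            else:
                rejected.append((source, record))
    return dict(compiled), rejected
```

Keeping rejected records, rather than silently dropping them, leaves a trail for the curation stage.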

The FAIR Data Principles are framed from the perspective of data reuse. Process engineers, researchers, and an
increasing number of automated processes need complete and unambiguous descriptions of research results in convenient
forms that are easy to find, retrieve and compile. To get more FAIR data out in consumable forms, we also need to
consider the other side of the equation: the critical parameters for documenting data during the lifecycle upstream of
sharing, so that meaning and quality can be assessed and reassessed appropriately.
13 changes: 10 additions & 3 deletions book/references.bib
@@ -1,7 +1,7 @@
---
---
@article{Wilkinson2016,
@article{Wil16,
title = {The FAIR Guiding Principles for scientific data management and stewardship},
volume = {3},
issue = {1},
@@ -18,7 +18,7 @@ @article{Hanson2022
title = {IUPAC specification for the FAIR management of spectroscopic data in chemistry (IUPAC FAIRSpec) – guiding principles},
author = {Robert M. Hanson and Damien Jeannerat and Mark Archibald and Ian J. Bruno and Stuart J. Chalk and
Antony N. Davies and Robert J. Lancashire and Jeffrey Lang and Henry S. Rzepa},
pages = {623--636},
pages = {623-636},
volume = {94},
number = {6},
journal = {Pure and Applied Chemistry},
@@ -59,7 +59,7 @@ @article{Kim2022

@article{Kim2023,
author = {Sunghwan Kim and Jie Chen and Tiejun Cheng and Asta Gindulyte and Jia He and Siqian He and Qingliang Li and Benjamin A. Shoemaker and Paul A. Thiessen and Bo Yu and Leonid Zaslavsky and Jian Zhang and Evan E. Bolton},
title = "{PubChem 2023 update}",
title = {PubChem 2023 update},
journal = {Nucleic Acids Research},
volume = {51},
number = {D1},
Expand All @@ -68,6 +68,13 @@ @article{Kim2023
url = {https://doi.org/10.1093/nar/gkac956}
}

@article{WFD3.3,
author = {Thiessen, P. and Bolton, E. and McEwen, L. R.},
title = {WorldFAIR (D3.3) Utility services for Chemistry Standards (1.1)},
year = {2023},
url = {https://doi.org/10.5281/zenodo.10514901}
}

@misc{NIST2022,
type = {Repository},
author = {Peter Linstrom},