This is the repository for the Data Readiness Cluster to maintain the AI-ready data checklist. The cluster is a community-driven group focusing on developing recommendations and community standards on AI-ready open environmental data. Although the work currently focuses on environmental data, the product could be extended to data from other domains.
- For data producers/providers, the purpose is to understand to what extent the data being assessed meets the common research data management practices and principles that are relevant to AI/ML application development. The assessment result can be used to justify targeted improvements for the dataset when resources become available.
- For projects generating new datasets, the AI-readiness checklist can be used to guide the development of the dataset.
For example:
- What documentation do you want to provide accompanying the dataset?
- Do you have a proper data quality assessment that will make the development of downstream AI/ML applications efficient?
The current version of the checklist is available here (last updated 2023-12-20). The checklist will be maintained and updated by the community.
To assist with the assessment, we have created a fillable Google Sheets template. You can make a copy of the sheet for your own assessment. Because the checklist is designed for individual datasets, each dataset should be assessed separately. Work to support the assessment of linked datasets is ongoing.
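As one way to work with a completed assessment, the sketch below tallies the answers for a single dataset after its sheet has been exported as CSV. The column names `question` and `answer`, the function name `summarize_assessment`, and the example rows are assumptions made for illustration; the actual template may be laid out differently.

```python
import csv
import io
from collections import Counter


def summarize_assessment(csv_text):
    """Tally the answers in one dataset's assessment, exported as CSV.

    Assumes hypothetical columns named "question" and "answer";
    adjust these to match the layout of your copy of the template.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    unanswered = []
    for row in reader:
        answer = (row.get("answer") or "").strip().lower()
        if answer:
            counts[answer] += 1
        else:
            # Keep track of questions that still need an answer.
            unanswered.append(row["question"])
    return counts, unanswered


# Made-up rows for illustration only; these are not the checklist's wording.
example = """question,answer
Is documentation provided with the dataset?,Yes
Has data quality been assessed?,No
Placeholder question,
"""

counts, unanswered = summarize_assessment(example)
print(dict(counts))
print(unanswered)
```

Running the summary once per exported sheet keeps each dataset's assessment separate, in line with the checklist's per-dataset design.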
If you are in the early stages of developing AI/ML applications with open environmental datasets, we encourage you to assess the input data used for your applications. Although you may not be able to change other people’s datasets, the assessment will help you document the effort spent on preparing the data for your development.
If you have any questions or suggestions about the checklist or the assessment tool, you can provide feedback in either of the following ways:
- Contact Douglas Rao (douglas.rao@noaa.gov), cluster chair
- Open an issue in this GitHub repo.
ESIP Data Readiness Cluster (2023). Checklist to Examine AI-readiness for Open Environmental Datasets. Version 1.0. Earth Science Information Partners. https://github.com/ESIPFed/data-readiness [date accessed].
- Mills, A. (2022). Are Your Data Ready? Take Stock with ESIP’s New AI-Ready Checklist. Earth Science Information Partners. [Retrieved on 2023-10-13].
- Long, S. and Romanoff, T. (2023). AI-Ready Open Data. Bipartisan Policy Center. [Retrieved on 2023-10-13].