
[Proposal] Retrofitting the standard for an open AI audience #58

Closed
nathanbaleeta opened this issue Mar 2, 2021 · 4 comments

nathanbaleeta commented Mar 2, 2021

The current standard addresses the needs of indicator 6 (Mechanism for Extracting Data) better for a software context than for AI, and there is still room to be more explicit. This issue seeks to outline key questions of concern regarding open AI models and data extraction mechanisms for non-personally identifiable information:

  1. In the context of an open AI model, what qualifies as non-personally identifiable information?
     • Model weights & parameters, etc.
  2. Describe the mechanism for extracting or importing non-personally identifiable information from the system in a non-proprietary format. (The answers below are my thoughts on what possible answers this question would attract in an AI context.) Model persistence or serialization can occur through:
     • For scikit-learn, saving the model using Pickle (standard Python objects) or Joblib (efficient serialization of Python objects with NumPy arrays).
     • For Keras and TensorFlow, saving the model in HDF5 format with the .h5 extension.
     • For PyTorch, conventional approaches include Pickle, using either a .pt or .pth file extension.

PS: While the current wording in indicator 6 suffices for software, for AI models including keywords such as model persistence/serialization would make it clearer (a minimal sketch of the serialization options follows below).
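To make the options above concrete, here is a minimal sketch, assuming a hypothetical trained scikit-learn estimator `clf`, Keras model `keras_model`, and PyTorch module `torch_model` already exist in scope:

```python
import pickle            # standard-library serialization
import joblib            # efficient for objects holding large NumPy arrays
import torch             # PyTorch serialization utilities

# scikit-learn: Pickle or Joblib (the names clf, keras_model, torch_model are hypothetical)
joblib.dump(clf, "model.joblib")
clf_restored = joblib.load("model.joblib")

with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)

# Keras / TensorFlow: HDF5 format with the .h5 extension
keras_model.save("model.h5")

# PyTorch: Pickle-based saving, conventionally .pt or .pth
torch.save(torch_model.state_dict(), "model.pt")
torch_model.load_state_dict(torch.load("model.pt"))
```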

nathanbaleeta changed the title from "[Proposal Retrofitting the standard for an open AI audience" to "[Proposal] Retrofitting the standard for an open AI audience" on Mar 4, 2021

Lucyeoh commented Apr 15, 2021

Status & Next Steps:

  • Engage an AI expert to review the current standard and this proposal (specifically regarding non-PII data & data privacy).


Lucyeoh commented Apr 15, 2021

Prioritization: should come after #59


prajectory commented May 12, 2022

We have expert input from Lea Gimpel on how the DPG Standard can be better retrofitted for open AI digital solutions.

Reproducibility:
This means that all training details are given. Needless to say, this includes a description of the data, code documentation and tech stack documentation (these can follow the already existing standards and criteria). We think it should also include specific model-training documentation, for instance what kind of CPU/GPU, OS and platforms (cloud provider, Google Colab, etc.) were used for the training, along with a list of all training parameters. Ideally, a tech-savvy person should be able to re-train the model with identical evaluation scores, given all information, data and computing power.
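As a rough sketch of such model-training documentation (not part of the standard; the model and parameters here are only illustrative), a training script could dump all parameters plus platform details next to the trained model:

```python
import json
import platform
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
params = {"C": 1.0, "max_iter": 200, "random_state": 42}
model = LogisticRegression(**params).fit(X, y)

# Record everything a tech-savvy person would need to attempt an identical re-run
run_info = {
    "model_class": "sklearn.linear_model.LogisticRegression",
    "training_parameters": params,
    "library_versions": {"scikit-learn": sklearn.__version__},
    "platform": platform.platform(),          # OS / architecture used for training
    "python_version": platform.python_version(),
    "train_score": model.score(X, y),
}
with open("training_run.json", "w") as f:
    json.dump(run_info, f, indent=2)
```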

A nice way to think about transparent documentation of AI models is Google’s idea of “model cards” (see the model cards site and the corresponding article; in addition, Timnit Gebru has suggested “datasheets for datasets”, which could be an interesting tool for the discussion around open data as a DPG).

Accessibility:
This is quite critical, in our opinion. The model should be easily accessible and usable. A good solution may be the provision of an API, so that you can send a request and retrieve the prediction outcome through a stable connection in real time. Platforms such as Hugging Face are also quite handy here, since they allow one-liner access to and usage of trained ML models. (They also recently raised funding at a roughly $2B valuation, aiming to build the GitHub of machine learning.)
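For illustration, a model published to the Hugging Face hub can be loaded and queried with a one-liner pipeline (this uses the transformers library's default sentiment model; the input text and printed output are only illustrative):

```python
from transformers import pipeline

# One-liner access to a hosted model; weights are downloaded on first use
classifier = pipeline("sentiment-analysis")
print(classifier("Digital public goods make AI solutions accessible to everyone."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```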

Interpretability:
We think it is essential that the prediction outcomes of the models are interpretable and understandable, at least through proper documentation and explanation. For traditional ML models, predictions should be accompanied by some sort of intuitive confidence score: it makes a difference whether a model predicts with 99% confidence or 51% confidence. If such thresholds are set, they need to be clearly stated and explained.
Generally, it should be clear what problem the AI model aims to solve and what realistic outcomes/performance the user can expect.
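A minimal sketch of exposing such a confidence score, using scikit-learn's predict_proba; the 0.9 threshold is a hypothetical value chosen only to show why thresholds must be stated and explained:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

probabilities = model.predict_proba(X[:1])[0]   # class probabilities for one sample
confidence = probabilities.max()

# A documented, explained threshold: predictions below it are flagged as uncertain
THRESHOLD = 0.9  # hypothetical value; it must be stated and justified in the docs
label = probabilities.argmax() if confidence >= THRESHOLD else "uncertain"
print(f"prediction: {label}, confidence: {confidence:.2f}")
```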

Independence:
This adds to the point of accessibility. We may also make the model accessible as some sort of package, i.e. a collection of modules that can be downloaded and used in a programming language such as Python (pip install our_packaged_model), or as a sub-module in an existing package (this is how it would work if pushed to the Hugging Face model hub). The point of independence is that the dependencies need to follow the same standards as the end product, but that is also already outlined in the standard.
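A hedged sketch of both distribution routes from the user's perspective; our_packaged_model is the hypothetical package name from the comment above, and org/model-name is a placeholder hub id, not a real repository:

```python
# Route 1: a standalone pip package (hypothetical; shown as comments only)
#   pip install our_packaged_model
#   from our_packaged_model import load_model
#   model = load_model()
#   model.predict(["some input"])

# Route 2: a sub-module-style workflow via the Hugging Face model hub
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("org/model-name")   # placeholder id
model = AutoModel.from_pretrained("org/model-name")
```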

prajectory commented

We will resolve this on #130, the latest issue on the topic of AI as a part of the standard.

Projects
Status: Merge/Duplicate