Specification of requirements for AI/machine learning #130

Open
christer-io opened this issue Jun 16, 2022 · 2 comments

@christer-io
Collaborator

There are some very interesting open AI projects under development, and I think it would be good to run a process to add more specific requirements around AI. I have noted some points to frame a possible discussion and process:

  • What licenses are required for AI algorithms (needs to be added to indicator 2), models, and datasets?
  • Should we specify that there could be different licenses for source code/algorithms and for training data? One could argue that the standard in its current form already covers this in indicator 2, with its references to open source and open data.
  • Would we require the original training data to be openly licensed for an AI-model to be accepted as a DPG?
  • Can a dataset that is created based on non-open AI/ML be licensed as open data?
  • Regarding indicator 9: can “no harm” in one context be “harm” in another, for example when an ML classifier built on the same training data is used on a different dataset?
@nathanbaleeta
Contributor

nathanbaleeta commented Jun 16, 2022

@christer-io Thanks for sharing your thoughts on open AI requirements in the current version of the DPG standard. Kindly is a good example of a DPG that tackled some of the concerns raised, as illustrated below:

  • Explicitly indicated licenses for software and data
  • It may not be necessary to require that all the original training data be openly licensed, since some datasets are accessed online from various sources. For instance, an ML engineer could use transfer learning to speed up training of a convolutional model by starting from VGG16, which has already been extensively trained and open sourced. However, we should make it mandatory to open source the algorithm so that similar performance results can be reproduced in future iterations.
  • Kindly was pre-built from Hugging Face, which offers both an open-source platform and subscription-based features that NLP practitioners can deploy in their models.
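
To make the first point concrete, here is a rough sketch of how a submission could declare a separate license for each component (software, model, data) and how a reviewer could flag the gaps. This is only an illustration: the `Component` class, the `find_license_gaps` helper, the component names, and the `APPROVED_LICENSES` set are all hypothetical, and the actual list of accepted licenses is whatever indicator 2 of the standard defines.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical allow-list of SPDX identifiers, for illustration only.
# The licenses actually accepted are defined by the DPG standard.
APPROVED_LICENSES: Set[str] = {"MIT", "Apache-2.0", "CC-BY-4.0", "CC0-1.0"}


@dataclass(frozen=True)
class Component:
    """One licensable part of an AI submission."""
    name: str
    kind: str                  # "software", "model", or "data"
    license_id: Optional[str]  # SPDX identifier, or None if undeclared


def find_license_gaps(components: List[Component],
                      approved: Set[str] = APPROVED_LICENSES) -> List[str]:
    """Return names of components with a missing or non-approved license."""
    return [c.name for c in components if c.license_id not in approved]


# Hypothetical submission: open code, undeclared data, closed model.
submission = [
    Component("chatbot-code", "software", "MIT"),
    Component("training-corpus", "data", None),
    Component("pretrained-model", "model", "Proprietary"),
]

for name in find_license_gaps(submission):
    print(f"needs review: {name}")
```

Keeping one entry per component means the open questions above (must the original training data be open, can the model and the code carry different licenses) become policy choices about which `kind` values are required to pass, rather than something buried in a single project-wide license field.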

PS: For more information on Kindly, please refer to the documentation here.

@prajectory prajectory self-assigned this Jun 29, 2022
@prajectory
Contributor

Please look at #58 while working on this; it is a related issue.
