Ease.ml/ci is a library to support continuous testing and integration of machine learning models with statistical guarantees. It can be used as a stand-alone library or deployed as a CI&CD service.
There exist many CI/CD tools for classical software development (e.g., Jenkins). However, using them out of the box to continuously test machine learning models can lead to failures in production, for two reasons. First, when testing an ML model with a fixed test set, one has to take into account the inherent randomness of ML. Second, when using the same test set multiple times, one has to make sure not to overfit to it, even when only the outcomes of the test conditions are evaluated and revealed. More details about the inherent challenges of testing ML models can be found in our blog post.
The core component of ease.ml/ci is a sample size estimator. Given a test condition, the number of commits for which one intends to use the same test set, and the confidence bounds one has to guarantee, the sample size estimator outputs the minimum number of samples required to satisfy these requirements. The estimator can then be used in a standalone fashion (i.e., as a library) or integrated into a CI/CD workflow (i.e., using a GitHub Action or Buildbot). The latter also requires functionality to calculate the quantities supported in the test conditions (like accuracy or the difference in predictions between models), and to notify the user to provide a new test set and replace the existing one in the system.
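To give a feel for how the number of commits and the confidence requirements drive the sample size, here is a minimal sketch based on the textbook Hoeffding inequality combined with a union bound. It is an illustration only and not the estimator shipped with the library, which may apply tighter bounds. The function name `hoeffding_sample_size` and its parameters (`epsilon` for the error tolerance, `delta` for the allowed failure probability, `steps` for the number of commits, `value_range` for the range of the tested quantity) are assumptions made for this example.

```python
import math


def hoeffding_sample_size(epsilon, delta, steps, value_range=1.0, adaptive=False):
    """Samples needed so that each of `steps` evaluations on the same test set
    is within `epsilon` of its true value, with probability at least 1 - delta.

    Hypothetical helper for illustration; not part of the ease.ml/ci API.
    """
    if epsilon <= 0 or not (0 < delta < 1) or steps < 1:
        raise ValueError("need epsilon > 0, 0 < delta < 1 and steps >= 1")
    # Union bound over the events that must all hold:
    #   non-adaptive: one event per commit             -> log(steps)
    #   adaptive:     one per pass/fail feedback path  -> log(2 ** steps)
    log_events = steps * math.log(2) if adaptive else math.log(steps)
    # Two-sided Hoeffding bound: 2 * exp(-2 * n * eps^2 / range^2) <= delta / events
    n = value_range ** 2 * (log_events + math.log(2.0 / delta)) / (2.0 * epsilon ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # Same tolerance and confidence, with and without adaptivity.
    print(hoeffding_sample_size(epsilon=0.05, delta=0.001, steps=32))
    print(hoeffding_sample_size(epsilon=0.05, delta=0.001, steps=32, adaptive=True))
```

In this sketch the required sample size grows only logarithmically with the number of commits when commits do not depend on earlier test outcomes, but linearly when developers can adapt to every pass/fail signal.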
Install the library
pip install git+https://github.com/easeml/ci
Then, within a Python kernel where the library is installed:
from easeml_cicd.core.utils import SampleCalculator
# location of the CI&CD config file
config_path = ".easeml.yml"
# initialize the sample calculator with the config file
sc = SampleCalculator(config_path)
# calculate the number of samples needed
N = sc.calculate_n()
A jupyter notebook showcasing this can be found here
Ease.ml/ci can be used within a GitHub Action.
- Ease.ML/ci repository structure
- Generate the dataset encryption and decryption keys by running the command:
easeml_create_key
- Base64 encode the keys and add them as a repository secret.
cat easeml_pub.asc | base64 -w 0
cat easeml_priv.asc | base64 -w 0
- Store the keys as GitHub Secrets under the names B64_EASEML_PUB and B64_EASEML_PRIV.
- Create a GitHub Action yaml under .github/workflows/, e.g. easemlci.yml
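The `-w 0` flag used in the encoding step belongs to GNU coreutils `base64`, and other implementations may not accept it. As a portable alternative, the following sketch produces the same single-line Base64 output with the Python standard library. The helper name `encode_key_file` is made up for this example and is not part of Ease.ml/ci.

```python
import base64
from pathlib import Path


def encode_key_file(path):
    """Return the contents of `path` as one unwrapped Base64 string.

    Hypothetical helper; mirrors `cat <file> | base64 -w 0`.
    """
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


if __name__ == "__main__":
    # Key file names as produced in the key generation step.
    for name in ("easeml_pub.asc", "easeml_priv.asc"):
        if Path(name).exists():
            print(name, encode_key_file(name))
        else:
            print(name, "not found in the current directory")
```

The printed strings can then be pasted into the corresponding repository secrets.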
An example repository using Ease.ml/ci as a GitHub Action can be found here: https://github.com/leaguilar/ci_action
For heavier workloads, Ease.ml/ci can be deployed as a service that interfaces with a GitHub repository, deploys models as Docker containers, manages the encrypted datasets, and notifies users of the results of their ML CI&CD pipeline by email. Buildbot is used as the base, with Easeml/CI&CD as a plugin.
- Ease.ML/ci repository structure
- Buildbot
- Docker (ML models are run as Docker containers)
- A public IP or domain name, with access configured to the required port, e.g. http://ec2-18-219-109-220.us-east-2.compute.amazonaws.com:8010
A playlist with a detailed example of setting up the service can be found here and the videos are linked throughout the overview
- Provision a server or cluster with a publicly reachable IP/domain name and port, e.g. http://ec2-18-219-109-220.us-east-2.compute.amazonaws.com:8010
- Install Docker and enable execution without sudo, e.g. https://docs.docker.com/engine/install/ubuntu/, https://docs.docker.com/engine/install/linux-postinstall/
- Create buildbot master/worker
- Register a GitHub app and link it to a repository
  - Generate the GitHub app's service_private_key.pem
  - Register the webhook's location, e.g. http://ec2-18-222-118-176.us-east-2.compute.amazonaws.com:8010/change_hook/github
  - Enable control over check runs
  - Structure and configure your GitHub repository to use the GitHub app, e.g. https://github.com/leaguilar/VLDB2019
  - Videos: 2.1, 2.2
- Set keys
  - GitHub access keys in $HOME/.easeml/keys/service_private_key.pem (this is the key required for the GitHub app to access the repository)
  - Data decryption and encryption keys: $HOME/.easeml/keys/easeml_priv.asc and $HOME/.easeml/keys/easeml_pub.asc
  - Videos: 3.1
- Run the service
  - Videos: 4.1
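Before starting the service, it can help to confirm that the key files from the "Set keys" step are where the service expects them. The sketch below only checks for the three paths listed above. The function name `missing_key_files` is an assumption for this example and is not provided by Ease.ml/ci.

```python
from pathlib import Path

# File names taken from the "Set keys" step.
EXPECTED_KEYS = ("service_private_key.pem", "easeml_priv.asc", "easeml_pub.asc")


def missing_key_files(home=None):
    """Return the expected key files that are absent under <home>/.easeml/keys.

    Hypothetical pre-flight check; `home` defaults to the current user's home.
    """
    base = Path(home) if home is not None else Path.home()
    key_dir = base / ".easeml" / "keys"
    return [key_dir / name for name in EXPECTED_KEYS if not (key_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_key_files()
    if missing:
        for path in missing:
            print("missing:", path)
    else:
        print("all expected key files are present")
```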
@inproceedings{renggli2019mlsys,
author = {Cedric Renggli and Bojan Karlaš and Bolin Ding and Feng Liu and Kevin Schawinski and Wentao Wu and Ce Zhang},
booktitle = {Proceedings of Machine Learning and Systems},
title = {Continuous Integration of Machine Learning Models with ease.ml/ci: A Rigorous Yet Practical Treatment},
year = {2019}
}
@inproceedings{karlas2020sigkdd,
author = {Bojan Karlaš and Matteo Interlandi and Cedric Renggli and Wentao Wu and Ce Zhang and Deepak Mukunthu Iyappan Babu and Jordan Edwards and Chris Lauren and Andy Xu and Markus Weimer},
booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining},
title = {Building continuous integration services for machine learning},
year = {2020}
}
@inproceedings{aguilar2021ease,
title={Ease.ML: A Lifecycle Management System for Machine Learning},
author={Aguilar Melgar, Leonel and Dao, David and Gan, Shaoduo and G{\"u}rel, Nezihe M and Hollenstein, Nora and Jiang, Jiawei and Karla{\v{s}}, Bojan and Lemmin, Thomas and Li, Tian and Li, Yang and others},
booktitle={11th Annual Conference on Innovative Data Systems Research (CIDR 2021)(virtual)},
year={2021},
organization={CIDR}
}