- Contributing to MLCommons
- Setup for Contributing
- Installation
- Docker Workflows
- Submitting PRs
- Testing
We invite everyone to look through our technical documentation and codebase and submit issues and pull requests, e.g. for changes, clarifications, or any bugs you might encounter. If you are interested in contributing to the work of the working group and influencing the benchmark's design decisions, please join the weekly meetings and consider becoming a member of the working group.
The best way to contribute to MLCommons is to get involved with one of our many project communities. You can find more information about getting involved with MLCommons here.
Generally, we encourage people to become an MLCommons member if they wish to contribute to MLCommons projects, but outside pull requests are very welcome too.
To get started contributing code, you or your organization needs to sign the MLCommons CLA found at the MLC policies page. Once you or your organization has signed the corporate CLA, please fill out this CLA sign-up form to get your specific GitHub handle authorized so that you can start contributing code under the proper license.
MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and open a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that the cla-bot and other checks pass for your pull requests.
If you want to run containers on GCP VMs or store and retrieve Docker images from the Google Cloud Container Registry, please read ahead. If you'd like to use a Linux VM, you will have to install the correct GPU drivers and the NVIDIA Docker toolkit. We recommend using the Deep Learning on Linux image. Further instructions are based on that.
You can use scripts/cloud-startup.sh as a startup script for the VM. This will automate the installation of the NVIDIA GPU drivers and the NVIDIA Docker toolkit.
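For reference, here is a minimal sketch of creating such a VM with the gcloud CLI. The instance name, zone, machine type, accelerator, disk size, and image family below are illustrative assumptions, not prescribed values; adjust them to your project and quota.
# Sketch only: create a GPU VM from a Deep Learning on Linux image and
# attach scripts/cloud-startup.sh as its startup script (all values are examples).
gcloud compute instances create algoperf-dev-vm \
--zone=us-central1-a \
--machine-type=n1-standard-8 \
--accelerator=type=nvidia-tesla-v100,count=1 \
--maintenance-policy=TERMINATE \
--image-family=pytorch-latest-gpu \
--image-project=deeplearning-platform-release \
--boot-disk-size=200GB \
--metadata-from-file=startup-script=scripts/cloud-startup.sh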
To access the Google Cloud Container Registry, you will have to authenticate to the repository whenever you use Docker. Use the gcloud credential helper as documented here.
If you have not installed the package and dependencies yet, see Installation.
To use the development tools such as pytest or pylint, use the dev option:
pip3 install -e '.[dev]'
pre-commit install
To get an installation with the requirements for all workloads and development, use the argument [full_dev].
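After installing the hooks, you can optionally run them once over the whole repository. This is standard pre-commit usage rather than a project-specific requirement:
# Run all configured pre-commit hooks against every file in the repo.
pre-commit run --all-files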
We recommend developing in our Docker image to ensure a consistent environment between developing, testing and scoring submissions.
To get started, see also the Docker workflow instructions below.
If you want to maintain or use images stored on our Google Cloud Container Registry read this section. You will have to use an authentication helper to set up permissions to access the repository:
ARTIFACT_REGISTRY_URL=us-central1-docker.pkg.dev
gcloud auth configure-docker $ARTIFACT_REGISTRY_URL
To pull the latest prebuilt image:
docker pull us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/<image_name>
The naming convention for image_name is algoperf_<framework>_<branch>.
Currently maintained images on the repository are:
algoperf_jax_main
algoperf_pytorch_main
algoperf_both_main
algoperf_jax_dev
algoperf_pytorch_dev
algoperf_both_dev
To reference the pulled image you will have to use the full image_path, e.g. us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_jax_main.
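If you prefer a shorter local name, you can re-tag the pulled image with plain Docker tooling; the short tag below is an arbitrary example, not a name used by the project:
# Give the pulled image a shorter local alias (example alias: algoperf_jax_main).
docker tag us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_jax_main algoperf_jax_main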
To build and push all images (pytorch, jax, both) on the maintained branches (dev, main), run:
bash docker/build_docker_images.sh -b <branch>
You can also use the above script to build images from a different branch:
- Push the branch to the mlcommons/algorithmic-efficiency repository.
- Run bash docker/build_docker_images.sh -b <branch>
The Docker entrypoint script can transfer data to and from our GCP buckets on our internal GCP project. If you are an approved contributor you can get access to these resources to automatically download the datasets and upload experiment results.
You can use these features by setting the --internal_contributor flag to 'true' for the Docker entrypoint script.
To run a Docker container that will only download the data (if it is not found on the host):
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
-v $HOME/experiment_runs/logs:/logs \
--gpus all \
--ipc=host \
<image_path> \
--dataset <dataset> \
--framework <framework> \
--keep_container_alive <keep_container_alive> \
--internal_contributor true
If keep_container_alive is true, the main process on the container will persist after finishing the data download. This run command is useful if you are developing or debugging.
If you set the --internal_contributor flag to true, experiments will also be automatically uploaded to our GCP bucket under gs://mlcommons-runs/<experiment_name>.
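To inspect or copy the uploaded results afterwards, a minimal sketch using gsutil, assuming your account has been granted access to the internal bucket (the experiment name is a placeholder):
# List the uploaded runs for an experiment (example name).
gsutil ls gs://mlcommons-runs/my_experiment
# Copy the results to a local directory.
gsutil -m cp -r gs://mlcommons-runs/my_experiment ./experiment_runs/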
Command format
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
-v $HOME/experiment_runs/logs:/logs \
--gpus all \
--ipc=host \
<image_path> \
--dataset <dataset> \
--framework <framework> \
--submission_path <submission_path> \
--tuning_search_space <tuning_search_space> \
--experiment_name <experiment_name> \
--workload <workload> \
--keep_container_alive <keep_container_alive> \
--internal_contributor true
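As a concrete illustration, here is a hedged example invocation of the command format above. The image path follows the naming convention described earlier, while the dataset, workload, submission path, tuning search space, and experiment name are illustrative placeholders for your own files and settings:
# Example only: run the mnist dev workload with a JAX submission of your own.
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
-v $HOME/experiment_runs/logs:/logs \
--gpus all \
--ipc=host \
us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_jax_main \
--dataset mnist \
--framework jax \
--submission_path submissions/my_submission.py \
--tuning_search_space submissions/tuning_search_space.json \
--experiment_name my_first_run \
--workload mnist \
--keep_container_alive false \
--internal_contributor false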
To find the container IDs of running containers:
docker ps
To see the logging output:
docker logs <container_id>
To enter a bash session in the container:
docker exec -it <container_id> /bin/bash
Rebuilding the Docker image can become tedious if you are making frequent changes to the code. To have changes in your local copy of the algorithmic-efficiency repo be reflected inside the container, you can mount the local repository with the -v flag.
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
-v $HOME/experiment_runs/logs:/logs \
-v $HOME/algorithmic-efficiency:/algorithmic-efficiency \
--gpus all \
--ipc=host \
<image_path> \
--keep_container_alive true
New PRs will be merged into the dev branch by default, provided they pass the presubmit checks.
We run tests with GitHub Actions, configured in the .github/workflows folder.
We run yapf and linting tests on PRs. You can view and fix offending errors with these instructions.
To run the below commands, use the versions installed via pip install -e '.[dev]'.
To automatically fix formatting errors, run the following (WARNING: this will edit your code, so it is suggested to make a git commit first!):
yapf -i -r -vv -p algorithmic_efficiency datasets prize_qualification_baselines reference_algorithms tests *.py
To sort all import orderings, run the following:
isort .
To just print out all offending import orderings, run the following:
isort . --check --diff
To print out all offending pylint issues, run the following:
pylint algorithmic_efficiency
pylint datasets
pylint prize_qualification_baselines
pylint reference_algorithms
pylint submission_runner.py
pylint tests
We run unit tests and integration tests as part of the GitHub Actions workflows as well.
You can also use python tests/reference_algorithm_tests.py to run a single model update and two model evals for each workload, using the reference algorithm in reference_algorithms/target_setting_algorithms/.
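For a local run of the unit tests themselves, a minimal sketch assuming the dev dependencies from pip3 install -e '.[dev]' are installed:
# Run the unit test suite from the repository root.
python3 -m pytest tests/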
We also have regression tests available in .github/workflows/regression_tests.yml that can be run semi-automatically.
The regression tests are shorter end-to-end submissions run in a containerized environment across all 8 workloads, in both the JAX and PyTorch frameworks.
The regression tests run on self-hosted runners and are triggered for pull requests that target the main branch. Typically these PRs will be from the dev branch, so the tests will run containers based on images built from the dev branch.
To run a regression test:
- Build and upload the latest Docker images from the dev branch:
bash ~/algorithmic-efficiency/docker/build_docker_images.sh -b dev
- Turn on the self-hosted runner.
- Run the self-hosted runner application so the runner can accept jobs (see the sketch after this list).
- Open a pull request into main to trigger the workflow.
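The "run the self-hosted runner application" step follows the standard GitHub Actions self-hosted runner workflow; here is a minimal sketch, assuming the runner has already been configured with config.sh and lives in ~/actions-runner (the directory is an assumption):
# Start the runner process so it can pick up queued regression-test jobs.
cd ~/actions-runner
./run.sh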