parasj/checkmate: Training neural networks in TensorFlow 2.0 with 5x less memory

See the paper! https://arxiv.org/abs/1910.02653

Checkmate breaks the GPU memory wall by enabling researchers to train large state-of-the-art models that do not fit in GPU memory. It applies optimal tensor rematerialization (detailed in our MLSys 2020 paper) to trade extra computation time for lower memory use.

At the moment, Checkmate only supports TensorFlow 2.0. PyTorch support is coming soon!

IF YOU ARE TRYING TO REPLICATE OUR MLSYS 2020 PAPER, USE THE mlsys20_artifact BRANCH!

Installation

Checkmate depends on:

TensorFlow 2.0, i.e. pip install tensorflow or pip install tensorflow-gpu.

CyLP solver

Installing CyLP on Debian Linux / Ubuntu

$ sudo apt install coinor-cbc coinor-libcbc-dev
$ pip install cylp

Installing CyLP on MacOS

The easiest way to set up CyLP is using homebrew.

$ brew tap coin-or-tools/coinor
$ brew install coin-or-tools/coinor/cbc pkg-config
$ pip install cylp

Once TensorFlow 2.0 and CyLP are installed, Checkmate can be installed using pip via pip install "https://github.com/parasj/checkmate/archive/master.zip#egg=checkmate".
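After installing, it can help to confirm that all three packages are visible to the Python interpreter you plan to train with. The following is a minimal sketch, not part of Checkmate itself: the helper name `check_modules` is hypothetical, and the import names `tensorflow`, `cylp`, and `checkmate` are assumed to match the packages installed above.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report, for each top-level module name, whether it can be located.

    find_spec only looks the module up; it does not import it, so this
    check is fast and does not initialize TensorFlow or the solver.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for the dependencies listed in this README.
    for name, found in check_modules(["tensorflow", "cylp", "checkmate"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A module reported as MISSING usually means pip installed it into a different environment than the one running the script.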

Quick start

Get started in 5 minutes with our TensorFlow 2.0 quickstart tutorial.

Adapt your Keras model to fit within the memory constraints of a single GPU:

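import tensorflow as tf
import checkmate

# Build any Keras model; VGG19 is used here as an example.
model = tf.keras.applications.vgg19.VGG19(...)
# loss, optimizer, sample_input, and train_ds are assumed to be defined in the elided setup.
...

# Compile a training step for the model that fits within the GPU memory constraints.
train_iteration_fn = checkmate.tf2.compile(model, loss, optimizer,
    input_spec=sample_input[0], label_spec=sample_input[1])

# Train with the compiled step in place of a hand-written training loop body.
for image, label in train_ds:
    prediction, loss = train_iteration_fn(image, label)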

Key ideas

From our paper at MLSys 2020:

Modern neural networks are increasingly bottlenecked by the limited capacity of on-device
GPU memory. Prior work explores dropping activations as a strategy to scale to larger
neural networks under memory constraints. However, these heuristics assume uniform
per-layer costs and are limited to simple architectures with linear graphs, limiting their
usability. In this paper, we formalize the problem of trading-off DNN training time and
memory requirements as the tensor rematerialization optimization problem, a generalization
of prior checkpointing strategies. We introduce Checkmate, a system that solves for
optimal schedules in reasonable times (under an hour) using off-the-shelf MILP solvers,
then uses these schedules to accelerate millions of training iterations. Our method scales
to complex, realistic architectures and is hardware-aware through the use of
accelerator-specific, profile-based cost models. In addition to reducing training cost,
Checkmate enables real-world networks to be trained with up to 5.1× larger input sizes.
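To make the space/time trade-off concrete, here is a toy model of the kind of prior checkpointing heuristic the abstract contrasts against. It is not Checkmate's MILP formulation. It assumes a linear chain of layers with uniform cost (exactly the assumptions the paper relaxes), keeps every k-th activation as a checkpoint, and recomputes the dropped activations once during the backward pass. The function names and the cost model are illustrative assumptions.

```python
from typing import Tuple


def checkpoint_every_k(n_layers: int, k: int) -> Tuple[int, int]:
    """Toy cost model for a linear chain with unit-cost, unit-size layers.

    Returns (peak_activations, recomputed_layers) when the activations of
    layers k, 2k, 3k, ... are kept and all others are dropped after the
    forward pass and rematerialized one segment at a time during backward.
    """
    if n_layers < 1 or k < 1:
        raise ValueError("n_layers and k must be positive")
    n_checkpoints = n_layers // k
    # Largest group of dropped activations that must be live at once:
    # k - 1 between two checkpoints, or the whole chain if nothing is kept.
    largest_segment = k - 1 if n_layers >= k else n_layers
    peak_activations = n_checkpoints + largest_segment
    # Every dropped activation is recomputed exactly once.
    recomputed_layers = n_layers - n_checkpoints
    return peak_activations, recomputed_layers


def best_uniform_spacing(n_layers: int) -> Tuple[int, int]:
    """Brute-force the spacing k that minimizes peak memory in the toy model."""
    best_k = min(range(1, n_layers + 1),
                 key=lambda k: checkpoint_every_k(n_layers, k)[0])
    return best_k, checkpoint_every_k(n_layers, best_k)[0]


if __name__ == "__main__":
    n = 100
    print("store everything:", checkpoint_every_k(n, 1))
    print("checkpoint every 10:", checkpoint_every_k(n, 10))
    print("best uniform spacing (k, peak):", best_uniform_spacing(n))
```

In this toy model the best spacing is close to the square root of the chain length, which cuts peak memory sharply at the cost of roughly one extra forward pass. Real networks have non-uniform layer costs and non-linear graphs, which is why the paper poses rematerialization as a general optimization problem instead.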

Citation

If you use Checkmate in your work, please cite us with:

@incollection{mlsys2020_196,
 author = {Jain, Paras and Jain, Ajay and Nrusimha, Aniruddha and Gholami, Amir and Abbeel, Pieter and Gonzalez, Joseph and Keutzer, Kurt and Stoica, Ion},
 booktitle = {Proceedings of Machine Learning and Systems 2020},
 pages = {497--511},
 title = {Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization},
 year = {2020}
}

