| Description |
| --- |
| Reduce memory cost to store intermediate results and gradients. |
URL: https://arxiv.org/abs/1604.06174
Authors: Tianqi Chen (UW), Bing Xu (Dato. Inc), Chiyuan Zhang (MIT), Carlos Guestrin (UW).
- Original MXNet Implementation: https://github.com/dmlc/mxnet-memonger
- OpenAI's TensorFlow Implementation: https://github.com/cybertronai/gradient-checkpointing
- PyTorch Implementation: https://github.com/Lyken17/pytorch-memonger
How to reduce the memory consumption of DNN training (to enable bigger models or larger batch size)?
- Focuses mainly on reducing the memory cost of storing intermediate results (feature maps) and gradients.
- Designs an algorithm that trades computation for memory: training an n-layer network needs O(√n) memory at the cost of one extra forward computation per mini-batch.
- In-place operation: store the output values directly in the memory of an input value.
- Memory sharing: memory used by intermediate results that are no longer needed can be recycled and used in another node.
- Re-computation: drop the results of low-cost operations and recompute them when they are needed during back-propagation.
- Enable an option to drop the results of low-cost operations.
- Provide planning algorithms that produce an efficient memory plan.
- Let the user set the mirror attribute (how many times a result can be recomputed) in the computation graph for memory optimization.
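
The re-computation idea can be illustrated with a toy sketch. This is not the MXNet implementation or the paper's planning algorithm. It assumes a plain chain of scalar layers, and the names `make_layers`, `grad_store_all`, and `grad_checkpointed` are hypothetical. The sketch keeps only one input per segment of roughly √n layers during the forward pass, then recomputes each segment's inputs when back-propagation reaches it.

```python
import math


def make_layers(n):
    """Hypothetical toy network: a chain of n scalar layers.

    Each layer is a pair (forward function, derivative w.r.t. its input).
    """
    layers = []
    for i in range(n):
        if i % 2 == 0:
            layers.append((math.tanh, lambda x: 1.0 - math.tanh(x) ** 2))
        else:
            layers.append((lambda x: 1.5 * x + 0.1, lambda x: 1.5))
    return layers


def grad_store_all(layers, x):
    """Baseline: store every layer input, so O(n) values are kept."""
    inputs, fwd_calls = [], 0
    for f, _ in layers:
        inputs.append(x)
        x = f(x)
        fwd_calls += 1
    grad = 1.0
    for (_, df), a in zip(reversed(layers), reversed(inputs)):
        grad *= df(a)  # chain rule, last layer first
    return x, grad, len(inputs), fwd_calls


def grad_checkpointed(layers, x):
    """Keep only segment-boundary inputs; recompute the rest in backward."""
    n = len(layers)
    k = max(1, math.isqrt(n))  # segment length of about sqrt(n)
    checkpoints, fwd_calls = {}, 0
    for i, (f, _) in enumerate(layers):
        if i % k == 0:
            checkpoints[i] = x  # the only values kept from the forward pass
        x = f(x)
        fwd_calls += 1
    out, grad, peak = x, 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        segment = layers[start:start + k]
        # Recompute this segment's inputs from its checkpoint.
        a, seg_inputs = checkpoints[start], []
        for f, _ in segment:
            seg_inputs.append(a)
            a = f(a)
            fwd_calls += 1
        # Upper bound on values held at once: checkpoints plus one segment.
        peak = max(peak, len(checkpoints) + len(seg_inputs))
        for (_, df), a_in in zip(reversed(segment), reversed(seg_inputs)):
            grad *= df(a_in)
    return out, grad, peak, fwd_calls


if __name__ == "__main__":
    layers = make_layers(100)
    print(grad_store_all(layers, 0.3)[2:])     # stored values, forward calls
    print(grad_checkpointed(layers, 0.3)[2:])  # stored values, forward calls
```

With 100 layers the baseline holds 100 values and makes 100 forward calls, while the checkpointed version holds at most 20 values and makes 200 forward calls. That matches the trade described above: about 2√n stored values in exchange for one extra forward pass. Both versions return the same gradient.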