Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference [PDF]
Jianghao Shen, Yonggan Fu, Yue Wang, Pengfei Xu, Zhangyang Wang, Yingyan Lin
In AAAI 2020.
We present DFS (Dynamic Fractional Skipping), a dynamic inference framework that extends the binary skip/keep decision of layer skipping with a "fractional skipping" option: executing a layer with its weights and activations quantized to a lower bitwidth.
Highlights:
- Novel integration of two CNN inference mindsets: dynamic layer skipping and static quantization
- First introduction of input-adaptive quantization at inference time
- Better accuracy vs. computational cost tradeoff than SkipNet and other relevant competitors
Figure 6: Accuracy vs. computation percentage of DFS-ResNet74 and SkipNet74 on CIFAR-10.
Figure 1. An illustration of the DFS framework, where C1, C2, C3 denote three consecutive convolution layers, each consisting of a column of filters represented as cuboids. For each layer, the decision is computed by the corresponding gating network, denoted "Gx". In this example, the first convolution layer is executed fractionally with a low bitwidth, the second layer is fully executed, and the third one is skipped.
Figure 2. An illustration of the RNN gate used in DFS. The output is a skipping probability vector, where the green arrows denote the layer skip options (skip/keep) and the blue arrows represent the quantization options. During inference, the skip/keep/quantization option corresponding to the largest vector element is selected and executed.
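The per-layer decision can be summarized with the minimal sketch below. The class name `GateRNN`, the candidate options, and the feature dimensions are illustrative assumptions, not the repository's actual API; the point is only that the gate outputs a probability vector and the argmax option is executed.

```python
import torch
import torch.nn as nn

# Hypothetical execution options per layer: skip, fractional (quantized), or full.
OPTIONS = ["skip", "4-bit", "8-bit", "full"]  # assumed bitwidths, for illustration only

class GateRNN(nn.Module):
    """Minimal sketch of an RNN gate mapping pooled layer features to option probabilities."""
    def __init__(self, feat_dim=64, hidden_dim=10, num_options=len(OPTIONS)):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, num_options)

    def forward(self, layer_feat, hidden=None):
        # layer_feat: (batch, 1, feat_dim) pooled features of the current layer's input
        out, hidden = self.rnn(layer_feat, hidden)
        probs = torch.softmax(self.proj(out.squeeze(1)), dim=-1)
        return probs, hidden

# At inference, the option with the largest probability is selected and executed.
gate = GateRNN()
feat = torch.randn(1, 1, 64)                  # illustrative pooled input features
probs, _ = gate(feat)
print("selected option:", OPTIONS[probs.argmax(dim=-1).item()])
```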
Prerequisites:
- Ubuntu
- Python 3
- NVIDIA GPU + CUDA cuDNN
- Clone this repo:
git clone https://github.com/Torment123/DFS.git
cd DFS
- Install dependencies
pip install -r requirements.txt
- Workflow: pretrain the ResNet backbone → train gate → train DFS
0. Data Preparation
data.py includes the data preparation for the CIFAR-10 and CIFAR-100 datasets.
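For reference, a minimal CIFAR-10 loader along the lines of what data.py provides might look like the sketch below; the normalization constants and batch size are standard choices and not necessarily the ones used in the repo.

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Standard CIFAR-10 augmentation and normalization (illustrative values).
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform_train)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
```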
1. Pretrain the ResNet backbone. We first train a base ResNet model in preparation for the subsequent DFS training stages.
CUDA_VISIBLE_DEVICES=0 python3 train_base.py train cifar10_resnet_38 --dataset cifar10 --save-folder save_checkpoints/backbone
2. Train gate. We then add the RNN gate to the pretrained ResNet, fix the ResNet parameters, and train only the RNN gate until it reaches a zero skip ratio (full execution). Set minimum = 100, lr = 0.01, iters = 2000.
CUDA_VISIBLE_DEVICES=0 python3 train_sp_integrate_dynamic_quantization_initial.py train cifar10_rnn_gate_38 --minimum 100 --lr 0.01 --resume save_checkpoints/backbone/model_best.pth.tar --iters 2000 --save-folder save_checkpoints/full_execution
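"Fix the parameters of ResNet" corresponds to the usual PyTorch freezing pattern sketched below; the backbone and gate module names are illustrative stand-ins, not the repository's actual model definition.

```python
import torch
import torchvision

# Illustrative stand-in: a ResNet backbone with an attached gate module.
model = torchvision.models.resnet18(num_classes=10)
model.gate = torch.nn.LSTM(64, 10)        # placeholder for the RNN gate (assumed name)

# Freeze everything except the gate, so only the gate is updated in this stage.
gate_params = []
for name, param in model.named_parameters():
    if name.startswith("gate"):
        gate_params.append(param)
    else:
        param.requires_grad = False       # backbone stays fixed during gate training

optimizer = torch.optim.SGD(gate_params, lr=0.01, momentum=0.9, weight_decay=1e-4)
```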
3. Train DFS. After the gate is trained to reach full execution, we unfreeze the backbone's parameters and jointly train it with the gate for the specified computation budget. Set minimum to the specified computation percentage and lr = 0.01.
CUDA_VISIBLE_DEVICES=0 python3 train_sp_integrate_dynamic_quantization.py train cifar10_rnn_gate_38 --minimum <specified_computation_percentage> --lr 0.01 --resume save_checkpoints/full_execution/checkpoint_latest.pth.tar --save-folder save_checkpoints/DFS
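Conceptually, the --minimum argument sets a target computation percentage. One common way to steer the gates toward such a target (a hedged sketch, not necessarily the repository's exact regularizer; the function name and the per-option costs are assumptions) is to penalize the gap between the gates' expected computation and the target:

```python
import torch

def computation_penalty(gate_probs, option_costs, target_pct):
    """gate_probs: (num_layers, num_options) softmax outputs of the gates.
    option_costs: (num_options,) relative cost of each option, e.g. skip/4-bit/8-bit/full.
    target_pct: desired overall computation percentage in [0, 1]."""
    expected_cost = (gate_probs * option_costs).sum(dim=1).mean()
    return (expected_cost - target_pct) ** 2

# Illustrative usage with random gate outputs for a 36-layer network.
probs = torch.softmax(torch.randn(36, 4), dim=1)
costs = torch.tensor([0.0, 0.25, 0.5, 1.0])   # assumed relative costs per option
penalty = computation_penalty(probs, costs, target_pct=0.5)
```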
Acknowledgements:
- The sequential formulation of the dynamic inference problem from SkipNet
- The quantization function from Scalable Methods
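For readers unfamiliar with that quantization function, a generic k-bit uniform quantizer with a straight-through gradient estimator is sketched below; this is an illustrative stand-in, and the exact function borrowed in the repo may differ.

```python
import torch

def quantize(x, num_bits):
    """Uniformly quantize a tensor in [0, 1] to 2**num_bits - 1 levels.
    Uses a straight-through estimator so gradients flow through the rounding."""
    levels = 2 ** num_bits - 1
    x = torch.clamp(x, 0.0, 1.0)
    q = torch.round(x * levels) / levels
    # Forward pass uses the quantized value; backward pass uses the gradient of x.
    return x + (q - x).detach()

x = torch.rand(4)
print(quantize(x, num_bits=4))
```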