This is a collection of papers on reducing model size and on ASIC/FPGA accelerators for machine learning, especially deep-neural-network-related applications. (Inspired by Embedded-Neural-Network.)
You can use the following materials as your entry point:
- Efficient Processing of Deep Neural Networks: A Tutorial and Survey
- The related-work section of the Quantized Neural Networks paper
- Structural pruning (compression): compress CNNs by removing "less important" filters.
The paper "Deep neural networks are robust to weight binarization and other non-linear distortions" showed that DNNs can be robust to more than just weight binarization.
- Fixed point
- Dynamic fixed point (the radix-point position / scaling factor is chosen per layer or per tensor rather than fixed globally; see the sketch below)
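A minimal NumPy sketch of the idea behind the two items above (function names and the 8-bit format are my own illustration, not taken from a specific paper): values are rounded onto an integer grid with a shared scaling factor, and in the dynamic variant the radix-point position is re-derived per tensor from its largest magnitude.

```python
import numpy as np

def quantize_fixed_point(x, num_bits=8, frac_bits=4):
    """Plain fixed point: a globally fixed radix-point position (frac_bits)."""
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)   # integer codes
    return q / scale                               # dequantized values

def quantize_dynamic_fixed_point(x, num_bits=8):
    """Dynamic fixed point: choose the radix-point position per tensor
    so the largest magnitude still fits into num_bits signed bits."""
    max_abs = np.max(np.abs(x)) + 1e-12
    int_bits = int(np.floor(np.log2(max_abs))) + 2  # sign bit + integer part
    return quantize_fixed_point(x, num_bits, num_bits - int_bits)

w = np.random.randn(4, 4).astype(np.float32)
print(np.max(np.abs(w - quantize_dynamic_fixed_point(w))))  # worst-case quantization error
```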
- Binary Quantization
- Theory proof (EBP)
- [1405]. Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights
- [1503]. Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation
- [1505]. Backpropagation for Energy-Efficient Neuromorphic Computing
- More practice with 1 bit
- [1511]. BinaryConnect: Training Deep Neural Networks with binary weights during propagations
- [1510]. Neural Networks with Few Multiplications
- [1601]. Bitwise Neural Networks
- [1602]. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- [1603]. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- XNOR-Net with slightly larger bit widths (1~2 bits)
- [1606]. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- [1608]. Recurrent Neural Networks With Limited Numerical Precision
- [1609]. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. (Text overlap with Binarized Neural Network.)
- [1702]. Deep Learning with Low Precision by Half-wave Gaussian Quantization
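A minimal PyTorch sketch of the core trick shared by BinaryConnect and Binarized Neural Networks listed above: weights are binarized to ±1 for the forward/backward computation, while gradients are accumulated in the stored full-precision weights through a straight-through estimator. Class and layer names are my own; this is an illustration, not the authors' reference code.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass, clipped straight-through gradient in the backward pass."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # real implementations map sign(0) to +1

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        return grad_output * (w.abs() <= 1).float()  # pass gradient through, cancel where |w| > 1

class BinaryLinear(torch.nn.Linear):
    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)               # binary weights used for compute
        return torch.nn.functional.linear(x, w_bin, self.bias)

layer = BinaryLinear(16, 4)
layer(torch.randn(8, 16)).sum().backward()
print(layer.weight.grad.shape)   # gradients land on the stored full-precision weights
```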
- Ternary Quantization
- Other quantization methods
- 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs
- Towards the Limit of Network Quantization.
- Loss-aware Binarization of Deep Networks.
- Log Domain Quantization
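A minimal NumPy sketch of log-domain (power-of-two) quantization, the idea behind the entry above: each weight is rounded to the nearest power of two so multiplications can be replaced by bit shifts. The function name and exponent range are my own illustration.

```python
import numpy as np

def quantize_log2(w, min_exp=-8, max_exp=0):
    """Round each weight to sign(w) * 2^k with an integer exponent k in [min_exp, max_exp]."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    q = sign * 2.0 ** exp
    q[np.abs(w) < 2.0 ** (min_exp - 1)] = 0.0   # very small weights snap to zero
    return q

w = np.random.randn(5) * 0.5
print(w)
print(quantize_log2(w))
```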
- Parameter Sharing
- Structured Matrices
- Structured Convolution Matrices for Energy-efficient Deep learning.
- Structured Transforms for Small-Footprint Deep Learning.
- An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank.
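A small NumPy sketch of why circulant structure (as in the circulant-projection and low-displacement-rank papers above) helps: an n x n circulant matrix is fully described by one length-n vector, and its matrix-vector product is a circular convolution computable with FFTs in O(n log n). Variable names are my own.

```python
import numpy as np

n = 8
c = np.random.randn(n)          # the only stored parameters: first column of the circulant matrix
x = np.random.randn(n)

# Explicit n x n circulant matrix (for reference only; never materialized in practice).
C = np.stack([np.roll(c, j) for j in range(n)], axis=1)   # columns are cyclic shifts of c

y_dense = C @ x
y_fft = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))   # circular convolution via FFT

print(np.allclose(y_dense, y_fft))   # True: same result with n instead of n^2 parameters
```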
- Hashing
- [1504]. Compressing neural networks with the hashing trick
- Functional Hashing for Compressing Neural Networks
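A minimal NumPy sketch of the hashing trick used by HashedNets ("Compressing neural networks with the hashing trick" above): a large virtual weight matrix is backed by a small vector of real parameters, with each (row, column) position mapped to a shared bucket by a fixed hash function. The simple modular hash here is a stand-in for the hash function used in the paper.

```python
import numpy as np

out_dim, in_dim, num_buckets = 64, 128, 512            # 8192 virtual weights backed by 512 real ones
real_weights = np.random.randn(num_buckets) * 0.1

# Toy stand-in for the paper's hash function: map each (row, col) to a shared bucket index.
idx = np.fromfunction(lambda i, j: (i * 1_000_003 + j * 7919) % num_buckets,
                      (out_dim, in_dim), dtype=np.int64)

W_virtual = real_weights[idx]            # virtual weight matrix, materialized here only for clarity
x = np.random.randn(in_dim)
y = W_virtual @ x

print(W_virtual.shape, real_weights.size)   # (64, 128) virtual weights vs. 512 stored parameters
```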
- [1510]. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding
- Learning compact recurrent neural networks.
Network Pruning: a large fraction of the weights in a network is redundant and can be removed (i.e., set to zero); see the magnitude-pruning sketch after the list below.
- Remove weights with low saliency
- Energy-based pruning
- Processing sparse weights
- [1402]. A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs
- [1510]. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding
- [1602]. EIE: Efficient Inference Engine on Compressed Deep Neural Network
- [1705]. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
- [1710]. Efficient Methods and Hardware for Deep Learning, Ph.D. Thesis
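A minimal PyTorch sketch of magnitude-based pruning as popularized by Deep Compression (listed above): weights whose absolute value falls below a threshold chosen for a target sparsity are set to zero, and the resulting mask is kept so the zeros stay zero during fine-tuning. Names and the 90% sparsity target are my own illustration.

```python
import torch

def magnitude_prune(weight, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that `sparsity` of them become 0."""
    k = int(sparsity * weight.numel())
    threshold = weight.abs().flatten().kthvalue(k).values   # k-th smallest magnitude
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask

layer = torch.nn.Linear(256, 128)
with torch.no_grad():
    pruned, mask = magnitude_prune(layer.weight, sparsity=0.9)
    layer.weight.copy_(pruned)

print(f"sparsity: {(layer.weight == 0).float().mean().item():.2%}")
# During fine-tuning one would re-apply `mask` after each update so pruned weights stay zero.
```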
- Structured pruning (remove whole filters/channels rather than individual weights; see the sketch after this sub-list)
- [1711]. Interpreting Convolutional Neural Networks Through Compression
- [1705]. Structural compression of convolutional neural networks based on greedy filter pruning
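A minimal PyTorch sketch of the filter-pruning idea referenced above (and in the "Structural pruning" entry point at the top): rank a convolution layer's filters by a simple importance score such as their L1 norm and drop the weakest ones, yielding a physically smaller layer. The L1 criterion and the 50% keep ratio are common choices, used here only for illustration.

```python
import torch

def prune_filters_l1(conv: torch.nn.Conv2d, keep_ratio: float = 0.5) -> torch.nn.Conv2d:
    """Return a new Conv2d keeping only the filters with the largest L1 norms."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))      # one score per output filter
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.topk(scores, n_keep).indices.sort().values     # indices of filters to keep

    new_conv = torch.nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                               stride=conv.stride, padding=conv.padding,
                               bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
    return new_conv

# Note: the next layer's input channels must be sliced to match the kept filters.
conv = torch.nn.Conv2d(64, 128, 3, padding=1)
smaller = prune_filters_l1(conv, keep_ratio=0.5)
print(conv.weight.shape, "->", smaller.weight.shape)   # [128, 64, 3, 3] -> [64, 64, 3, 3]
```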
- Before Training
- Bottleneck design: use 1x1 convolutional layers to reduce the number of channels (see the parameter-count comparison after this list)
- After Training
- Canonical Polyadic (CP) decomposition
- Tucker decomposition
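A short PyTorch sketch contrasting the two ideas above: a 1x1 bottleneck reduces parameters before training by shrinking the channel count that the expensive 3x3 convolution sees, while CP/Tucker-style decomposition factorizes an already-trained layer into smaller ones (shown here only as a parameter count, not an actual decomposition algorithm). Layer sizes are arbitrary examples.

```python
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

# Plain 3x3 convolution operating directly on 256 channels.
plain = nn.Conv2d(256, 256, kernel_size=3, padding=1)

# Bottleneck: 1x1 reduce -> 3x3 on few channels -> 1x1 expand (ResNet/SqueezeNet style).
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 256, kernel_size=1),
)

print(n_params(plain), "vs", n_params(bottleneck))   # ~590k vs ~70k parameters
```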
- [0600]. Model compression
- [1312]. Do deep nets really need to be deep?
- [1412]. Fitnets: Hints for thin deep nets
- [1503]. Distilling the knowledge in a neural network
- Sequence-Level Knowledge Distillation.
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer.
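A minimal PyTorch sketch of the teacher-student objective from "Distilling the knowledge in a neural network" above: the student matches the teacher's temperature-softened output distribution in addition to the usual cross-entropy on the labels. The temperature value and the 0.5/0.5 weighting are illustrative choices, not taken from any specific paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of soft-target KL (scaled by T^2, as in Hinton et al.) and hard-label CE."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)   # stand-in for a small student network
teacher_logits = torch.randn(8, 10)                       # stand-in for a large frozen teacher
labels = torch.randint(0, 10, (8,))

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```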