
MosaicML Implementation

This paper was presented at MLDS 2022: MLDC_Jan_2022.pdf

Presentation slides: MLDS 2022_Presentation_Final.pptx

Authors: Sabeesh Ethiraj, Bharath Kumar Bolla

SOTA Accuracies on MNIST and CIFAR-10

This paper covers advanced techniques for making deep neural networks more efficient and robust via architectural efficiency, optimization, label manipulation, and learning rate scheduling.

Architectural Efficiency

  • Depthwise separable convolutions - Parameter reduction
  • Global average pooling - Parameter reduction
  • BlurPool - Anti-aliasing
  • Squeeze and Excite - Channel attention
(Figures: depthwise separable convolutions · squeeze-and-excitation blocks · BlurPool)
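A minimal PyTorch sketch of the four components listed above. This is illustrative only: the repository's own framework, layer configurations, and channel counts may differ, and none of the class names below come from this codebase.

```python
# Illustrative PyTorch modules for the four architectural-efficiency ideas.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv + 1x1 pointwise conv: far fewer parameters
    than a standard 3x3 convolution with the same channel counts."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: squeeze via global average pooling,
    excite via two FC layers, then reweight the channels."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(ch, ch // reduction)
        self.fc2 = nn.Linear(ch // reduction, ch)

    def forward(self, x):
        s = x.mean(dim=(2, 3))                        # squeeze
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))
        return x * s[:, :, None, None]                # excite

class BlurPool(nn.Module):
    """Anti-aliased downsampling: blur with a fixed binomial filter
    before the stride-2 subsample, instead of a bare strided op."""
    def __init__(self, ch):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        k = (k[:, None] * k[None, :]) / 16.0          # 3x3 binomial filter
        self.register_buffer("kernel", k[None, None].repeat(ch, 1, 1, 1))
        self.ch = ch

    def forward(self, x):
        return F.conv2d(x, self.kernel, stride=2, padding=1, groups=self.ch)

class TinyNet(nn.Module):
    """DW-separable backbone + SE attention + BlurPool downsampling,
    ending in global average pooling instead of a large FC head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            DepthwiseSeparableConv(3, 32), SEBlock(32), BlurPool(32),
            DepthwiseSeparableConv(32, 64), SEBlock(64), BlurPool(64),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.mean(dim=(2, 3))                        # global average pooling
        return self.head(x)
```

Ending in GAP rather than flattening into a dense layer is what keeps the parameter counts in the 1.5K-140K range reported below.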

Weight Space Alterations

  • Stochastic Weight Averaging
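A minimal sketch of SWA using PyTorch's torch.optim.swa_utils. The tiny model, synthetic data, and schedule below are illustrative stand-ins, not the paper's settings.

```python
# Stochastic Weight Averaging: average weights over the tail of training.
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(torch.randn(256, 20),
                                  torch.randint(0, 10, (256,))), batch_size=32)

swa_model = AveragedModel(model)          # maintains the running weight average
swa_scheduler = SWALR(optimizer, swa_lr=0.01)
swa_start = 75                            # start averaging late in training

for epoch in range(100):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)  # fold current weights into the average
        swa_scheduler.step()                # hold the SWA learning rate

update_bn(loader, swa_model)  # recompute BatchNorm statistics for the averaged weights
# Evaluate swa_model, not model, at test time.
```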

Optimization & Regularization

  • Sharpness-Aware Minimization (SAM)
  • Label Smoothing
  • One Cycle LR
(Figure: sharpness-aware minimization)
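The sketch below combines the three techniques in one illustrative PyTorch loop: label smoothing via CrossEntropyLoss(label_smoothing=...), One Cycle LR via OneCycleLR, and SAM's two-pass update from Foret et al. (2021): take the gradient at w, climb to the worst-case nearby point w + e(w), take the gradient there, and step from w. Model, data, rho, and learning rates are placeholders, not the paper's values.

```python
# SAM + label smoothing + One Cycle LR in a single training loop (illustrative).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)    # label smoothing
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loader = DataLoader(TensorDataset(torch.randn(256, 20),
                                  torch.randint(0, 10, (256,))), batch_size=32)
epochs, rho = 5, 0.05                                   # rho = SAM neighbourhood radius
scheduler = torch.optim.lr_scheduler.OneCycleLR(        # one cycle LR
    optimizer, max_lr=0.1, total_steps=epochs * len(loader))

for _ in range(epochs):
    for x, y in loader:
        # Pass 1: gradient at w, used to climb to the worst-case point w + e(w).
        criterion(model(x), y).backward()
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm() for p in model.parameters()]))
        with torch.no_grad():
            eps = [rho * p.grad / (grad_norm + 1e-12) for p in model.parameters()]
            for p, e in zip(model.parameters(), eps):
                p.add_(e)
        optimizer.zero_grad()

        # Pass 2: gradient at w + e(w), applied after restoring the weights to w.
        criterion(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                p.sub_(e)
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()
```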

Augmentations

  • Mixup
  • Cutout
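Minimal sketches of the two augmentations, assuming batched image tensors of shape (N, C, H, W); alpha and the patch size are illustrative defaults rather than the paper's tuned values.

```python
# Mixup and Cutout as standalone batch transforms (illustrative defaults).
import numpy as np
import torch

def mixup(x, y, alpha=0.2):
    """Blend each image with a randomly paired image from the same batch."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    # Train with: lam * CE(pred, y) + (1 - lam) * CE(pred, y[perm])
    return x_mix, y, y[perm], lam

def cutout(x, size=8):
    """Zero out a random square patch in each image (occlusion augmentation)."""
    n, _, h, w = x.shape
    for i in range(n):
        cy, cx = np.random.randint(h), np.random.randint(w)
        y1, y2 = max(cy - size // 2, 0), min(cy + size // 2, h)
        x1, x2 = max(cx - size // 2, 0), min(cx + size // 2, w)
        x[i, :, y1:y2, x1:x2] = 0.0
    return x
```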

Baseline models were built by progressively reducing the number of parameters using depthwise separable convolutions and global average pooling (GAP), on both MNIST and CIFAR-10.

MNIST Architectures

(Figure: MNIST model architectures)

CIFAR-10 Architectures

(Figure: CIFAR-10 model architectures)

Effect of DW convolutions / GAP on accuracy and inference time

  • SOTA accuracy of 98.35% with 1.5K params on the MNIST dataset
  • Accuracy of 79.9% with 140K params on the CIFAR-10 dataset
  • DW convolutions have no direct latency benefit: DW models with the same number of params run SLOWER than models with 3x3 convs
  • Inference time is proportional to the number of parameters: DW models with fewer params show lower inference time than models with more params
(Figures: effect of DW on accuracy · effect of DW on inference time)
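A small sketch of how such a parameter-vs-latency comparison can be reproduced; the two single-layer models and the batch size below are illustrative, not the paper's architectures.

```python
# Compare parameter count and CPU inference latency: 3x3 conv vs DW separable.
import time
import torch
import torch.nn as nn

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

@torch.no_grad()
def mean_latency(m: nn.Module, x: torch.Tensor, runs: int = 100) -> float:
    m.eval()
    m(x)                                   # warm-up pass
    t0 = time.perf_counter()
    for _ in range(runs):
        m(x)
    return (time.perf_counter() - t0) / runs

standard = nn.Conv2d(16, 64, 3, padding=1)              # plain 3x3 conv
dw_sep = nn.Sequential(                                 # depthwise separable
    nn.Conv2d(16, 16, 3, padding=1, groups=16),
    nn.Conv2d(16, 64, 1))

x = torch.randn(64, 16, 32, 32)
for name, m in [("3x3 conv", standard), ("DW separable", dw_sep)]:
    print(f"{name}: {count_params(m)} params, {mean_latency(m, x) * 1e3:.2f} ms/batch")
```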

Efficiency of MosaicML techniques

  • On MNIST, BlurPool is the most efficient technique, with a SOTA 99.21% at 1.5K params; combining techniques produced no significant further increase in accuracy
  • On CIFAR-10, the combination BP + CO + M + LS + SWA + SAM (BlurPool, Cutout, Mixup, Label Smoothing, SWA, SAM) reached a SOTA accuracy of 86.76% with 140K params (a 6.86% increase); in isolation, Mixup performed best, with a 3.62% increase to 83.52%
(Figures: isolated techniques on accuracy · combined techniques on accuracy)

Take Home!

These techniques may be applied to other standard and custom datasets to establish their superiority as model enhancement methods.
