This paper covers advanced techniques for making deep neural networks more efficient and robust through architectural efficiency, optimization, label manipulation, and learning-rate techniques.
Baseline models were built by progressively reducing the parameter count with depthwise (DW) convolutions and global average pooling (GAP) on both MNIST and CIFAR-10 (a minimal block sketch follows the list below).

- MNIST architectures
- CIFAR-10 architectures
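To illustrate the parameter-reduction idea, here is a minimal sketch of a depthwise-separable convolution block followed by a GAP classification head. The layer widths, kernel sizes, and class count are placeholder choices for illustration, not the exact architectures reported above.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv (illustrative block)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)   # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class TinyDWNet(nn.Module):
    """Hypothetical low-parameter classifier: DW blocks + GAP instead of large FC layers."""
    def __init__(self, in_ch=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(8), nn.ReLU(inplace=True),
            DepthwiseSeparableConv(8, 16),
            nn.MaxPool2d(2),
            DepthwiseSeparableConv(16, 16),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # GAP: replaces the parameter-heavy dense layer
            nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

if __name__ == "__main__":
    model = TinyDWNet(in_ch=1, num_classes=10)   # MNIST-style 28x28 grayscale input
    n_params = sum(p.numel() for p in model.parameters())
    print(model(torch.randn(2, 1, 28, 28)).shape, n_params)
```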
## Efficiency of DW convolutions and GAP: accuracy and inference time
- MNIST: SOTA accuracy of 98.35% with 1.5K params.
- CIFAR-10: 79.9% accuracy with 140K params.
- DW convolutions have no direct effect on latency: DW models with the same number of parameters run SLOWER than models built from standard 3x3 convolutions.
- Inference time scales with parameter count: DW models with fewer parameters show lower inference times than higher-parameter models (a rough timing sketch follows the figures below).
*Figures: DW convolutions vs. accuracy; DW convolutions vs. inference time.*
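A rough way to reproduce this kind of latency comparison is to time forward passes of a DW-based block against a standard 3x3 convolution block. The sketch below is a simplified benchmark: the channel width, batch size, and iteration counts are arbitrary choices, not the paper's exact measurement setup.

```python
import time
import torch
import torch.nn as nn

def dw_block(ch):
    """Depthwise 3x3 + pointwise 1x1 (depthwise-separable)."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False),
        nn.Conv2d(ch, ch, 1, bias=False),
    )

def std_block(ch):
    """Standard 3x3 convolution."""
    return nn.Conv2d(ch, ch, 3, padding=1, bias=False)

@torch.no_grad()
def time_model(model, x, warmup=10, iters=100):
    """Average forward-pass latency; warm-up runs are excluded from timing."""
    model.eval()
    for _ in range(warmup):
        model(x)
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    # To mirror the same-parameter comparison in the text, pick channel widths
    # that roughly equalize the two blocks' parameter counts.
    x = torch.randn(32, 64, 32, 32)   # arbitrary CIFAR-like batch
    for name, block in [("depthwise-separable", dw_block(64)),
                        ("standard 3x3", std_block(64))]:
        params = sum(p.numel() for p in block.parameters())
        print(f"{name:22s} params={params:6d} "
              f"latency={time_model(block, x) * 1e3:.2f} ms/batch")
```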
## Efficiency of MosaicML techniques
- MNIST: BlurPool is the most effective single technique, reaching a SOTA 99.21% with 1.5K params; combining techniques gave no significant further increase in accuracy.
- CIFAR-10: the combination of BP + CO + M + LS + SWA + SAM reached a SOTA accuracy of 86.76% with 140K params (a 6.865% increase over the baseline); in isolation, Mixup performed best with a 3.62% increase, to 83.52%. A minimal training sketch with these techniques follows the figures below.
*Figures: isolated techniques vs. accuracy; combined techniques vs. accuracy.*
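For reference, a training run combining these techniques could look roughly like the sketch below, using MosaicML's Composer library. This is a minimal sketch under several assumptions: the ResNet-18 stand-in model, the SGD settings, the 20-epoch duration, and the algorithms' default hyperparameters are placeholders rather than the configuration behind the reported numbers, and CO is interpreted here as Composer's ColOut algorithm.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

from composer import Trainer
from composer.models import ComposerClassifier
from composer.algorithms import BlurPool, ColOut, MixUp, LabelSmoothing, SWA, SAM

# Plain CIFAR-10 loaders (only ToTensor, for brevity).
train_ds = datasets.CIFAR10("./data", train=True, download=True,
                            transform=transforms.ToTensor())
eval_ds = datasets.CIFAR10("./data", train=False, download=True,
                           transform=transforms.ToTensor())
train_dl = DataLoader(train_ds, batch_size=128, shuffle=True)
eval_dl = DataLoader(eval_ds, batch_size=128)

# Any torch.nn.Module classifier works here; resnet18 is a stand-in,
# not the 140K-parameter model described above.
module = models.resnet18(num_classes=10)
model = ComposerClassifier(module=module, num_classes=10)

trainer = Trainer(
    model=model,
    train_dataloader=train_dl,
    eval_dataloader=eval_dl,
    optimizers=torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9),
    max_duration="20ep",                 # placeholder schedule
    algorithms=[                         # BP + CO + M + LS + SWA + SAM, library defaults
        BlurPool(),
        ColOut(),
        MixUp(),
        LabelSmoothing(),
        SWA(),
        SAM(),
    ],
)
trainer.fit()
```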
## Take Home!
These techniques may be applied to other standard and custom datasets to establish the superiority of these model-enhancement techniques.