Computer Vision: State of the Art model implementation using PyTorch framework.

blackpython890/TSAI

Applied Deep Learning: Convolutional Neural Networks

What this REPO is about

With advances in GPUs and deep learning, tasks such as image classification and object detection can now be achieved with relatively high accuracy and in real time. This has driven the growth of Computer Vision as a branch of Artificial Intelligence. One of the driving factors behind that growth is that the amount of data (images, videos) we generate today is sufficient to train Convolutional Neural Networks (CNNs) and make Computer Vision better.

This repository contains my personal exploration and research on Convolutional Neural Networks: a disciplined way to learn and implement the fundamentals of state-of-the-art models using the PyTorch library.


Let's Get Started

Prerequisite

  • Python
  • Knowledge of images (height, width, pixels, channels)
  • PyTorch
  • Basics of OpenCV, Python Imaging Library (PIL), and matplotlib

Hardware Requirement

  • GPU: Tesla T4/Tesla K8 or higher
  • GPU count: 1 or 2
  • RAM: 12 GB or higher

Models Used

  • Custom Deep Neural Network
  • ResNet
  • DenseNet
  • GoogLeNet

Case Study

  • Image Classification.
  • Object Detection using YOLO.
  • Monocular Depth Estimation.
  • Object Segmentation.
  • Human Pose Estimation.
  • GANs

Dataset Used

  • MNIST
  • CIFAR10
  • Custom Dataset for Object Detection.
  • Tiny ImageNet
  • COCO
  • ImageNet

1. ML Intuition and Basics of CNN

Basics of Python can be learnt on YouTube. Channels like Corey Schafer and Telusko helped me a lot to learn the basics of Python.

Basics of CNNs: how a CNN learns, how different channels are formed, and how a DNN makes sense of the inputs it gets (Features -> Edges & Gradients -> Textures -> Patterns -> Parts of Objects -> Objects -> Scenes). Please see below. Resemblance of the human brain and eyes to the computer vision field.

2. CNN Architecture

Basic CNN architecture: maintaining symmetry by choosing odd-sized kernels (for example 3x3 or 5x5), why a 3x3 kernel is preferred over 5x5 or larger odd kernels, max-pooling, and the receptive field. The image below represents convolution from 5x5 to 3x3 to 1x1; the receptive field increases from left to right as convolutions are applied and layers are added.
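
As a rough illustration of the 3x3 versus 5x5 argument, the sketch below computes the receptive field of a stack of layers and the weight count of each option. The function names and the 32-channel figures are assumptions chosen for the example, not values taken from the work links.

```python
def receptive_field(layers):
    """Receptive field after a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1  # jump: spacing, in input pixels, between neighbouring outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square convolution layer, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


# Two stacked 3x3 convolutions cover the same 5x5 window as one 5x5 convolution.
two_3x3 = receptive_field([(3, 1), (3, 1)])
one_5x5 = receptive_field([(5, 1)])

# With 32 input and 32 output channels (illustrative values only),
# the 3x3 stack needs fewer weights than the single 5x5 layer.
weights_3x3_stack = 2 * conv_weights(3, 32, 32)
weights_5x5 = conv_weights(5, 32, 32)

# A 2x2 max-pool with stride 2 doubles the jump, so later layers widen the field faster.
with_pooling = receptive_field([(3, 1), (2, 2), (3, 1)])

print(two_3x3, one_5x5, weights_3x3_stack, weights_5x5, with_pooling)
```

The stack reaches the same receptive field with fewer weights and an extra non-linearity between the two layers, which is one common reason for preferring 3x3 kernels.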

3. Kernels and Convolution

Basic PyTorch architecture for working with neural networks: introduction to nn.Module, optimizers, the forward and backward pass, datasets, and how to apply simple augmentation.

4. Architecture Basics

CNN architecture components: fully connected layer, dropout, softmax, learning rate, and batch size.
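
Two of these components are small enough to write out by hand. The sketch below is a framework-free illustration, assuming plain Python lists in place of tensors; the function names, the example scores, and the 0.5 drop probability are made up for the example. In PyTorch the built-in layers would normally be used instead.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, drop_prob, rng):
    """Inverted dropout for training: zero each value with probability drop_prob."""
    keep_prob = 1.0 - drop_prob
    # Kept values are scaled up so the expected activation stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, 0.5, rng)

print(probs, predicted_class, dropped)
```

At evaluation time dropout is switched off, which is why the scaling is applied during training rather than at inference.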

Work link Summary :

  • Train on the MNIST dataset to reach 99.40% accuracy within the given constraints. See the work link for details.
  • Parameters :
  • Epoch : 20
  • Learning Rate
  • Batch Size
  • Highest Accuracy -
  • Work Link

Fully connected (FC) layer, dropout, and learning rate are shown below, respectively.

5. Model Implementation

Step-by-step approach to build a neural network, debug it, and optimize it to get the best accuracy. See the work link for details.

Work link Summary :

  • Train on the MNIST dataset to reach 99.40% accuracy within the given constraints. See the work link for details.
  • Parameters :
  • Epoch : 15
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link

6. Batch Normalization and Regularization

Importance of normalization, batch normalization, and regularization of datasets. The fine line between normalization and equalization.
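
To make input normalization concrete, here is a minimal sketch on a toy single-channel dataset. The pixel values and function names are hypothetical; for real images the same per-channel mean and standard deviation would typically be passed to torchvision's transforms.Normalize.

```python
import statistics


def channel_stats(pixels):
    """Mean and population standard deviation of a flat list of pixel values."""
    return statistics.fmean(pixels), statistics.pstdev(pixels)


def normalize(pixels, mean, std):
    """Shift and scale pixels so the channel has zero mean and unit variance."""
    return [(p - mean) / std for p in pixels]


# Hypothetical pixel intensities, already scaled to the range [0, 1].
pixels = [0.0, 0.1, 0.2, 0.9, 1.0, 0.8]

mean, std = channel_stats(pixels)
normalized = normalize(pixels, mean, std)
new_mean, new_std = channel_stats(normalized)

print(mean, std)          # statistics of the original data
print(new_mean, new_std)  # close to 0 and 1 after normalization
```

Batch normalization applies the same shift-and-scale idea inside the network, per mini-batch, and then adds a learnable scale and shift.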

Work link Summary :

  • Train on the MNIST dataset to reach 99.40% accuracy within the constraints, with regularization added. See the work link for details.
  • Parameters :
  • Epoch : 15
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link

Original data mean vs. normalized data mean is shown below, respectively.

7. Advanced Convolutions

Different types of convolution: normal convolution, dilated convolution, pointwise convolution (1x1), deconvolution (also called fractionally strided or transposed convolution), the pixel shuffle algorithm, depthwise separable convolution, and grouped convolution. Dilated, depthwise, and grouped convolutions are shown below, respectively.
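
The practical difference between these variants shows up in their weight counts and in the area a kernel covers. The sketch below compares them for one layer, assuming a 3x3 kernel with 64 input and 128 output channels; those sizes and the function names are illustrative choices, not settings from the work links. In PyTorch, the corresponding nn.Conv2d arguments are groups and dilation.

```python
def standard_conv_weights(k, c_in, c_out):
    """Weights in a normal k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def grouped_conv_weights(k, c_in, c_out, groups):
    """Each group only connects c_in / groups inputs to c_out / groups outputs."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


def depthwise_separable_weights(k, c_in, c_out):
    """One k x k filter per input channel, followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def dilated_kernel_extent(k, dilation):
    """Side length of the input window covered by a dilated k x k kernel."""
    return dilation * (k - 1) + 1


standard = standard_conv_weights(3, 64, 128)
grouped = grouped_conv_weights(3, 64, 128, groups=4)
separable = depthwise_separable_weights(3, 64, 128)
extent = dilated_kernel_extent(3, dilation=2)

print(standard, grouped, separable, extent)
```

A dilated 3x3 kernel with dilation 2 covers a 5x5 window while still using only nine weights per channel pair.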

Work link Summary :

  • Train on the CIFAR10 dataset to reach more than 80% accuracy within the constraints. See the work link for details.
  • Parameters :
  • Epoch : 15
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link

Dilated convolution vs. depthwise convolution vs. grouped convolution is shown below, respectively.

8. Receptive Fields and Different Network Architectures

Introduction to different neural network architectures such as AlexNet, VGG, ResNet, GoogLeNet, Inception, and ResNeXt, and their different versions. Importance of having multiple receptive fields.

Work link Summary :

  • Train on the CIFAR10 dataset to reach more than 85% accuracy using the ResNet-18 architecture. See the work link for details.
  • Model :
  • Epoch : 15
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link

Comparison of architectures such as AlexNet, VGGNet, and ResNet is shown below.

9. Data Augmentation/Model Diagnostics

One easy way to increase accuracy is to increase the receptive field (the core idea of the ResNet architecture). Another is regularization such as dropout, batch normalization, and L1/L2 regularization. All of these fall short if the dataset is limited, and to tackle this we can use a data augmentation strategy. Please see the images of some of these strategies.
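
The section does not say which augmentations are used, so the sketch below picks three common ones as an assumption: horizontal flip, random crop after padding, and cutout. It works on a tiny 4x4 "image" stored as a list of rows so that it runs without any image library; all function names are hypothetical.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def pad(image, padding, fill=0):
    """Surround the image with a border of fill values."""
    width = len(image[0]) + 2 * padding
    rows = [[fill] * padding + list(row) + [fill] * padding for row in image]
    border = [[fill] * width for _ in range(padding)]
    return border + rows + [list(r) for r in border]


def random_crop(image, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def cutout(image, size, rng, fill=0):
    """Erase a random size x size square so the model cannot rely on one region."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    out = [list(row) for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]  # values 1..16

flipped = horizontal_flip(image)
cropped = random_crop(pad(image, 1), 4, rng)
erased = cutout(image, 2, rng)

for row in erased:
    print(row)
```

Each function returns a new image and leaves the original untouched, so the transforms can be chained in any order.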

Work link Summary :

Just have a look at the different data augmentation strategies.

10. Advanced Training

LR Finder. This section needs to be updated. Work Link

11. Super-Convergence

Implementation of the Super-Convergence / One Cycle Policy phenomenon, where a neural network can be trained an order of magnitude faster than with standard training without hampering the accuracy of the model. This is an implementation of the research paper discussed here. The intuition is that large learning rates regularize the training, hence requiring a reduction of all other forms of regularization in order to preserve the optimal balance.
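
The shape of a one-cycle learning-rate schedule can be sketched in a few lines. This is a simplified piecewise-linear version, not the exact schedule from the paper or from the work link; the function name, the 30% warm-up fraction, and the divide-by-10 starting rate are assumptions for illustration. PyTorch ships a ready-made scheduler, torch.optim.lr_scheduler.OneCycleLR, which would normally be used in training code.

```python
def one_cycle_lr(step, total_steps, max_lr, div_factor=10.0, warmup_fraction=0.3):
    """Piecewise-linear one-cycle schedule: ramp up to max_lr, then back down."""
    min_lr = max_lr / div_factor
    # Step at which the learning rate peaks (at least 1 to avoid dividing by zero).
    peak_step = max(1, int(total_steps * warmup_fraction))
    if step <= peak_step:
        progress = step / peak_step
        return min_lr + progress * (max_lr - min_lr)
    progress = (step - peak_step) / (total_steps - 1 - peak_step)
    return max_lr - progress * (max_lr - min_lr)


# Learning rate for every step of a hypothetical 100-step run.
schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(100)]

print(schedule[0], schedule[30], schedule[-1])
```

The rate starts low, peaks early in training, and returns to its starting value by the final step.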

Work link Summary :

  • Implement the one-cycle policy along with a data augmentation strategy and show the Grad-CAM module. Train on the CIFAR10 dataset to achieve 90%+ accuracy. See the work link for details.
  • Model :
  • Epoch : 15
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link

One-cycle minima and test accuracy, showing the significance of super-convergence.

12. Object Localization

Difference between image classification and image localization (aka image/object detection). Detection approaches such as the sliding window algorithm, region proposal algorithms, and anchor boxes are shown below, respectively. Pros and cons of the different approaches. Detailed study of the latest approach, anchor boxes: IoU (Intersection over Union), mAP (mean Average Precision), centroids, and the K-Means algorithm to compute centroids. Understanding the YOLO-V2 loss function.
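
IoU is the building block behind both mAP and the anchor-box clustering. The sketch below computes IoU for corner-format boxes, plus the width-and-height-only variant that anchor clustering uses, where the distance to a centroid is 1 - IoU. The box format (x1, y1, x2, y2), the function names, and the three example anchors are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def anchor_iou(wh_a, wh_b):
    """IoU of two boxes aligned at one corner, compared by (width, height) only."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    return inter / (wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter)


def nearest_anchor(box_wh, anchors):
    """Index of the anchor with the smallest 1 - IoU distance to the box."""
    distances = [1.0 - anchor_iou(box_wh, anchor) for anchor in anchors]
    return distances.index(min(distances))


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))

# Hypothetical anchor shapes: square, tall, and wide.
anchors = [(1.0, 1.0), (2.0, 4.0), (4.0, 2.0)]
best = nearest_anchor((3.5, 2.0), anchors)

print(overlap, best)
```

Running K-Means with this distance alternates between assigning every ground-truth box to its nearest anchor and recomputing each anchor as the mean width and height of its assigned boxes.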

Work link Summary :

  • Train Tiny-ImageNet on ResNet-18 within the constraints to achieve 50%+ accuracy. See the work link for details.
  • Model : ResNet-18
  • Epoch : 50
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link

Let's visualize the sliding window algorithm, region proposals, and anchor boxes.

13. YOLO 2 & 3

Introduction to YOLO and why it is called YOLO. FPS of YOLO. Anchor box variation across datasets.

Work link Summary :

  • Use OpenCV to detect COCO dataset objects. Collect a custom dataset of 500 images and run detection with YOLO. See the work link for details.
  • Model : ResNet-18
  • Epoch : 50
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link
  • Object Detection Youtube Video - YouTube

14. RCNN

Introduction to the R-CNN family. The R-CNN family finds its roots in Selective Search for Object Recognition (SSOR) and Efficient Graph-Based Image Segmentation (EGIS). SSOR uses EGIS to create initial regions and then uses a greedy algorithm to group similar regions. With the help of the color channels, image segmentation and classification are done. Popular architectures that grew out of this line of work include Regions with CNN features, also known as R-CNN, followed by Fast R-CNN and Faster R-CNN, where each architecture removes the drawbacks of the previous one. Interestingly, we can add two additional convolution layers to build Mask R-CNN from the Faster R-CNN architecture. Both architectures are shown below: Faster R-CNN vs. Mask R-CNN.

Work link Summary :

  • .
  • Model : ResNet-18
  • Epoch : 50
  • Learning Rate :
  • Batch Size :
  • Highest Accuracy -
  • Work Link

15. Transfer Learning

This section needs to be updated.


License

This project is licensed under the MIT license.

See License for more details.

Reference / Study Materials :


Author Info / Contributors :

  • Email :
  • Linkedin
  • Github