amoazeni75/object-detection-ssd

Object Detection using SSD

In this project we implement a system that uses a CNN to detect objects in a picture with the SSD (Single-Shot MultiBox Detector) algorithm.

SSD offers a good balance between detection speed and accuracy.

Goal

Design and implement a system that uses the SSD algorithm to detect objects in a picture or video in real time.

Project Outline

  • Object localization (we will see how to combine classification and regression).
  • Broaden our scope from object localization to object detection.
  • Efficient implementation of sliding windows.
  • Problem of scale: objects appear at different sizes depending on their distance from the point where the picture was taken.
  • SSD architecture for industrial usage.
  • Modify the SSD algorithm to work on videos.
  • Jaccard index (IoU) and non-max suppression.

Object Localization

In object localization, we don't just want to know what is in the image; we also want to know where it is.

To achieve this, we need five outputs on top of the ResNet: one classification output for the class, and four regression outputs for the bounding box (x center, y center, height, and width).

Loss Function

Our loss function has three parts:

  1. Binary cross-entropy on p(object | image): this part tells us whether there is an object in the image at all.
  2. Categorical cross-entropy on p(class 1 | image), p(class 2 | image), ..., p(class k | image): this part tells us which class the object belongs to.
  3. MSE: four regression outputs for the bounding box (CX, CY, Height, Width). This part should not contribute to the loss when there is no object in the image.
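The three parts can be combined as in the minimal sketch below. It is not the repository's code: the dictionary keys, the equal weighting of the three terms, and the convention that the true class vector is all zeros when no object is present are assumptions made for illustration.

```python
import math


def localization_loss(y_true, y_pred, eps=1e-7):
    """Combined loss for a single image.

    y_true / y_pred are dicts with hypothetical keys:
      "object":  probability that an object is present (0 or 1 in y_true)
      "classes": list of k class probabilities (one-hot in y_true,
                 all zeros when there is no object)
      "box":     [cx, cy, height, width], assumed normalized to [0, 1]
    """
    def clip(p):
        # Keep probabilities away from 0 and 1 so log() stays finite.
        return min(max(p, eps), 1.0 - eps)

    # 1. Binary cross-entropy on p(object | image).
    t = y_true["object"]
    p = clip(y_pred["object"])
    bce = -(t * math.log(p) + (1 - t) * math.log(1 - p))

    # 2. Categorical cross-entropy on p(class k | image).
    cce = -sum(
        tc * math.log(clip(pc))
        for tc, pc in zip(y_true["classes"], y_pred["classes"])
    )

    # 3. Mean squared error over the four bounding-box outputs.
    mse = sum(
        (tb - pb) ** 2 for tb, pb in zip(y_true["box"], y_pred["box"])
    ) / 4.0

    # The box term is masked by t, so it vanishes when no object is present.
    return bce + cce + t * mse
```

Because the box term is multiplied by the true object indicator, the predicted box has no effect on the loss for an image without an object.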

Object Detection

This is a generalized version of object localization. Here an image may contain zero or several objects. The goal is to detect all of them and draw a bounding box around each one.

  • Worth thinking about: what kind of data structures do we need?
  • A CNN must output a fixed set of numbers
  • But an image may have 0 objects, or it may have 50. How can it output the right numbers for all cases?
  • Naive strategy: in a loop
    • Look for object with highest class confidence
    • Output its p(class | image), cx, cy, height, width
    • Erase that object from the image

How Can We Find Objects in an Image?

Sliding window technique: take a window and, for each position in the original image, pass the sub-image under the window to the CNN. A major problem with this method is its low speed: for an N×N image the number of window positions grows as O(N^2), and each position needs its own pass through the CNN. To solve this problem, we use the convolution operation.
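To make the cost concrete, here is a small sketch that only enumerates window positions. The function name and the sizes are illustrative assumptions, and the CNN call itself is left out.

```python
def sliding_windows(image_h, image_w, window, stride=1):
    """Yield the (top, left) corner of every window position."""
    for top in range(0, image_h - window + 1, stride):
        for left in range(0, image_w - window + 1, stride):
            yield top, left


# A 64 x 64 image scanned with a 16 x 16 window, one pixel at a time.
positions = list(sliding_windows(64, 64, window=16, stride=1))
print(len(positions))  # (64 - 16 + 1) ** 2 = 2401 separate CNN passes
```

A larger stride reduces the number of passes, at the price of coarser localization.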

SSD: The main idea is that a CNN can produce the same result as the sliding window while passing the image through the network only once, which is why it is called single-shot. Another advantage of this algorithm is that there is no need to tell the CNN which regions may contain objects.

The Problem of Scale

Some objects appear very small because of their distance from the camera. How can we solve this problem?

The general pattern of a CNN is that the image shrinks as it passes through each layer, so the features found go from small to large. The idea is to attach mini neural networks to intermediate layers of a pre-trained network. For each of these outputs we do object detection separately.
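A rough way to see why intermediate layers help with scale: the smaller the feature-map grid, the larger the image region each cell summarizes. The sketch below assumes a 300 x 300 input and grid sizes of the kind usually quoted for SSD with that input size; treat both as illustrative values, not as this project's configuration.

```python
def cell_coverage(image_size, grid_sizes):
    """Return (grid size, approximate pixels spanned per cell) pairs."""
    return [(grid, image_size / grid) for grid in grid_sizes]


# Earlier layers have finer grids and suit small objects;
# later layers have coarser grids and suit large objects.
for grid, pixels in cell_coverage(300, [38, 19, 10, 5, 3, 1]):
    print(f"{grid:>2} x {grid:<2} grid: each cell spans about {pixels:.0f} px")
```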

The Problem of Shape (Aspect Ratio)

  • Window size: a picture contains objects of different shapes. For example, people are tall and cars are wide, so what size should the window be?
  • Two objects might appear in the same window, with one occluding the other.
  • Different orientations of an object: for example, a person may be lying down.

The solution: instead of one window, use several default boxes at each position. For each box we try to detect an object by passing it through our CNN.
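Default boxes for one feature-map grid could be generated as in the sketch below. The function name, the scale value, and the set of aspect ratios are assumptions for illustration. The width and height formulas follow the common convention of keeping the box area fixed while the aspect ratio varies.

```python
import math


def default_boxes(grid_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return [cx, cy, height, width] boxes, normalized to [0, 1].

    One box per aspect ratio is centered on every grid cell.
    """
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for ratio in aspect_ratios:  # ratio = width / height
                width = scale * math.sqrt(ratio)
                height = scale / math.sqrt(ratio)
                boxes.append([cx, cy, height, width])
    return boxes


boxes = default_boxes(grid_size=3, scale=0.4)
print(len(boxes))  # 3 * 3 positions * 3 aspect ratios = 27
```

Wide ratios suit objects such as cars, and tall ratios suit objects such as standing people.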

We not only look at the image at multiple scales, but also apply every default box at every position at each scale.
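Applying every box at every position and scale produces many overlapping candidates for the same object. The outline lists the Jaccard index (IoU) and non-max suppression for this. Below is a minimal greedy sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) corners rather than the center format used above, and assuming a threshold of 0.5.

```python
def iou(a, b):
    """Jaccard index of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed. The third box overlaps nothing and is kept.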

Start Running the Project

  1. Download the tensorflow/models repository: git clone https://github.com/tensorflow/models.git
  2. Start a notebook inside the research/object_detection folder.
  3. Install Protocol Buffers (Windows): conda install -c anaconda protobuf (to verify the installation, run protoc --version)
  4. Run this from the research folder: protoc object_detection/protos/*.proto --python_out=.
  5. Example command for an image: python main.py --content image --path "./sea.jpg"
  6. Example command for a video: python main.py --content video --path "./traffic.mp4"

Sample Output for Detecting Objects in an Image

[Image: sea]

Sample Output for Detecting Objects in a Video
