- Use a convolutional network for feature extraction.
- Use fully connected layers to predict output probabilities and coordinates.
In the YOLO paper, the authors pretrain the first 20 convolutional layers on the ImageNet dataset. These convolutional layers serve as the feature extractor.
Obviously, I'm not going to pretrain on ImageNet, since that would take a lot of time (possibly weeks). Nonetheless, you can use any pretrained CNN, such as ResNet or the others mentioned in the paper.
Instead, I've used a custom architecture (Archi.config in the repo) to train an image classifier on a custom dataset (pizza vs. sandwich).
If you've looked at the architecture, you'll see that convolutions with many channels are followed by convolutions with fewer channels; this reduces the amount of computation and adds extra non-linearity to the model (as sketched below).
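A rough sketch of this reduction pattern in PyTorch (the channel sizes here are illustrative, not the exact values from Archi.config):

```python
import torch
import torch.nn as nn

# Illustrative reduction block: a 1x1 convolution shrinks the channel count
# before the more expensive 3x3 convolution, cutting computation while the
# extra activation adds non-linearity.
block = nn.Sequential(
    nn.Conv2d(512, 256, kernel_size=1),             # reduce channels: 512 -> 256
    nn.LeakyReLU(0.1),
    nn.Conv2d(256, 512, kernel_size=3, padding=1),  # expand back: 256 -> 512
    nn.LeakyReLU(0.1),
)

x = torch.randn(1, 512, 14, 14)
print(block(x).shape)  # torch.Size([1, 512, 14, 14])
```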
Result of the image classifier:

Optimizer | Epochs | Learning Rate | Training Accuracy | Testing Accuracy |
---|---|---|---|---|
SGD | 50 | 0.0001 | 93.50% | 91.41% |
The trained model is in the ./Saved Models/ folder. You can pretrain the CNN on your own dataset.
The training and testing code can be found in the classifer.ipynb file within the repo.
Also, you can check out this repo.
YOLOv1 frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.
The last convolutional layer outputs a tensor of shape (7, 7, 1024). This tensor is flattened and passed through 2 fully connected layers, acting as a form of linear regression; the resulting parameters are then reshaped into (7, 7, 30).
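A minimal PyTorch sketch of that head (the 4096-unit hidden layer matches the paper; everything else here is illustrative):

```python
import torch
import torch.nn as nn

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

# Detection head sketch: flatten the (7, 7, 1024) feature map, regress with
# two fully connected layers, then reshape into the (7, 7, 30) output grid.
head = nn.Sequential(
    nn.Flatten(),                          # (N, 1024, 7, 7) -> (N, 50176)
    nn.Linear(1024 * S * S, 4096),
    nn.LeakyReLU(0.1),
    nn.Linear(4096, S * S * (5 * B + C)),  # (N, 1470)
)

features = torch.randn(1, 1024, S, S)           # last conv layer's output
out = head(features).view(-1, S, S, 5 * B + C)  # (1, 7, 7, 30)
print(out.shape)
```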
The output is S × S × (5B + C).
Since S = 7, the image is divided into a 7 × 7 grid.
So, for each grid cell, the size of the output is 5B + C.
Terms:
B = number of bounding boxes predicted per grid cell
C = probabilities of each class (one per class)
If we consider B = 1 and C = n classes, then for each grid cell 1 bounding box is predicted. It looks something like this.
If we consider B = 2, then each grid cell predicts 2 bounding boxes, each defined by (x, y, w, h): the center coordinates, width, and height of the bounding box (plus a confidence score, which is where the 5 in 5B comes from).
Therefore, the output is flattened into a vector of size S × S × (5B + C).
The 30 in the fully connected layer is (5B + C), where the authors use B = 2 and C = 20 classes (so the network can predict up to 20 classes): 5 × 2 + 20 = 30.
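To make the layout concrete, here is one possible decoding of a single grid cell's 30 values (the ordering of boxes and class scores within the vector is a convention you pick when building the targets, not something fixed by the paper):

```python
import torch

S, B, C = 7, 2, 20
cell = torch.randn(5 * B + C)  # the 30 values for one grid cell

# Assumed layout: B blocks of (x, y, w, h, confidence), then C class scores.
boxes = cell[: 5 * B].view(B, 5)  # (2, 5): two (x, y, w, h, conf) tuples
class_probs = cell[5 * B:]        # (20,): class probabilities for this cell
print(boxes.shape, class_probs.shape)
```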
However, in this repo I've used a YOLOv3-style design (though it doesn't contain the pipeline for three detection scales). Unlike YOLOv1, where the final layer is a fully connected regressor, here a convolutional layer is used as the final layer.
- The classifier is used for feature selection/feature extraction.
- An extra layer is added to the classifier to produce the CNN output.
- The CNN output should have dimensions S × S × C, where S is the grid size and C is the number of channels.
- Here S = 13, meaning the image is divided into a 13 × 13 grid and the output has a height and width of 13 with C channels.
- Here C = 7, meaning the output has 7 channels. [1st channel: confidence score; 2nd to 5th channels: x, y, w, h; 6th and 7th channels: probability scores that the given object falls in a particular class.]
- Here (x, y) is the center of the bounding box and (w, h) its width and height.
- Since I've considered only two classes here, there are only two channels after x, y, w, and h (see the sketch after this list).
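One way to realize such a head is a final 1×1 convolution; here's a minimal PyTorch sketch (the 1024-channel backbone output and the 1×1 kernel are assumptions for illustration, not read from the repo):

```python
import torch
import torch.nn as nn

S, C = 13, 7  # grid size, output channels

# Final 1x1 convolution maps the backbone features to 7 output channels:
# [confidence, x, y, w, h, p(class 1), p(class 2)] for every grid cell.
head = nn.Conv2d(1024, C, kernel_size=1)

features = torch.randn(1, 1024, S, S)  # assumed backbone output
out = head(features)                   # (1, 7, 13, 13)
conf = out[:, 0]                       # channel 1: objectness score
box = out[:, 1:5]                      # channels 2-5: x, y, w, h
class_scores = out[:, 5:7]             # channels 6-7: two-class scores
print(out.shape)
```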
The YOLOv1 loss function is used in this implementation.
Let's break down each of its terms.
- Bounding Box Coordinate Loss / Regression Loss
YOLO predicts multiple bounding boxes per grid cell. At training time we only want one bounding box predictor to be responsible for each object.
We assign one predictor to be “responsible” for predicting an object based on which prediction has the highest current IOU with the ground truth.
However, I've only used one bounding box per grid cell in this implementation.
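With one box per cell this assignment is trivial, but since IoU also appears in the confidence target below, here is a small sketch of IoU for (x, y, w, h) boxes (`iou_xywh` is an illustrative helper, not code from this repo):

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h), with (x, y) the box center."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Intersection rectangle, clamped so disjoint boxes give zero area.
    iw = max(min(ax2, bx2) - max(ax1, bx1), 0.0)
    ih = max(min(ay2, by2) - max(ay1, by1), 0.0)
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

# With B > 1, the predictor with the highest IoU would be "responsible":
preds = [(0.5, 0.5, 0.4, 0.4), (0.6, 0.5, 0.2, 0.3)]
gt = (0.5, 0.5, 0.3, 0.3)
responsible = max(range(len(preds)), key=lambda j: iou_xywh(preds[j], gt))
print(responsible)  # 0
```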
(x_i, y_i, w_i, h_i): the network's prediction of the center, width, and height of the responsible bounding box in the i-th grid cell.
(x̂_i, ŷ_i, ŵ_i, ĥ_i): the corresponding ground truth.
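For reference, the coordinate-loss term from the YOLOv1 paper, in the notation above (the square roots damp the penalty for size errors on large boxes; λ_coord = 5 in the paper):

```math
\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]
+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right]
```

Here 𝟙^obj_ij is 1 when predictor j in cell i is responsible for an object, and 0 otherwise.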
- Confidence Loss
C_i: the true objectness score. Ĉ_i: the network's predicted confidence, multiplied by the IoU between the ground truth and the predicted box.
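The corresponding term from the paper, where λ_noobj = 0.5 down-weights the many cells that contain no object:

```math
\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left( C_i - \hat{C}_i \right)^2
+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} \left( C_i - \hat{C}_i \right)^2
```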
- Classification Loss
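This is a sum-squared error over the class probabilities, applied only in cells that contain an object:

```math
\sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
```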
- Comparatively lower recall and more localization errors than Faster R-CNN.
- Struggles to detect objects that are close together, because each grid cell can propose only 2 bounding boxes.
- Struggles to detect small objects.