Lyft Perception Challenge

The Lyft Perception Challenge, run in association with Udacity, was an image segmentation task: candidates had to submit an algorithm that could segment road and car pixels precisely in real time. The challenge started on May 1st, 2018 and ran through June 3rd, 2018.

Approach

Although this was a semantic segmentation problem and did not require instance segmentation, I went ahead with Mask-RCNN because it was the state-of-the-art algorithm in image segmentation and I had always wanted to learn about it. I also started on the 28th, just after finishing my first term, so transfer learning was my only shot. 😓

Mask-RCNN (A brief overview)

Mask-RCNN is implemented in, among other places, Detectron, a research platform for object detection developed by facebookresearch. It is mainly an extension of Faster RCNN with a segmentation branch parallel to the class predictor and bounding box regressor. A vanilla ResNet is used in an FPN setting as the backbone to Faster RCNN so that features can be extracted at multiple levels of the feature pyramid. The network heads consist of the mask branch, which predicts the mask, and a classification branch with bounding box regression. The architecture with FPN was used for this competition.

Backbone	Heads

Feature Pyramid network with Resnet	different head architecture with and without FPN

The loss function consists of 3 losses L = L_class + L_box + L_mask where

L_class uses log loss for true classes
L_box uses smooth_L1 loss defined in [Fast RCNN]
L_mask uses average binary cross entropy loss
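The three terms can be written out for a single RoI as a minimal sketch. This is plain Python for illustration only, not the Keras code used by the Mask_RCNN repository; the function names and the toy inputs are assumptions.

```python
import math

def log_loss(probs, true_class):
    """L_class: negative log-probability assigned to the true class."""
    return -math.log(probs[true_class])

def smooth_l1(pred, target):
    """L_box: smooth L1 summed over the box coordinates (Fast RCNN form)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def mask_bce(pred_mask, true_mask, eps=1e-7):
    """L_mask: binary cross entropy averaged over the m x m mask pixels."""
    losses = []
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, y in zip(pred_row, true_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)

def total_loss(probs, true_class, box_pred, box_true, mask_pred, mask_true):
    """L = L_class + L_box + L_mask for one RoI."""
    return (log_loss(probs, true_class)
            + smooth_l1(box_pred, box_true)
            + mask_bce(mask_pred, mask_true))
```

The mask term is evaluated only on the mask channel of the ground-truth class, which is why a per-pixel binary loss is enough.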

The masks are predicted by a Fully Convolutional Network (FCN) for each RoI. This preserves the m x m spatial layout of each mask, so each instance of an object gets its own distinct mask.
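Since the challenge only needs one road mask and one car mask per frame, the per-instance masks have to be collapsed by class at some point. A minimal sketch of that step, assuming the masks arrive as a boolean array of shape (H, W, N) with one class id per instance (the layout returned by the Mask_RCNN repository's `detect`); the function name and the class ids in the usage note are hypothetical.

```python
import numpy as np

def merge_instance_masks(masks, class_ids, class_id):
    """Collapse (H, W, N) instance masks into one (H, W) binary mask for a class."""
    # Keep only the instances predicted as the requested class.
    selected = masks[:, :, class_ids == class_id]
    # A pixel belongs to the class if any selected instance covers it.
    return np.any(selected, axis=-1).astype(np.uint8)
```

With, say, class id 1 for cars and 2 for road, `merge_instance_masks(r["masks"], r["class_ids"], 1)` would give the binary car mask for a detection result `r`. If no instance of the class is found, the result is an all-zero mask.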

The model output after compiling the Keras model can be found at model.

Training

MaskRCNN Configuration

For this application, ResNet-50 was used by setting BACKBONE = "resnet50" in the config.
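A config subclass along these lines would carry that setting. Only `BACKBONE` comes from the text; the class name, `NAME`, `NUM_CLASSES` and `IMAGES_PER_GPU` values are assumptions for illustration, and the fallback `Config` exists only so the sketch runs without the Mask_RCNN package installed.

```python
try:
    # Base class from the matterport/Mask_RCNN repository, if it is installed.
    from mrcnn.config import Config
except ImportError:
    class Config:
        """Stand-in base class so this sketch runs without the library."""

class LyftConfig(Config):
    NAME = "lyft"            # hypothetical experiment name
    BACKBONE = "resnet50"    # the setting described above
    NUM_CLASSES = 1 + 2      # assumed: background + car + road
    IMAGES_PER_GPU = 1       # assumed batch size per GPU
```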

Processing Data

Data Augmentation

As only a small number of samples were provided (1K), data augmentation was necessary to avoid overfitting. imgaug is a Python module that came in handy for adding augmentation to the dataset.
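The important detail for segmentation is that geometric augmentations must be applied to the image and its mask together, while colour changes touch only the image. The sketch below shows that idea with plain NumPy rather than imgaug, and the flip and brightness jitter are illustrative choices, not the augmenters actually used in training.

```python
import numpy as np

def augment(image, mask, rng):
    """Randomly flip image and mask together, then jitter image brightness."""
    if rng.random() < 0.5:
        # Horizontal flip: reverse the width axis of both arrays.
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    # Brightness gain is applied to the image only; labels are unchanged.
    gain = rng.uniform(0.8, 1.2)
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return image, mask
```

A generator such as `np.random.default_rng(0)` can be passed as `rng` to make the augmentation reproducible.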

Training Loss

Instead of a single long training run, the network was trained several times for a small number of epochs each, to observe how the loss changed with the parameters and to avoid overfitting. Because the dataset was small, the network saturated quickly and needed more augmentation to keep improving. I also did not want to go overboard on the augmentation, so I observed which ones worked best. Below are the logs of the final training setting with the augmentation given above.

heads Epoch	all Epoch	loss	val_loss
10	40
40	100
10	40
20	60
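Each row of the table can be read as a two-stage schedule: train only the heads up to the "heads Epoch", then fine-tune all layers up to the "all Epoch". A minimal sketch of one such row, assuming a model object that exposes the `train` method of the Mask_RCNN repository; the helper name, the learning rates and the tenfold reduction in the second stage are assumptions.

```python
def train_in_stages(model, train_set, val_set, heads_epoch, all_epoch,
                    learning_rate=1e-3, augmentation=None):
    """Train the heads first, then fine-tune the whole network."""
    # Stage 1: only the newly initialised heads; the backbone stays frozen.
    model.train(train_set, val_set, learning_rate=learning_rate,
                epochs=heads_epoch, layers="heads", augmentation=augmentation)
    # Stage 2: all layers at a lower rate. `epochs` is the epoch index to
    # stop at, so this continues from heads_epoch up to all_epoch.
    model.train(train_set, val_set, learning_rate=learning_rate / 10,
                epochs=all_epoch, layers="all", augmentation=augmentation)
```

For the first row this would be called as `train_in_stages(model, dataset_train, dataset_val, heads_epoch=10, all_epoch=40)`.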

Results

Your program runs at 1.703 FPS

Car F score: 0.519 | Car Precision: 0.509 | Car Recall: 0.521 | Road F score: 0.961 | Road Precision: 0.970 | Road Recall: 0.926 | Averaged F score: 0.740
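The F scores above can be recomputed from the reported precision and recall with the F-beta formula. The beta values used here (2 for cars, which weights recall, and 0.5 for road, which weights precision) are not stated in this README; they are an assumption that reproduces the reported 0.519, 0.961 and 0.740 to three decimals.

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

car_f = f_beta(0.509, 0.521, beta=2.0)    # assumed beta for cars
road_f = f_beta(0.970, 0.926, beta=0.5)   # assumed beta for road
average_f = (car_f + road_f) / 2
```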

Inference and Submission

Submission

Submission requires the results to be encoded in a JSON. test_inference.py contains the inference and submission code. In an attempt to increase the FPS, the encode function was replaced with the following, which was shared on the forum:

import base64
import cv2

def encode(array):
    retval, buffer = cv2.imencode('.png', array)  # PNG-compress the mask
    return base64.b64encode(buffer).decode("utf-8")  # base64 text for the JSON
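How the encoded masks end up in the JSON can be sketched as follows. The layout is an assumption, not taken from this README: a JSON object mapping 1-based frame numbers to a [car, road] pair of encoded masks. The helper name is hypothetical, and the encoder is passed in as an argument so the sketch does not depend on OpenCV.

```python
import json

def build_submission(frames, encode):
    """Return a JSON string mapping frame numbers to encoded [car, road] masks."""
    answer = {}
    # `frames` yields one (car_mask, road_mask) pair per video frame.
    for frame_number, (car_mask, road_mask) in enumerate(frames, start=1):
        answer[frame_number] = [encode(car_mask), encode(road_mask)]
    # json.dumps turns the integer frame numbers into string keys.
    return json.dumps(answer)
```

In the real script `encode` would be the function above and the resulting string would be printed for the grader.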

Reference

https://github.com/matterport/Mask_RCNN

@misc{Charles2013,
  author = {waleedka et al.},
  title = {Mask R-CNN for Object Detection and Segmentation},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/matterport/Mask_RCNN}},
  commit = {6c9c82f5feaf5d729a72c67f33e139c4bc71399b}
}

Mask RCNN
Fast RCNN
Faster RCNN
Feature Pyramid Networks for Object Detection
Fully Convolutional Network

Author

Ameya Wagh aywagh@wpi.edu
