diff --git a/.gitignore b/.gitignore
index 0ed3b9a..24bf347 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 Train/
 *.h5
 logs/
-*.zip
\ No newline at end of file
+*.zip
+*.mp4
diff --git a/README.md b/README.md
index db2898b..93d6e57 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,17 @@ The [lyft Perception challenge](https://www.udacity.com/lyft-challenge) in assoc
 
 ## Approach
 Although it was a segmentation problem and did not require instance segmentation, I went ahead with [MASK-RCNN](https://arxiv.org/pdf/1703.06870.pdf) as it was the state of the art algorithm in image segmentation and I was always intrigued to learn about it. Also I started on *28th*, just after finishing my first term, so transfer learning was my only shot. :sweat:
+
+*Click to Watch on youtube*
+
+
+
+
+
+
+
+
+
 
 #### Mask-RCNN (A brief overview)
 Mask-RCNN, also known as [Detectron](https://github.com/facebookresearch/Detectron) is a research platform for object detection developed by facebookresearch. It is mainly a modification of Faster RCNN with a segmentation branch parallel to class predictor and bounding box regressor. The vanilla ResNet is used in an FPN setting as a backbone to Faster RCNN so that features can be extracted at multiple levels of the feature pyramid
@@ -66,7 +77,7 @@ def process_labels(self,labels):
 ```
 
 #### MaskRCNN Configuration
-For this application Resnet-50 was used by setting `BACKBONE = "resnet50"` in config.
+For this application Resnet-50 was used by setting `BACKBONE = "resnet50"` in config. `NUM_CLASSES = 1+2` for 2 classes (car,road) and `IMAGE_MAX_DIM = 1024` as image dimensions are 800x600.
 ```python
 class LyftChallengeConfig(Config):
diff --git a/assets/final.gif b/assets/final.gif
new file mode 100644
index 0000000..ff72c6e
Binary files /dev/null and b/assets/final.gif differ
diff --git a/inference.py b/inference.py
index 28a346f..efe143c 100644
--- a/inference.py
+++ b/inference.py
@@ -27,6 +27,7 @@
 import mrcnn.model as modellib
 from mrcnn import visualize
 from mrcnn.model import log
+from moviepy.editor import VideoFileClip
 
 
 
@@ -165,21 +166,11 @@ def segment_images_batch(original_images):
         except KeyboardInterrupt as e:
             break
 
-# video_len = video.shape[0]
-# offset=1
-# a = time.time()
-# for idx in range(0,video_len,4):
-# # if video_len-idx >4:
-# # offset = 4
-# # else:
-# # offset = video_len-idx
-# # model.config.IMAGES_PER_GPU = offset
-# rgb_frames = video[idx:idx+offset,:,:,:]
-# print(rgb_frames.shape)
-# final_img = segment_images_batch(rgb_frames)
-# b=time.time()
-# _secs = (b-a)%60
-
-# print("FPS:",video_len/_secs)
+def process_video(INPUT_FILE,OUTPUT_FILE):
+    video = VideoFileClip(INPUT_FILE)
+    processed_video = video.fl_image(segment_images)
+    processed_video.write_videofile(OUTPUT_FILE, audio=False)
+
+process_video('./test_video.mp4','./output_video.mp4')
 exit()
\ No newline at end of file
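Note on the `process_video` helper added in the last hunk: moviepy's `fl_image` maps a function over every decoded RGB frame of the clip and expects back an array of the same shape and dtype. A minimal sketch of that per-frame contract, using a hypothetical numpy stand-in for `segment_images` (the real function runs Mask-RCNN inference):

```python
import numpy as np

def segment_frame(frame):
    """Hypothetical stand-in for segment_images: tint the lower half green.

    fl_image calls a function like this once per decoded frame with an
    (H, W, 3) uint8 array and expects the same shape and dtype back.
    """
    out = frame.copy()
    lower = out[frame.shape[0] // 2:, :, :]
    # Blend a flat green "road" overlay into the lower half of the frame.
    lower[:] = (0.5 * lower + 0.5 * np.array([0, 255, 0])).astype(np.uint8)
    return out

# A dummy 800x600 frame, matching the dataset's image dimensions.
frame = np.zeros((600, 800, 3), dtype=np.uint8)
result = segment_frame(frame)
```

With moviepy installed, `VideoFileClip(INPUT_FILE).fl_image(segment_frame)` would apply this to every frame, exactly as `process_video` does with the real segmentation function.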