A Lightweight Human (Person) Segmentation Model built using Autoencoders, trained on COCO.
Model Notebook · Report Bug
Table of Contents
- About The Project
- Jupyter Notebooks - nbViewer
- Dataset Information
- Features
- Results
- How to Run
- Changelog
- Contributing
- License
- Contact
Inspired by UNet (Paper), which is essentially an Autoencoder with Skip Connections, I wondered whether a much shallower network could produce segmentation masks for a single object class. Hence, the birth of this small project.
The primary goal is to determine whether a shallow end-to-end CNN can learn complex features such as human figures. This notebook was created as a proof of concept.
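To make the idea concrete, here is a minimal sketch of a shallow encoder-decoder with skip connections in Keras. This is not the exact architecture from the Model notebook; the layer widths and the 512x512 input size are assumptions for illustration only.

```python
# Minimal sketch: a shallow UNet-style autoencoder with skip connections.
# Layer widths and input size are illustrative assumptions, not the exact
# architecture from the Model notebook.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_shallow_segmenter(input_shape=(512, 512, 3)):
    inputs = layers.Input(shape=input_shape)

    # Encoder
    e1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D(2)(e1)
    e2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D(2)(e2)

    # Bottleneck
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)

    # Decoder with skip connections back to the encoder features
    u1 = layers.Concatenate()([layers.UpSampling2D(2)(b), e2])
    d1 = layers.Conv2D(32, 3, padding="same", activation="relu")(u1)
    u2 = layers.Concatenate()([layers.UpSampling2D(2)(d1), e1])
    d2 = layers.Conv2D(16, 3, padding="same", activation="relu")(u2)

    # Single-channel sigmoid output for binary person/background segmentation
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(d2)
    return Model(inputs, outputs)

model = build_shallow_segmenter()
model.compile(optimizer="adam", loss="binary_crossentropy")
```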
The notebooks do not render properly on GitHub, hence please use the nbviewer links provided below to see the results.
- Dataset Preparation - Extracting Masks for Person from COCO Dataset
- Model - Main Notebook Containing the Dataset Loader and Model Architecture
- The Model is trained on COCO 2017 Dataset.
- Dataset Splits Used:
  - Train: COCO 2017 Train Images + Train Annotations - `instances_train2017.json`
  - Val: COCO 2017 Val Images + Val Annotations - `instances_val2017.json`
- Dataset Download: https://cocodataset.org/#download
- Dataset Format Information: https://cocodataset.org/#format-data
- API to parse COCO: https://github.com/philferriere/cocoapi (a minimal usage sketch is shown below)
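As a rough illustration, the COCO API (pycocotools) can be used to pull binary person masks out of the annotations. The paths and output directory below are placeholder assumptions; the actual pipeline lives in the Dataset Preparation notebook.

```python
# Sketch: extract binary "person" masks from COCO 2017 annotations using pycocotools.
# Annotation and output paths are placeholders; see the Dataset Preparation notebook
# for the actual steps.
import os
import numpy as np
from PIL import Image
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")  # assumed annotation path
person_cat_ids = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=person_cat_ids)

os.makedirs("masks/train", exist_ok=True)

for img_id in img_ids:
    img_info = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_id, catIds=person_cat_ids, iscrowd=None)
    anns = coco.loadAnns(ann_ids)

    # Merge all person instances into a single binary mask
    mask = np.zeros((img_info["height"], img_info["width"]), dtype=np.uint8)
    for ann in anns:
        mask = np.maximum(mask, coco.annToMask(ann))

    # Save the mask as a .jpg image (0 = background, 255 = person),
    # reusing the original COCO file name so images and masks stay paired
    Image.fromarray(mask * 255).save(os.path.join("masks/train", img_info["file_name"]))
```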
- Pre-Trained Weights - The weights can be downloaded directly from here: weights.h5 - stored using Git LFS (a loading/inference sketch is shown below).
- Fast Inference - Inference time for a batch of `32` images of `512x512` dimensions on an Nvidia RTX 2080 Ti is just `10.3 µs`.
Images (Left to Right): Input Image, Predicted Image, Thresholded Mask @ 0.5, Ground Truth Mask
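For reference, here is a minimal inference sketch. It assumes the saved weights file can be loaded directly and that inputs are 512x512 RGB images scaled to [0, 1]; the helper name and file paths are illustrative, not taken from the Model notebook.

```python
# Sketch: load pre-trained weights and produce a thresholded person mask.
# Paths and preprocessing (512x512 RGB, values in [0, 1]) are assumptions;
# see the Model notebook for the exact pipeline.
import numpy as np
import tensorflow as tf

# Assumes weights.h5 is a full saved model; if it only holds layer weights,
# build the architecture first and call model.load_weights("weights.h5") instead.
model = tf.keras.models.load_model("weights.h5")

def predict_mask(image_path, threshold=0.5, size=(512, 512)):
    # Load and preprocess a single image
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=size)
    x = tf.keras.preprocessing.image.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)

    # Soft mask in [0, 1], thresholded at 0.5 as in the sample images above
    pred = model.predict(x)[0, ..., 0]
    return (pred > threshold).astype(np.uint8)

mask = predict_mask("sample.jpg")
```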
The experiment should be fairly reproducible. However, a GPU is recommended for training; for inference, a CPU system would suffice.
- CPU: AMD Ryzen 7 3700X - 8 Cores 16 Threads
- GPU: Nvidia GeForce RTX 2080 Ti 11 GB
- RAM: 32 GB DDR4 @ 3200 MHz
- Storage: 1 TB NVMe SSD (This is not important, even a normal SSD would suffice)
- OS: Ubuntu 20.10
Alternative Option: Google Colaboratory - GPU Kernel
- Use the COCO API to extract the masks from the dataset. (Refer: Dataset Preparation.ipynb Notebook)
- Save the masks in a directory as `.jpg` images.
- Example Directory Structure (a minimal data-loading sketch follows the tree below):
.
├── images
│   ├── train
│   │   └── *.jpg
│   └── val
│       └── *.jpg
└── masks
    ├── train
    │   └── *.jpg
    └── val
        └── *.jpg
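Below is a hedged sketch of how such a layout could be consumed with a `tf.data` pipeline. The loader in the Model notebook may differ; the image size, batch size, and paths here are assumptions.

```python
# Sketch: a tf.data pipeline over paired image/mask .jpg files laid out as
# images/train and masks/train (size, batch size, and paths are assumptions).
import tensorflow as tf

IMG_SIZE = (512, 512)
BATCH_SIZE = 32

def load_pair(image_path, mask_path):
    # Read and resize the RGB input image, scaled to [0, 1]
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, IMG_SIZE) / 255.0

    # Read and resize the single-channel mask, binarized to {0, 1}
    mask = tf.io.decode_jpeg(tf.io.read_file(mask_path), channels=1)
    mask = tf.image.resize(mask, IMG_SIZE) / 255.0
    mask = tf.cast(mask > 0.5, tf.float32)
    return image, mask

image_paths = sorted(tf.io.gfile.glob("images/train/*.jpg"))
mask_paths = sorted(tf.io.gfile.glob("masks/train/*.jpg"))

train_ds = (
    tf.data.Dataset.from_tensor_slices((image_paths, mask_paths))
    .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(256)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)
```

Sorting both glob results keeps each image aligned with its mask, provided the masks reuse the original COCO file names.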
A simple list of Deep Learning libraries is used. The main Architecture/Model is developed with Keras, which ships as part of TensorFlow 2.x.
Since this is a Proof of Concept Project, I am not maintaining a CHANGELOG.md at the moment. However, the primary goal is to improve the architecture to make the predicted masks more accurate.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- Website: Animikh Aich - Website
- LinkedIn: animikh-aich
- Email: animikhaich@gmail.com
- Twitter: @AichAnimikh