Disclaimer: This is a personal project in which I re-implemented EfficientPS in a month. Feel free to use my code. Note that the authors have also released their code.
Paper: EfficientPS
Code from the authors: here
To create this code I used multiple frameworks:
- EfficientNet-Pytorch for the backbone
- detectron2 for the instance head (Mask R-CNN)
- In-Place Activated BatchNorm
- COCO 2018 Panoptic Segmentation Task API (Beta version) to compute the panoptic quality (PQ) metric
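For reference, the panoptic API computes PQ per class: a predicted segment and a ground-truth segment are matched when their IoU is above 0.5, and PQ = SQ × RQ, where SQ is the mean IoU of matched pairs and RQ = TP / (TP + 0.5 FP + 0.5 FN). The sketch below is a dependency-free illustration that assumes the matching has already been done; `panoptic_quality` is a hypothetical helper, not a function of panopticapi.

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for a single class, or None if the class never appears.

    tp_ious: IoU of every matched (true positive) segment pair, each > 0.5.
    num_fp:  number of unmatched predicted segments (false positives).
    num_fn:  number of unmatched ground-truth segments (false negatives).
    """
    num_tp = len(tp_ious)
    denominator = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        # Class absent from both prediction and ground truth:
        # it is left out of the class average.
        return None
    sq = sum(tp_ious) / num_tp if num_tp else 0.0  # segmentation quality
    rq = num_tp / denominator                      # recognition quality
    return sq * rq, sq, rq


# Two matched segments, one spurious prediction, one missed object.
pq, sq, rq = panoptic_quality([0.9, 0.8], num_fp=1, num_fn=1)
print(round(pq, 3), round(sq, 3), round(rq, 3))
```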
- Download the Cityscapes dataset:
git clone https://github.com/mcordts/cityscapesScripts.git
# Cityscapes scripts
pip install git+https://github.com/mcordts/cityscapesScripts.git
# Panoptic annotations
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py
- Install PyTorch
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
- Install Albumentations
pip install -U albumentations
- Install PyTorch Lightning
pip install pytorch-lightning
- Install In-Place Activated BatchNorm
pip install inplace-abn
- Install EfficientNet-PyTorch
pip install efficientnet_pytorch
- Install detectron2
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
- Install the COCO panoptic API
pip install git+https://github.com/cocodataset/panopticapi.git
- Modify
- Run
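As far as I know, the panoptic preparation script writes PNG label images in the COCO panoptic format, where each pixel's segment id is packed into its colour as id = R + 256·G + 256²·B (the convention behind `rgb2id`/`id2rgb` in panopticapi's `utils`). The sketch below reproduces that packing for a single pixel without NumPy. It also assumes the Cityscapes convention that "thing" segments use labelId × 1000 + instance index; the helper `cityscapes_label_id` and the example id are hypothetical.

```python
def id2rgb(segment_id):
    """Pack a segment id into an (R, G, B) triple, COCO panoptic style."""
    color = []
    for _ in range(3):
        color.append(segment_id % 256)  # lowest byte goes into the next channel
        segment_id //= 256
    return tuple(color)


def rgb2id(color):
    """Unpack an (R, G, B) triple back into the segment id."""
    r, g, b = color
    return r + 256 * g + 256 * 256 * b


def cityscapes_label_id(segment_id):
    """Assumed Cityscapes convention: thing ids are labelId * 1000 + instance index."""
    return segment_id // 1000 if segment_id >= 1000 else segment_id


car_instance = 26 * 1000 + 3  # hypothetical: instance 3 of label id 26 ("car")
rgb = id2rgb(car_instance)
print(rgb, rgb2id(rgb), cityscapes_label_id(car_instance))
```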
1 - Original Configuration of the authors
Training config
Solver: SGD
lr: 0.007
momentum: 0.9
Batch_size: 16
Image_size: 1024 x 2048
Norm: SyncInplaceBN
Augmentations:
- RandomCrop
- RandomFlip
- Normalize
Warmup: 200 iterations 1/3 lr to lr
Scheduler: StepLR
- step [120, 144]
- total epoch: 160
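The authors' schedule combines a short warmup with step decay. The sketch below assumes the warmup is linear (from lr/3 to lr over the first 200 iterations) and that the step decay multiplies the learning rate by 0.1 at epochs 120 and 144; the decay factor `GAMMA` is an assumption, since the config above does not give it.

```python
BASE_LR = 0.007
WARMUP_ITERS = 200
WARMUP_RATIO = 1 / 3       # warmup starts at lr / 3
MILESTONES = (120, 144)    # epochs at which the learning rate is decayed
GAMMA = 0.1                # assumption: decay factor not given in the config


def authors_lr(epoch, iteration):
    """Learning rate at a given epoch and global iteration (sketch)."""
    # Step decay: multiply by GAMMA once for every milestone already reached.
    lr = BASE_LR * GAMMA ** sum(epoch >= m for m in MILESTONES)
    if iteration < WARMUP_ITERS:
        # Linear warmup from WARMUP_RATIO * lr up to lr.
        progress = iteration / WARMUP_ITERS
        lr *= WARMUP_RATIO + (1 - WARMUP_RATIO) * progress
    return lr


for epoch, it in [(0, 0), (0, 100), (0, 200), (120, 10**6), (144, 10**6)]:
    print(epoch, it, authors_lr(epoch, it))
```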
2 - Configuration adapted to my resources
The authors trained their models on 16 NVIDIA Titan X GPUs. Since I only had one GPU, I could not use the same configuration. Here is a summary of the implementation decisions this forced:
- I first wanted to use a batch size of one in order to keep the original image size, but the 1024 x 2048 images did not fit into memory. I therefore halved the image resolution, leading to 512 x 1024 images.
- Even then, only a few images fit into memory. To increase the batch size and speed up training, I used mixed precision training, which combines single-precision (32-bit) and half-precision (16-bit) tensors. Using 16-bit tensors frees up a lot of memory and speeds up training, but it can also reduce the final performance. (More information in the paper)
- Smaller images combined with 16-bit precision allowed a batch size of 3 images.
- For the optimizer, I chose Adam, which is more stable and therefore requires less tuning to reach good performance. I scaled the learning rate down by the ratio between my batch size and theirs (3/16), giving a learning rate of 1.3e-3. Based on my experiments, changing the learning rate did not seem to have a big impact.
- Since I could not train for as many epochs as the authors (160 epochs), I used ReduceLROnPlateau as the scheduler in order to get the best performance out of a small number of epochs.
- For the augmentations:
  - I did not use RandomCrop, mainly because I did not have time to optimize the creation of batches with different image sizes. The authors have one image per GPU (a batch of 16 on 16 GPUs), so the problem does not occur for them. I could still have used random scaling towards larger scales followed by random cropping, but it was not my top priority.
  - RandomFlip and Normalize (with the dataset statistics) are applied.
- I did not use
- For testing, I did not use multi-scale inference.
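The arithmetic behind the decisions above can be checked in a few lines. This sketch only sizes the raw input batch (activations and weights take far more memory in practice), uses the standard library `struct` module to compare float32 and float16 storage, and applies the linear batch-size scaling described for the learning rate. In actual training, mixed precision is typically enabled through PyTorch AMP or, in PyTorch Lightning, with `Trainer(precision=16)`; the helper `input_batch_mib` is hypothetical.

```python
import struct

# Halving both sides of the image divides the pixel count by 4.
full_pixels = 1024 * 2048
half_pixels = 512 * 1024
print(full_pixels // half_pixels)  # 4

# Bytes per value: float32 ('f') vs float16 ('e') in the struct module.
FP32_BYTES = struct.calcsize("f")
FP16_BYTES = struct.calcsize("e")


def input_batch_mib(batch, height, width, bytes_per_value, channels=3):
    """Memory of the raw RGB input batch only, in MiB."""
    return batch * channels * height * width * bytes_per_value / 2**20


print(input_batch_mib(3, 512, 1024, FP32_BYTES))  # 18.0
print(input_batch_mib(3, 512, 1024, FP16_BYTES))  # 9.0

# float16 keeps only a 10-bit mantissa, so 0.1 is stored less precisely.
print(struct.unpack("e", struct.pack("e", 0.1))[0])

# Linear scaling of the authors' learning rate from batch size 16 to 3.
scaled_lr = 0.007 * 3 / 16
print(scaled_lr)  # about 1.3e-3
```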
To sum up, we have:
Training config
Solver: Adam
lr: 1.3e-3
Batch_size: 3
Image_size: 512 x 1024
Norm: InplaceBN
Augmentations:
- RandomFlip
- Normalize
Warmup: 500 iterations lr/500 to lr
Scheduler: ReduceLROnPlateau
- patience 3
- min lr: 1e-4
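The adapted schedule can be sketched as a linear warmup followed by plateau-based decay. This is a simplified, framework-free imitation of PyTorch's `ReduceLROnPlateau` (no threshold or cooldown), assuming a linear warmup, a decay factor of 0.1 (PyTorch's default) and a monitored metric that should increase, such as validation PQ. The factor and the monitored quantity are assumptions, since the config above only gives the patience and the minimum learning rate.

```python
BASE_LR = 1.3e-3
WARMUP_ITERS = 500


def warmup_lr(iteration):
    """Linear warmup from BASE_LR / 500 to BASE_LR over 500 iterations (assumed linear)."""
    if iteration >= WARMUP_ITERS:
        return BASE_LR
    start = BASE_LR / 500
    return start + (BASE_LR - start) * iteration / WARMUP_ITERS


class PlateauScheduler:
    """Simplified ReduceLROnPlateau for a metric to maximize (sketch)."""

    def __init__(self, lr, patience=3, factor=0.1, min_lr=1e-4):
        self.lr = lr
        self.patience = patience
        self.factor = factor  # assumption: not specified in the config
        self.min_lr = min_lr
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Call once per epoch with the validation metric; returns the new lr."""
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Like PyTorch, reduce only once patience is exceeded, then reset the counter.
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.bad_epochs = 0
        return self.lr


sched = PlateauScheduler(warmup_lr(WARMUP_ITERS))
for val_pq in [40.0, 42.0, 41.0, 41.5, 41.8, 41.9, 42.5]:
    print(val_pq, sched.step(val_pq))
```

With a patience of 3, the learning rate drops only after four consecutive epochs without improvement, and it never goes below 1e-4.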
Best metrics I obtained with the default config (N is the number of classes in each group):
Epoch 21 |    PQ     SQ     RQ     N
All      |  45.4   75.4   57.9    19
Things   |  34.4   73.3   46.6     8
Stuff    |  53.4   76.9   66.0    11