This repo is the official implementation of the CVPR 2020 oral paper: Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax. [Paper] [Supp] [Slides] [Video] [Code and models]
Note: The current code has not been fully cleaned up yet. We are still working on it and will update it soon.
The requirements are exactly the same as mmdetection v1.0.rc0. We tested on the following settings:
- python 3.7
- cuda 9.2
- pytorch 1.3.1+cu92
- torchvision 0.4.2+cu92
- mmcv 0.2.14
HH=`pwd`
conda create -n mmdet python=3.7 -y
conda activate mmdet
pip install cython
pip install numpy
pip install torch
pip install torchvision
pip install pycocotools
pip install mmcv
pip install matplotlib
pip install terminaltables
cd lvis-api/
python setup.py develop
cd $HH
python setup.py develop
# Make sure you are in dir BalancedGroupSoftmax
mkdir data
cd data
mkdir lvis
mkdir pretrained_models
- If you already have the COCO2017 dataset, link the train2017 and val2017 folders under the folder lvis.
- If you do not have the COCO2017 dataset, please download the COCO train set and the COCO val set, unzip these files, and move them under the folder lvis.
- Download the LVIS annotations: lvis train ann and lvis val ann.
- Unzip all the files and put them under lvis.
- To train HTC models, download the COCO stuff annotations and rename the folder stuffthingmaps_trainval2017 to stuffthingmaps.
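The linking and renaming steps above can be sketched in Python as follows. This is a minimal sketch and not part of the repository: the function name prepare_lvis_dirs and its arguments are hypothetical, and it assumes the COCO images are already unpacked under coco_root/train2017 and coco_root/val2017 and that the stuff annotations, if used, were unzipped under the lvis folder.

```python
import os
from pathlib import Path


def prepare_lvis_dirs(coco_root, lvis_root):
    """Link the COCO image folders into lvis_root and rename the stuff maps.

    Both argument names are illustrative; adjust the paths to your setup.
    """
    coco_root, lvis_root = Path(coco_root), Path(lvis_root)
    lvis_root.mkdir(parents=True, exist_ok=True)

    # Symlink train2017 and val2017 instead of copying the images.
    for split in ("train2017", "val2017"):
        target = lvis_root / split
        if not os.path.lexists(str(target)):
            os.symlink(str((coco_root / split).resolve()), str(target),
                       target_is_directory=True)

    # HTC models only: stuffthingmaps_trainval2017 -> stuffthingmaps.
    src = lvis_root / "stuffthingmaps_trainval2017"
    dst = lvis_root / "stuffthingmaps"
    if src.exists() and not dst.exists():
        src.rename(dst)
    return lvis_root
```

Calling prepare_lvis_dirs("/path/to/coco", "data/lvis") from the repository root would cover the first and last bullets; the LVIS annotation files still have to be downloaded and unzipped by hand.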
Download the corresponding pre-trained models below.
- To train baseline models, we need models trained on COCO for initialization. Please download the corresponding COCO models at mmdetection model zoo.
- To train balanced group softmax models (abbreviated as gs models), we need the corresponding baseline models trained on LVIS for initialization; all parameters except those of the last FC layer are then fixed.
- Move these model files to ./data/pretrained_models/
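Fixing all parameters except the last FC layer can be sketched as below. This is an illustration only, not the repository's implementation: the helper name freeze_all_but is hypothetical, and the substring "fc_cls" is an assumed name for the final classification FC layer, so check the parameter names of your own model before relying on it.

```python
def freeze_all_but(named_parameters, trainable_keys=("fc_cls",)):
    """Disable gradients for every parameter whose name lacks a trainable key.

    named_parameters: iterable of (name, parameter) pairs, for example the
        result of model.named_parameters() on a PyTorch module.
    trainable_keys: substrings that mark the layers to keep trainable;
        "fc_cls" is an assumption about how the last FC layer is named.
    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in named_parameters:
        keep = any(key in name for key in trainable_keys)
        # requires_grad is the standard PyTorch switch for gradient tracking.
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable
```

With a PyTorch model, the call would be freeze_all_but(model.named_parameters()); printing the returned list is a quick way to confirm that only the intended layer is still trainable.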
You can either download or generate the intermediate files before training and testing. Put them under ./data/lvis/.
- BAGS models: label2binlabel.pt, pred_slice_with0.pt, valsplit.pkl
- Re-weight models: cls_weight.pt, cls_weight_bours.pt
- RFS models: class_to_imageid_and_inscount.pt
After all these operations, the folder data should look like this:
data
├── lvis
│   ├── lvis_v0.5_train.json
│   ├── lvis_v0.5_val.json
│   ├── label2binlabel.pt (Optional, for BAGS models only)
│   ├── ...... (Other intermediate files)
│   ├── stuffthingmaps (Optional, for HTC models only)
│   │   ├── train2017
│   │   │   ├── 000000004134.png
│   │   │   ├── 000000031817.png
│   │   │   ├── ......
│   │   └── val2017
│   │       ├── 000000424162.png
│   │       ├── 000000445999.png
│   │       ├── ......
│   ├── train2017
│   │   ├── 000000100582.jpg
│   │   ├── 000000102411.jpg
│   │   ├── ......
│   └── val2017
│       ├── 000000062808.jpg
│       ├── 000000119038.jpg
│       ├── ......
└── pretrained_models
    ├── faster_rcnn_r50_fpn_2x_20181010-443129e1.pth
    ├── ......
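A quick way to confirm this layout before training is a small check script. The sketch below is not shipped with the repository; the function name check_data_layout and the split into required and optional entries are assumptions drawn from the tree above.

```python
from pathlib import Path

# Entries taken from the folder tree; paths are relative to the data folder.
REQUIRED = [
    "lvis/lvis_v0.5_train.json",
    "lvis/lvis_v0.5_val.json",
    "lvis/train2017",
    "lvis/val2017",
    "pretrained_models",
]
OPTIONAL = [
    "lvis/stuffthingmaps",     # HTC models only
    "lvis/label2binlabel.pt",  # BAGS models only
]


def check_data_layout(data_root="data"):
    """Return (missing_required, missing_optional) lists of relative paths."""
    root = Path(data_root)
    missing_required = [p for p in REQUIRED if not (root / p).exists()]
    missing_optional = [p for p in OPTIONAL if not (root / p).exists()]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = check_data_layout()
    for path in required:
        print("missing (required):", path)
    for path in optional:
        print("missing (optional):", path)
```

An empty first list means the required files and folders are in place; the optional entries matter only for the model families noted in the comments.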
Note: Please make sure that you have prepared the pre-trained models and intermediate files and that they are placed at the paths specified in ${CONFIG_FILE}.
Use the following commands to train a model.
# Single GPU
python tools/train.py ${CONFIG_FILE}
# Multi GPU distributed training
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
All config files are under ./configs/.
- ./configs/bags: all models for Balanced Group Softmax.
- ./configs/baselines: all baseline models.
- ./configs/transferred: transferred models from long-tail image classification.
- ./configs/ablations: models for ablation study.
For example, to train a BAGS model with Faster R-CNN R50-FPN:
# Single GPU
python tools/train.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py
# Multi GPU distributed training (for 8 gpus)
./tools/dist_train.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py 8
Important: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Cited from mmdetection.)
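The Linear Scaling Rule quoted above amounts to a one-line computation. The sketch below is only an illustration: the function name scaled_lr is hypothetical, and the base learning rate of 0.02 for a batch size of 16 is inferred from the two examples in the note rather than read from a specific config file, so check the lr value in the config you actually use.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate linearly with the total batch size.

    base_lr and base_batch_size are assumptions (8 GPUs x 2 img/gpu);
    replace them with the values from your config file.
    """
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


# The two cases mentioned in the note above.
lr_4_gpus = scaled_lr(num_gpus=4, imgs_per_gpu=2)    # batch size 8
lr_16_gpus = scaled_lr(num_gpus=16, imgs_per_gpu=4)  # batch size 64
```

Under these assumptions the two calls reproduce the values 0.01 and 0.08 given in the note.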
Note: Please make sure that you have prepared the intermediate files and that they are placed at the paths specified in ${CONFIG_FILE}.
Use the following commands to test a trained model.
# single gpu test
python tools/test_lvis.py \
${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
# multi-gpu testing
./tools/dist_test_lvis.sh \
${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
- RESULT_FILE: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- EVAL_METRICS: Items to be evaluated on the results. Use bbox for bounding box evaluation only, or bbox segm for bounding box and mask evaluation.
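Since the result file is written in pickle format, it can be reloaded for later inspection with the standard library. This is a minimal sketch with hypothetical helper names; the README does not describe the structure of the stored object, so the code only reports its type and length instead of assuming a layout. Only unpickle files that you produced yourself or otherwise trust.

```python
import pickle


def load_results(result_file):
    """Load the object that was saved via --out."""
    with open(result_file, "rb") as f:
        return pickle.load(f)


def describe_results(results):
    """Return the type name and, if available, the number of entries."""
    size = len(results) if hasattr(results, "__len__") else None
    return type(results).__name__, size
```

For example, describe_results(load_results("gs_box_result.pkl")) gives a first impression of what was saved before any further processing is written.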
For example (assume that you have downloaded the corresponding model file to ./data/downloaded_models):
- To evaluate the trained BAGS model with Faster R-CNN R50-FPN for object detection:
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth \
--out gs_box_result.pkl --eval bbox
# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.py \
./data/downloaded_models/gs_faster_rcnn_r50_fpn_1x_lvis_with0_bg8.pth 8 \
--out gs_box_result.pkl --eval bbox
- To evaluate the trained BAGS model with Mask R-CNN R50-FPN for instance segmentation:
# single-gpu testing
python tools/test_lvis.py configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth \
--out gs_mask_result.pkl --eval bbox segm
# multi-gpu testing (8 gpus)
./tools/dist_test_lvis.sh configs/bags/gs_mask_rcnn_r50_fpn_1x_lvis.py \
./data/downloaded_models/gs_mask_rcnn_r50_fpn_1x_lvis.pth 8 \
--out gs_mask_result.pkl --eval bbox segm
The evaluation results will be shown in markdown table format:
| Type | IoU | Area | MaxDets | CatIds | Result |
| :---: | :---: | :---: | :---: | :---: | :---: |
| (AP) | 0.50:0.95 | all | 300 | all | 25.96% |
| (AP) | 0.50 | all | 300 | all | 43.58% |
| (AP) | 0.75 | all | 300 | all | 27.15% |
| (AP) | 0.50:0.95 | s | 300 | all | 20.26% |
| (AP) | 0.50:0.95 | m | 300 | all | 32.81% |
| (AP) | 0.50:0.95 | l | 300 | all | 40.10% |
| (AP) | 0.50:0.95 | all | 300 | r | 17.66% |
| (AP) | 0.50:0.95 | all | 300 | c | 25.75% |
| (AP) | 0.50:0.95 | all | 300 | f | 29.55% |
| (AR) | 0.50:0.95 | all | 300 | all | 34.76% |
| (AR) | 0.50:0.95 | s | 300 | all | 24.77% |
| (AR) | 0.50:0.95 | m | 300 | all | 41.50% |
| (AR) | 0.50:0.95 | l | 300 | all | 51.64% |
Please refer to our paper and supp for more details.
ID | Models | bbox mAP / mask mAP | Train | Test | Config file | Pretrained Model | Train part | Model |
---|---|---|---|---|---|---|---|---|
(1) | Faster R50-FPN | 20.98 | √ | √ | file | COCO R50 | All | Google drive |
(2) | x2 | 21.93 | √ | √ | file | Model (1) | All | Google drive |
(3) | Finetune tail | 22.28 | × | √ | file | Model (1) | All | Google drive |
(4) | RFS | 23.41 | √ | √ | file | COCO R50 | All | Google drive |
(5) | RFS-finetune | 22.66 | √ | √ | file | Model (1) | All | Google drive |
(6) | Re-weight | 23.48 | √ | √ | file | Model (1) | All | Google drive |
(7) | Re-weight-cls | 24.66 | √ | √ | file | Model (1) | Cls | Google drive |
(8) | Focal loss | 11.12 | × | √ | file | Model (1) | All | Google drive |
(9) | Focal loss-cls | 19.29 | × | √ | file | Model (1) | Cls | Google drive |
(10) | NCM-fc | 16.02 | × | × | | Model (1) | | |
(11) | NCM-conv | 12.56 | × | × | | Model (1) | | |
(12) | | 11.01 | × | × | | Model (1) | Cls | |
(13) | | 21.61 | × | × | | Model (1) | Cls | |
(14) | Ours (Faster R50-FPN) | 25.96 | √ | √ | file | Model (1) | Cls | Google drive |
(15) | Faster X101-64x4d | 24.63 | √ | √ | file | COCO x101 | All | Google drive |
(16) | Ours (Faster X101-64x4d) | 27.83 | √ | √ | file | Model (15) | Cls | Google drive |
(17) | Cascade X101-64x4d | 27.16 | √ | √ | file | COCO cascade x101 | All | Google drive |
(18) | Ours (Cascade X101-64x4d) | 32.77 | √ | √ | file | Model (17) | Cls | Google drive |
(19) | Mask R50-FPN | 20.78/20.68 | √ | √ | file | COCO mask r50 | All | Google drive |
(20) | Ours (Mask R50-FPN) | 25.76/26.25 | √ | √ | file | Model (19) | Cls | Google drive |
(21) | HTC X101-64x4d | 31.28/29.28 | √ | √ | file | COCO HTC x101 | All | Google drive |
(22) | Ours (HTC X101-64x4d) | 33.68/31.20 | √ | √ | file | Model (21) | Cls | Google drive |
(23) | HTC X101-64x4d-MS-DCN | 34.61/31.94 | √ | √ | file | COCO HTC x101-ms-dcn | All | Google drive |
(24) | Ours (HTC X101-64x4d-MS-DCN) | 37.71/34.39 | √ | √ | file | Model (23) | Cls | Google drive |
PS: in column Pretrained Model, the file of Model (n) is the same as the Google drive file in column Model in row (n).
@inproceedings{li2020overcoming,
title={Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax},
author={Li, Yu and Wang, Tao and Kang, Bingyi and Tang, Sheng and Wang, Chunfeng and Li, Jintao and Feng, Jiashi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10991--11000},
year={2020}
}
This code is largely based on mmdetection v1.0.rc0 and LVIS API.