Official repository for the paper "Instance-Wise Holistic Order Prediction in Natural Scenes".

Instance-Wise Holistic Order Prediction in Natural Scenes

Pierre Musacchio<sup>1</sup>, Hyunmin Lee<sup>2</sup>, Jaesik Park<sup>1</sup>

<sup>1</sup>Seoul National University, <sup>2</sup>LG AI Research

This work is an extension of the CVPR 2022 paper "Instance-Wise Occlusion and Depth Orders in Natural Scenes" by Hyunmin Lee and Jaesik Park.

[Figure] Qualitative results obtained by InstaFormer<sup>o,d</sup>-L200.

Overview

This repository provides downloads for:

  1. The InstaOrder dataset. ✅
  2. The InstaOrder Panoptic dataset. ✅
  3. Weights for the InstaOrderNet model family. ✅
  4. Weights for the InstaFormer model family. ✅

We also explain how to train and evaluate the InstaFormer model family.

Datasets

InstaOrder

The InstaOrder dataset is an extension of the COCO dataset. Carefully annotated for occlusion and depth order prediction, it contains 2.9M annotations on 101K natural scenes. [Click here for download]

InstaOrder Panoptic

The InstaOrder Panoptic dataset is an extension of the COCO panoptic dataset. It provides "things" annotations for occlusion and depth order prediction, totaling 2.9M annotations on 101K natural scenes. [Click here for download]
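Once downloaded, the annotation file can be inspected in a few lines of Python. Below is a minimal sketch, assuming the usual COCO-style JSON layout with top-level "images" and "annotations" lists; the file name is a placeholder, so adjust it to the actual file in your download:

import json

# Placeholder path: adjust to the actual annotation file you downloaded.
with open("annotations/instaorder_panoptic_val.json") as f:
    data = json.load(f)

# Assumption: COCO-style layout with top-level "images" and "annotations" lists.
print("top-level keys:", sorted(data.keys()))
print("images:", len(data["images"]))
print("annotations:", len(data["annotations"]))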

The InstaOrderNet Model Family

Note: we plan to make this repository run plain InstaOrderNets as well, but this has yet to be implemented. To run those networks, please refer to our former InstaOrder repository.

The InstaOrderNet family predicts pairwise occlusion and depth orders given an input image alongside two instance masks. The family comes in three flavors: "o" (occlusion only), "d" (depth only), and "o,d" (joint occlusion and depth).

| Model | Recall | Precision | F1 | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- |
| InstaOrderNet<sup>o</sup> | 89.39 | 79.83 | 80.65 | -- | -- | -- | model |
| InstaOrderNet<sup>d</sup> | -- | -- | -- | 12.95 | 25.96 | 17.51 | model |
| InstaOrderNet<sup>o,d</sup> | 82.37 | 88.67 | 81.86 | 11.51 | 25.22 | 15.99 | model |
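To make the pairwise interface concrete, here is a minimal sketch of the kind of input such a network consumes: the RGB image concatenated channel-wise with the two binary instance masks. This illustrates the interface described above under assumed conventions (tensor shapes, channel-wise concatenation); it is not the repository's actual preprocessing code.

import torch

def build_pairwise_input(image: torch.Tensor,
                         mask_a: torch.Tensor,
                         mask_b: torch.Tensor) -> torch.Tensor:
    # image: (3, H, W) RGB tensor; mask_a, mask_b: (H, W) binary masks.
    # Returns a (5, H, W) tensor; a pairwise occlusion/depth network
    # would map this to order logits for the (a, b) instance pair.
    return torch.cat([image, mask_a[None], mask_b[None]], dim=0)

image = torch.rand(3, 480, 640)                # dummy RGB image
mask_a = (torch.rand(480, 640) > 0.5).float()  # dummy instance mask A
mask_b = (torch.rand(480, 640) > 0.5).float()  # dummy instance mask B
print(build_pairwise_input(image, mask_a, mask_b).shape)  # (5, 480, 640)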

The InstaFormer Model Family

The InstaFormer family performs end-to-end holistic occlusion and depth order prediction. The family comes in three flavors: "o" (occlusion only), "d" (depth only), and "o,d" (joint occlusion and depth). In all cases, the model also outputs the scene segmentation.

For clarity, we only report occlusion and depth order prediction results below; please refer to the paper for the segmentation results.
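"Holistic" here means that orders are predicted for all instance pairs of a scene at once, rather than one pair at a time. A convenient mental model is an N x N matrix whose entry (i, j) says whether instance i occludes instance j; the sketch below turns such a matrix into a readable list of relations. The matrix encoding and the instance labels are illustrative assumptions, not the repository's output format.

import numpy as np

# Assumption (for illustration): occ[i, j] == 1 means "instance i occludes instance j".
occ = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])

names = ["person", "bicycle", "car"]  # hypothetical instance labels
for i in range(len(names)):
    for j in range(len(names)):
        if occ[i, j]:
            print(f"{names[i]} occludes {names[j]}")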

InstaFormer<sup>o</sup>

This model flavor exclusively predicts occlusion orders.

| Backbone | Config | Recall | Precision | F1 | Weights |
| --- | --- | --- | --- | --- | --- |
| SWIN-T | 100 | 89.06 | 75.69 | 79.63 | model |
| SWIN-S | 100 | 88.91 | 77.31 | 80.53 | model |
| SWIN-B | 100 | 89.02 | 76.95 | 80.64 | model |
| SWIN-B | 100 | 89.53 | 77.34 | 80.99 | model |
| SWIN-L | 200 | 89.82 | 78.10 | 81.89 | model |

InstaFormer<sup>d</sup>

This model flavor exclusively predicts depth orders.

| Backbone | Config | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
| --- | --- | --- | --- | --- | --- |
| SWIN-T | 100 | 8.10 | 25.43 | 13.75 | model |
| SWIN-S | 100 | 8.44 | 26.04 | 14.48 | model |
| SWIN-B | 100 | 8.28 | 25.05 | 13.88 | model |
| SWIN-B | 100 | 8.15 | 25.19 | 13.72 | model |
| SWIN-L | 200 | 8.47 | 24.91 | 13.73 | model |
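WHDR (Weighted Human Disagreement Rate), reported in the tables above, is roughly the weighted fraction of instance pairs whose predicted depth order disagrees with the human annotation (lower is better). The sketch below computes this quantity under a simplified assumption that each pair carries a human label, a predicted label, and a weight; the exact labels and weighting used in the paper may differ.

def whdr(pairs):
    # pairs: iterable of (human_label, predicted_label, weight) tuples,
    # where labels are ordinal depth relations such as "closer"/"farther"/"equal".
    disagreement = sum(w for h, p, w in pairs if h != p)
    total = sum(w for _, _, w in pairs)
    return disagreement / total if total else 0.0

pairs = [("closer", "closer", 1.0),
         ("farther", "closer", 1.0),
         ("equal", "equal", 0.5)]
print(f"WHDR: {whdr(pairs):.2%}")  # WHDR: 40.00%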

InstaFormer<sup>o,d</sup>

This model flavor jointly predicts occlusion and depth orders.

| Backbone | Config | Recall | Precision | F1 | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SWIN-T | 100 | 88.64 | 75.56 | 79.74 | 8.43 | 25.36 | 14.03 | model |
| SWIN-S | 100 | 88.20 | 75.98 | 79.57 | 8.54 | 25.42 | 13.96 | model |
| SWIN-B | 100 | 88.47 | 75.96 | 79.72 | 8.84 | 25.77 | 14.39 | model |
| SWIN-B | 100 | 89.24 | 76.66 | 80.34 | 8.15 | 25.79 | 14.06 | model |
| SWIN-L | 200 | 89.57 | 78.07 | 81.37 | 7.90 | 24.68 | 13.30 | model |

Running InstaFormer

Environment setup

This code was developed with NVCC 11.7, Python 3.8.18, PyTorch 2.1.0, torchvision 0.16.0, and detectron2 0.6 (built from source at commit 80307d2 due to import issues).

We strongly recommend building the code inside a Docker container with a conda environment.

First, install the apt-get dependencies:

apt-get update && apt-get upgrade -y

# ninja
apt-get install ninja-build -y
# opencv dependencies
apt-get install libgl1-mesa-glx libglib2.0-0 -y

Then, create a conda environment and activate it:

conda create -n instaorder python=3.8 -y
conda activate instaorder

Finally, run the quick_install.sh file:

. ./quick_install.sh
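If you prefer to see what gets installed, the pinned versions listed above can also be installed by hand. The commands below are a rough manual equivalent based on the versions stated in this section; quick_install.sh remains the authoritative list, so inspect it if anything diverges:

# PyTorch and torchvision pinned to the versions stated above.
pip install torch==2.1.0 torchvision==0.16.0

# detectron2 built from source at the commit mentioned above.
pip install 'git+https://github.com/facebookresearch/detectron2.git@80307d2'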

Dataset Preparation

First, prepare the COCO dataset files following the structure explained in this tutorial. Do not forget to set the $DETECTRON2_DATASETS environment variable to the proper directory.

Then, simply place the InstaOrder Panoptic json file downloaded in the previous section in the annotations directory.
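For reference, the resulting layout should look roughly like the standard detectron2 COCO layout below; the panoptic file names follow the detectron2 tutorial, and the InstaOrder Panoptic json name depends on your download:

$DETECTRON2_DATASETS/
└── coco/
    ├── annotations/
    │   ├── panoptic_train2017.json
    │   ├── panoptic_val2017.json
    │   └── <instaorder panoptic json goes here>
    ├── panoptic_train2017/
    ├── panoptic_val2017/
    ├── train2017/
    └── val2017/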

Training

First, download a pre-trained Mask2Former panoptic model from the Mask2Former Model Zoo, then run the following command:

python train_net.py \
--num-gpus <gpus> \
--config-file <path/to/instaformer/cfg.yaml> \
MODEL.WEIGHTS <path/to/m2f/weights.pkl> \
SOLVER.IMS_PER_BATCH <batch>

Where:

  • <gpus> is the number of GPUs used for training,
  • <path/to/instaformer/cfg.yaml> is the model's YAML config file (located in configs/instaorder/),
  • <path/to/m2f/weights.pkl> is a .pkl file containing the weights of the Mask2Former model of your choice,
  • <batch> is the total training batch size.
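For example, a training run on 4 GPUs with a total batch size of 16 could look like the following; the config file name is a placeholder, so pick an actual file from configs/instaorder/ and adjust the paths:

python train_net.py \
--num-gpus 4 \
--config-file configs/instaorder/<your_config>.yaml \
MODEL.WEIGHTS <path/to/m2f/weights.pkl> \
SOLVER.IMS_PER_BATCH 16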

Evaluation on pre-trained models

Evaluation of a trained InstaFormer model can be run with the following command:

python train_net.py \
--eval-only \
--num-gpus <gpus> \
--config-file <path/to/instaformer/cfg.yaml> \
MODEL.WEIGHTS <path/to/instaformer/weights.pth>
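For example, evaluating a downloaded checkpoint on a single GPU could look like this (paths are placeholders):

python train_net.py \
--eval-only \
--num-gpus 1 \
--config-file configs/instaorder/<your_config>.yaml \
MODEL.WEIGHTS <path/to/instaformer/weights.pth>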

Inference on custom images

Inference on custom images can be run using the following command:

python demo/demo.py \
--config-file <path/to/instaformer/cfg.yaml> \
--input <path/to/image.jpg> \
--output <path/to/out/dir> \
MODEL.WEIGHTS <path/to/instaformer/weights.pth> \
TEST.OCCLUSION_EVALUATION False \
TEST.DEPTH_EVALUATION False

Where:

  • <path/to/instaformer/cfg.yaml> is the model's YAML config file (located in configs/instaorder/),
  • <path/to/image.jpg> is the path to the input image,
  • <path/to/out/dir> is the directory where the outputs will be stored,
  • <path/to/instaformer/weights.pth> is a .pth file containing the weights of the trained InstaFormer model of your choice.

Since the configuration is set up for training and evaluation, you must manually set TEST.OCCLUSION_EVALUATION and TEST.DEPTH_EVALUATION to False when running inference.
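For example, a single-image run could look like this (paths are placeholders):

python demo/demo.py \
--config-file configs/instaorder/<your_config>.yaml \
--input examples/image.jpg \
--output outputs/ \
MODEL.WEIGHTS <path/to/instaformer/weights.pth> \
TEST.OCCLUSION_EVALUATION False \
TEST.DEPTH_EVALUATION False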

Citation

A citation entry for our most recent work is not yet available. In the meantime, if you found our work useful, please consider citing our former work:

@inproceedings{lee2022instaorder,
  title={{Instance-wise Occlusion and Depth Orders in Natural Scenes}},
  author={Hyunmin Lee and Jaesik Park},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Acknowledgments

Our code is based on Mask2Former's official repository. We thank the authors for open-sourcing their code.
