Hanjung Kim, Jaehyun Kang, Miran Heo, Sukjun Hwang, Seoung Wug Oh, Seon Joo Kim
- Video Instance Segmentation by leveraging appearance information.
- Supports major video instance segmentation datasets: YouTubeVIS 2019/2021/2022 and Occluded VIS (OVIS).
See installation instructions.
For dataset preparation instructions, refer to Preparing Datasets for VISAGE.
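The sketch below shows one way the prepared data is commonly laid out in MinVIS/Mask2Former-style VIS repositories, which this project builds on; the directory names and symlink targets are assumptions for illustration, and datasets/README.md remains the authoritative reference.

```bash
# Sketch only: layout assumed from MinVIS/Mask2Former-style VIS repos;
# see datasets/README.md for the authoritative structure.
mkdir -p datasets
# Symlink each downloaded dataset into ./datasets (paths are placeholders):
ln -s /path/to/ytvis_2019 datasets/ytvis_2019   # annotations (*.json) plus train/valid JPEGImages
ln -s /path/to/ytvis_2021 datasets/ytvis_2021
ln -s /path/to/ovis       datasets/ovis
```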
We provide a script, train_net_video.py, that trains all the configs provided in VISAGE.
To train a model with "train_net_video.py", first set up the corresponding datasets following datasets/README.md, then download the COCO pre-trained instance segmentation weights (R50, Swin-L) and put them in the current working directory. Once these are set up, run:
```bash
python train_net_video.py --num-gpus 4 \
  --config-file configs/youtubevis_2019/visage_R50_bs16.yaml
```
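If the pre-trained weights live somewhere other than the working directory, they can be pointed to explicitly with a standard detectron2 command-line override; MODEL.WEIGHTS is a stock detectron2 config key, and the checkpoint path below is only a placeholder, not the actual release filename.

```bash
# Assumes detectron2-style command-line overrides; the .pkl path is a placeholder.
python train_net_video.py --num-gpus 4 \
  --config-file configs/youtubevis_2019/visage_R50_bs16.yaml \
  MODEL.WEIGHTS /path/to/coco_pretrained_r50.pkl
```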
To evaluate a model's performance, use:

```bash
python train_net_video.py \
  --config-file configs/youtubevis_2019/visage_R50_bs16.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
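Assuming standard detectron2 behavior, evaluation artifacts are written under the config's output directory, which can be redirected with an OUTPUT_DIR override (the path below is only an example). For the YouTubeVIS/OVIS validation splits, whose ground truth is withheld, predictions are typically exported as a results JSON for submission to the benchmark servers rather than being scored locally.

```bash
# Assumes detectron2-style overrides; OUTPUT_DIR is a stock config key.
python train_net_video.py \
  --config-file configs/youtubevis_2019/visage_R50_bs16.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file \
  OUTPUT_DIR ./output/visage_r50_ytvis2019_eval
```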
YouTubeVIS 2019

Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
---|---|---|---|---|---|---|---|
VISAGE | ResNet-50 | 55.1 | 78.1 | 60.6 | 51.0 | 62.3 | model |

YouTubeVIS 2021

Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
---|---|---|---|---|---|---|---|
VISAGE | ResNet-50 | 51.6 | 73.8 | 56.1 | 43.6 | 59.3 | model |

OVIS

Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
---|---|---|---|---|---|---|---|
VISAGE | ResNet-50 | 36.2 | 60.3 | 35.3 | 17.0 | 40.3 | model |
The majority of VISAGE is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), MinVIS (Nvidia Source Code License-NC), and VITA (Apache-2.0 License).
```bibtex
@misc{kim2024visage,
  title={VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement},
  author={Hanjung Kim and Jaehyun Kang and Miran Heo and Sukjun Hwang and Seoung Wug Oh and Seon Joo Kim},
  year={2024},
  eprint={2312.04885},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
This repo is largely based on Mask2Former (https://github.com/facebookresearch/Mask2Former), MinVIS (https://github.com/NVlabs/MinVIS), and VITA (https://github.com/sukjunhwang/VITA).