
# Referring Image Matting [CVPR-2023]




This is the official repository of the paper *Referring Image Matting*.

Jizhizi Li, Jing Zhang, and Dacheng Tao

Introduction | RefMatte | CLIPMat | Results | Statement


## 🚀 News

[2023-04-17]: The datasets RefMatte and RefMatte-RW100 are now openly available via the links below! Please follow the dataset release agreements when accessing them.

| Dataset | Dataset Link (OneDrive) | Size | Dataset Release Agreement |
| --- | --- | --- | --- |
| RefMatte | Link (pw: 3ft9cb) | 43.7 GB | Agreement (CC BY-NC License) |
| RefMatte-RW100 | Link (pw: 3ft9cb) | 66.6 MB | Agreement (CC BY-NC License) |

[2023-02-28]: The paper has been accepted by the Computer Vision and Pattern Recognition Conference (CVPR)! 🎉

## Introduction

Image matting refers to extracting an accurate foreground from an image. Current automatic methods tend to extract all the salient objects in the image indiscriminately. In this paper, we propose a new task named Referring Image Matting (RIM): extracting the meticulous alpha matte of the specific object that best matches a given natural language description. We then propose a large-scale dataset, RefMatte, and a carefully designed method, CLIPMat, to serve as a baseline suite for RIM. We believe the new task RIM, along with the RefMatte dataset and the CLIPMat method, will open new research directions in this area and facilitate future studies. The datasets have already been released (see the links above); the code and the method will be published soon.

## RefMatte and RefMatte-RW100

Prevalent visual grounding methods are all limited to the segmentation level, probably due to the lack of high-quality datasets. To fill this gap, we establish the first large-scale challenging dataset, RefMatte, by designing a comprehensive image composition and expression generation engine that produces synthetic images on top of current public high-quality matting foregrounds, with flexible logic and re-labelled, diverse attributes. RefMatte consists of 230 object categories, 47,500 images, 118,749 expression-region entities, and 474,996 expressions, and it can easily be extended in the future. In addition, we construct a real-world test set, RefMatte-RW100, consisting of 100 natural images with manually generated phrase annotations, to further evaluate the generalization ability of RIM models. We show some examples from RefMatte below, including the images, the alpha mattes, and the input texts; more can be seen on this page. We have released RefMatte and RefMatte-RW100; please follow the dataset release agreements to access them.

| Dataset | Dataset Link (OneDrive) | Size | Dataset Release Agreement |
| --- | --- | --- | --- |
| RefMatte | Link (pw: 3ft9cb) | 43.7 GB | Agreement (CC BY-NC License) |
| RefMatte-RW100 | Link (pw: 3ft9cb) | 66.6 MB | Agreement (CC BY-NC License) |
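
For readers who want a feel for how such a dataset might be consumed, here is a minimal, hypothetical loading sketch. The file layout (`images/`, `alphas/`, `expressions.json`) and all field names below are assumptions made purely for illustration; consult the released dataset and its agreement for the actual structure.

```python
# Hypothetical sketch of iterating RefMatte-style samples. The directory
# layout and JSON schema here are ASSUMPTIONS for illustration only;
# the real dataset may be organized differently.
import json
from pathlib import Path
from PIL import Image

def iter_refmatte(root):
    root = Path(root)
    # assumed: one JSON file mapping each image name to its referring expressions
    with open(root / "expressions.json") as f:
        annotations = json.load(f)
    for name, expressions in annotations.items():
        image = Image.open(root / "images" / name).convert("RGB")
        alpha = Image.open(root / "alphas" / name).convert("L")
        for text in expressions:
            # (input image, ground-truth alpha matte, referring text)
            yield image, alpha, text
```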

We also generate word clouds of the keywords, attributes, and relationships in RefMatte, as shown below. As can be seen, a large portion of the dataset consists of humans and animals, since they are very common in image matting. The most frequent attributes in RefMatte are male, gray, transparent, and salient, while the relationship words are more balanced.

## CLIPMat

Furthermore, we present CLIPMat, a novel baseline method for RIM, which includes a context-embedded prompt, a text-driven semantic pop-up, and a multi-level details extractor. Extensive experiments on RefMatte in both the keyword and expression settings validate the superiority of CLIPMat over representative methods. A diagram is shown below; more details can be found in the paper.
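
As a rough illustration of the kind of text-conditioned matting pipeline CLIPMat belongs to, the sketch below gates visual features with a text embedding and decodes an alpha matte. This is not the authors' implementation: the module, its shapes, and the gating scheme are assumptions chosen only to make the idea concrete; see the paper and the forthcoming code for the real architecture.

```python
# Minimal, illustrative sketch of a text-guided matting head. NOT the
# authors' CLIPMat code: all module names, shapes, and the gating scheme
# are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedMattingSketch(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512):
        super().__init__()
        # loosely inspired by a "text-driven semantic pop-up":
        # modulate visual features with the text embedding
        self.txt_proj = nn.Linear(txt_dim, img_dim)
        # placeholder for a details decoder (shallow conv stack)
        self.decoder = nn.Sequential(
            nn.Conv2d(img_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, img_feat, txt_feat):
        # img_feat: (B, C, H, W) visual features; txt_feat: (B, D) text embedding
        gate = self.txt_proj(txt_feat)[:, :, None, None]  # (B, C, 1, 1)
        fused = img_feat * torch.sigmoid(gate)            # text-conditioned gating
        alpha = torch.sigmoid(self.decoder(fused))        # predicted alpha matte
        return F.interpolate(alpha, scale_factor=4,
                             mode="bilinear", align_corners=False)

# Smoke test with random tensors standing in for CLIP encoder outputs.
if __name__ == "__main__":
    model = TextGuidedMattingSketch()
    alpha = model(torch.randn(1, 512, 32, 32), torch.randn(1, 512))
    print(alpha.shape)  # torch.Size([1, 1, 128, 128])
```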

## Results

We show some examples of test results produced by CLIPMat on the RefMatte test set and RefMatte-RW100, given the text inputs and the images, under both the keyword-based and expression-based settings. More can be seen on this page.
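
To make the two settings concrete, here is an illustrative pair of text inputs; the values are invented for illustration and are not drawn from the dataset:

```python
# Illustrative text inputs for the two settings (invented examples):
keyword_input = "cat"                                       # keyword-based setting
expression_input = "the gray cat in front of the window"    # expression-based setting
```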

## Statement

If you are interested in our work, please consider citing the following:

```bibtex
@inproceedings{rim,
  title={Referring Image Matting},
  author={Li, Jizhizi and Zhang, Jing and Tao, Dacheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}
```

This project is under the CC BY-NC license. For further questions, please contact Jizhizi Li at jili8515@uni.sydney.edu.au.

## Relevant Projects

[1] Deep Automatic Natural Image Matting, IJCAI, 2021 | Paper | Github
     Jizhizi Li, Jing Zhang, and Dacheng Tao

[2] Privacy-Preserving Portrait Matting, ACM MM, 2021 | Paper | Github
     Jizhizi Li, Sihan Ma, Jing Zhang, and Dacheng Tao

[3] Bridging Composite and Real: Towards End-to-end Deep Image Matting, IJCV, 2022 | Paper | Github
     Jizhizi Li, Jing Zhang, Stephen J. Maybank, and Dacheng Tao

[4] Rethinking Portrait Matting with Privacy Preserving, IJCV, 2023 | Paper | Github
     Sihan Ma, Jizhizi Li, Jing Zhang, He Zhang, and Dacheng Tao

[5] Deep Image Matting: A Comprehensive Survey, ArXiv, 2023 | Paper | Github
     Jizhizi Li, Jing Zhang, and Dacheng Tao