This repository is the official implementation of the CVPR 2024 CVinW workshop paper (Spotlight): "What’s in a Name? Beyond Class Indices for Image Recognition" by Kai Han, Xiaohu Huang, Yandong Li, Sagar Vaze, Jie Li, and Xuhui Jia.
Semantic Category Discovery (SCD): Given a collection of images and a large (essentially unconstrained) vocabulary, assign class names to each image.
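To make the task statement concrete, here is a minimal sketch of the simplest way to name images from an open vocabulary: pick, for each image, the vocabulary entry whose embedding is most similar. This is an illustration only, not the method of the paper. The function names, the toy 2-D vectors, and the use of cosine similarity are all assumptions made for the example; in practice the vectors would come from image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def name_images(image_embeddings, vocab_embeddings):
    """Assign each image the vocabulary name with the highest cosine similarity.

    image_embeddings: list of vectors, one per image (hypothetical input).
    vocab_embeddings: dict mapping a candidate name to its text vector
                      (hypothetical input).
    """
    names = []
    for img in image_embeddings:
        best = max(vocab_embeddings, key=lambda n: cosine(img, vocab_embeddings[n]))
        names.append(best)
    return names

# Toy 2-D embeddings standing in for real image and text features.
vocab = {"dog": [1.0, 0.0], "bird": [0.0, 1.0], "cat": [0.7, 0.7]}
images = [[0.9, 0.1], [0.1, 0.8], [0.6, 0.6]]
print(name_images(images, vocab))  # ['dog', 'bird', 'cat']
```

With a very large vocabulary, many near-synonymous or overly fine-grained names compete for each image, which is what makes the task harder than closed-set classification.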
SCD is released under the CC BY-NC-SA 4.0 license.
We conduct experiments in two settings, i.e., unsupervised and partially supervised.
Table 1. Results in the unsupervised setting. We use DINO features for the initial clustering step and report semantic accuracy, which involves class naming (sACC and Soft-sACC), and clustering accuracy (ACC) for each dataset. ‘TE’ denotes using the textual enhancement technique.
Method | ImageNet-100 sACC | ImageNet-100 Soft-sACC | ImageNet-100 ACC | Stanford Dogs sACC | Stanford Dogs Soft-sACC | Stanford Dogs ACC | CUB sACC | CUB Soft-sACC | CUB ACC |
---|---|---|---|---|---|---|---|---|---|
Zero-shot transfer (UB) | 85.0 | 92.0 | 85.1 | 60.4 | 83.2 | 60.8 | 54.1 | 83.2 | 55.8 |
Zero-shot transfer (Baseline) | 22.7 | 57.7 | 73.2 | 51.7 | 77.4 | 47.2 | 20.2 | 77.4 | 34.4 |
Ours (Semantic Naming) | 41.2 | 71.3 | 78.2 | 53.8 | 79.1 | 57.9 | 24.5 | 79.1 | 46.5 |
Ours (Semantic Naming) w/TE | 43.0 | 72.5 | 81.3 | 54.1 | 80.0 | 58.7 | 33.5 | 80.0 | 42.6 |
Table 2. Results in the partially supervised setting. We use GCD features for the initial clustering step and report semantic accuracy, which involves class naming (sACC and Soft-sACC), and clustering accuracy (ACC) for each dataset. ‘TE’ denotes using the textual enhancement technique.
Method | ImageNet-100 sACC | ImageNet-100 Soft-sACC | ImageNet-100 ACC | Stanford Dogs sACC | Stanford Dogs Soft-sACC | Stanford Dogs ACC | CUB sACC | CUB Soft-sACC | CUB ACC |
---|---|---|---|---|---|---|---|---|---|
Zero-shot transfer (UB) | 85.0 | 92.0 | 85.1 | 60.4 | 83.2 | 60.8 | 54.1 | 55.8 | 55.8 |
Zero-shot transfer (Baseline) | 22.7 | 57.7 | 74.1 | 51.7 | 77.4 | 60.8 | 20.2 | 57.7 | 54.0 |
Ours (Semantic Naming) | 54.8 | 77.5 | 78.7 | 53.7 | 79.6 | 62.1 | 35.3 | 79.6 | 52.9 |
Ours (Semantic Naming) w/TE | 55.7 | 76.5 | 80.6 | 55.5 | 80.6 | 58.8 | 35.3 | 80.6 | 42.5 |
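The README does not define the ACC column. As a hedged illustration, the sketch below assumes it is the clustering accuracy commonly used in category-discovery work: accuracy under the best one-to-one mapping between predicted cluster indices and ground-truth classes. The function name and the toy labels are hypothetical, and the brute-force search over permutations is only practical for a handful of clusters; at realistic scale the same matching is usually solved with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping from cluster ids to class ids.

    Brute force over permutations, so only suitable for very few clusters.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad the shorter side so both have the same size; padded slots match nothing.
    size = max(len(classes), len(clusters))
    classes += [None] * (size - len(classes))
    clusters += [None] * (size - len(clusters))
    best = 0
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))
        correct = sum(1 for t, p in zip(y_true, y_pred) if mapping[p] == t)
        best = max(best, correct)
    return best / len(y_true)

# Toy labels: cluster ids are arbitrary, so cluster 1 can stand for class 0.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]
acc = clustering_accuracy(y_true, y_pred)
print(round(acc, 4))  # 0.8333 (5 of 6 samples agree under the best mapping)
```

Because the score is invariant to how clusters are numbered, it measures grouping quality only; it says nothing about whether the assigned names are correct, which is why the tables report the semantic metrics separately.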
To install the dependencies, run the following command:
pip install -r requirements.txt
You also need to enter the local_utils/k_means_constrained folder and install the package:
python setup.py install
The datasets used can be downloaded from the links below:
Dataset | Link |
---|---|
CUB | Link |
Stanford Dogs | link |
ImageNet | link |
You also need to download the extracted features, GCD pretrained weights, and zero-shot weights, and put them into their respective folders.
- Unsupervised Setting
You can modify the configurations in the script to suit your needs.
sh script/evaluate_unsupervised.sh
- Partially Supervised Setting
You can modify the configurations in the script to suit your needs.
sh script/evaluate_unsupervised.sh
@inproceedings{han2024whats,
title={What's in a Name? Beyond Class Indices for Image Recognition},
author={Kai Han and Xiaohu Huang and Yandong Li and Sagar Vaze and Jie Li and Xuhui Jia},
booktitle={CVPR Workshops},
year={2024}
}