- [2024.1.5] Initial release of the RWD dataset. A bug was found that causes some variation from the originally reported numbers; the arXiv version will be updated accordingly.
The proposed Real-World Object Detection (RWD) benchmark consists of five real-world, application-driven datasets.
FOMO is a novel approach in Open World Object Detection (OWOD), harnessing foundation models to detect unknown objects by their shared attributes with known classes. It generates and refines attributes using language models and known class exemplars, enabling effective identification of novel objects. Benchmarked on diverse real-world datasets, FOMO significantly outperforms existing baselines, showcasing the potential of foundation models in complex detection tasks.
We have trained and tested our models on Ubuntu 20.04, CUDA 12.2, and Python 3.7.16:
```bash
conda create --name fomo python==3.7.16
conda activate fomo
pip install -r requirements.txt
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
```
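As a quick sanity check (not part of the original instructions), you can confirm that the expected PyTorch version was installed and that it can see the GPU:

```bash
# Should print the torch version (1.13.1) and True if CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```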
Dataset setup instructions are in DATASET_SETUP.md.
Note: you may need to give execute permissions to the `.sh` files under the `configs` and `tools` directories by running the following in each of them:

```bash
chmod +x *.sh
```
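Equivalently, a single command run from the repository root (a sketch, assuming the default repo layout with scripts under `configs/` and `tools/`) marks every `.sh` file in both directories executable at once:

```bash
# Make all shell scripts under configs/ and tools/ executable in one pass
find configs tools -name "*.sh" -exec chmod +x {} +
```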
To run the OWOD baselines, use the configurations defined in `./configs`:
- `run_owd.sh` - evaluation of tasks 1-4 on the SOWOD/MOWOD Benchmark.
- `run_owd_baseline.sh` - evaluation of tasks 1-4 on the SOWOD Benchmark.
To run FOMO:
- `run_rwd.sh` - evaluation of all datasets on task 1 of the RWD Benchmark.
- `run_rwd_t2.sh` - evaluation of all datasets on task 2 of the RWD Benchmark.

To run the baselines:

- `run_rwd_baselines.sh` - evaluation of all datasets on task 1 of the RWD Benchmark.
- `run_rwd_t2_baselines.sh` - evaluation of all datasets on task 2 of the RWD Benchmark.
- `run_rwd_fs_baseline.sh` - evaluation of the few-shot baseline on all datasets on task 1 of the RWD Benchmark.
- `run_rwd_t2_fs_baseline.sh` - evaluation of the few-shot baseline on all datasets on task 2 of the RWD Benchmark.
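For example, a task 1 FOMO evaluation might be launched as follows (a sketch; this assumes the scripts live under `configs/` as noted above, and that any dataset paths inside them match your setup):

```bash
# Launch the task 1 RWD evaluation of FOMO from the repository root
./configs/run_rwd.sh
```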
Note: Please check the Deformable DETR repository for more evaluation details.
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.
```bibtex
@article{zohar2023open,
    author  = {Zohar, Orr and Lozano, Alejandro and Goel, Shelly and Yeung, Serena and Wang, Kuan-Chieh},
    title   = {Open World Object Detection in the Era of Foundation Models},
    journal = {arXiv preprint arXiv:2312.05745},
    year    = {2023},
}
```
Should you have any questions, please contact 📧 orrzohar@stanford.edu.
FOMO builds on other codebases, including:
- PROB - the PROB: Probabilistic Objectness for Open World Object Detection codebase.
- OWL-ViT - the Transformers library implementation of OWL-ViT.
If you find FOMO useful, please also consider citing these works:
```bibtex
@InProceedings{Zohar_2023_CVPR,
    author    = {Zohar, Orr and Wang, Kuan-Chieh and Yeung, Serena},
    title     = {PROB: Probabilistic Objectness for Open World Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {11444-11453}
}
```
```bibtex
@inproceedings{minderer2022simple,
    title     = {Simple Open-Vocabulary Object Detection with Vision Transformers},
    author    = {Minderer, Matthias and Gritsenko, Alexey and Stone, Austin and Neumann, Maxim and Weissenborn, Dirk and Dosovitskiy, Alexey and Mahendran, Aravindh and Arnab, Anurag and Dehghani, Mostafa and Shen, Zhuoran and Wang, Xiao and Zhai, Xiaohua and Kipf, Thomas and Houlsby, Neil},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year      = {2022},
}
```