Open World Object Detection in the Era of Foundation Models

Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung, Jackson Wang

If you like our project, please give us a star ⭐ on GitHub for latest updates!

📰 News

[2024.1.5] Initial release of the RWD dataset. I will update the arXiv paper after a bug was found that causes some variation from the originally reported numbers.

🔥 Highlights

The proposed Real-World Object Detection (RWD) benchmark consists of five real-world, application-driven datasets.

FOMO is a novel approach in Open World Object Detection (OWOD), harnessing foundation models to detect unknown objects by their shared attributes with known classes. It generates and refines attributes using language models and known class exemplars, enabling effective identification of novel objects. Benchmarked on diverse real-world datasets, FOMO significantly outperforms existing baselines, showcasing the potential of foundation models in complex detection tasks.

🛠️ Requirements and Installation

We have trained and tested our models on Ubuntu 20.04, CUDA 12.2, and Python 3.7.16.

conda create --name fomo python==3.7.16
conda activate fomo
pip install -r requirements.txt
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
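
After installation, it can help to confirm that the pinned versions above are the ones actually importable inside the fomo environment. The following is a minimal sketch and not part of the FOMO code base: the helper name check_pins and the EXPECTED table are illustrative assumptions, with the version numbers copied from the conda install line.

```python
import importlib

# Pins copied from the conda install command above (illustrative table, not from the repo).
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}


def check_pins(expected=EXPECTED):
    """Return a {package: status} report comparing importable versions to the pins."""
    report = {}
    for name, pin in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        found = getattr(module, "__version__", "unknown")
        # Builds may append a local tag such as "+cu116"; compare only the part before it.
        if found.split("+")[0] == pin:
            report[name] = "ok"
        else:
            report[name] = f"mismatch ({found})"
    return report
```

Calling check_pins() in the activated environment returns one status per package, which makes a version mismatch visible before any evaluation script is launched.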

📸 Dataset Setup

Dataset setup instructions are in DATASET_SETUP.md.

🗝️ Training and Evaluation

Note: you may need to make the .sh files under the 'configs' and 'tools' directories executable by running chmod +x *.sh in each directory.

To run the OWOD baselines, use the configurations defined in the configs directory:

OWOD

run_owd.sh - evaluation of tasks 1-4 on the SOWOD/MOWOD Benchmark.
run_owd_baseline.sh - evaluation of tasks 1-4 on the SOWOD Benchmark.
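
The permission note at the top of this section can also be applied from Python, for example inside a setup script. This is a hedged sketch: the function name make_scripts_executable is hypothetical, and the default directory names are taken from the note rather than verified against the repository layout.

```python
import stat
from pathlib import Path


def make_scripts_executable(directories=("configs", "tools")):
    """Add execute permission to every .sh file directly inside each directory."""
    changed = []
    for directory in directories:
        folder = Path(directory)
        if not folder.is_dir():
            # Skip directories that are absent in this checkout.
            continue
        for script in sorted(folder.glob("*.sh")):
            mode = script.stat().st_mode
            # Set the execute bit for user, group, and others.
            script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            changed.append(script)
    return changed
```

The function returns the list of scripts it touched, so an empty list signals that it was run from the wrong working directory.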

RWD

To run FOMO:

run_rwd.sh - evaluation of all datasets on task 1 RWD Benchmark.
run_rwd_t2.sh - evaluation of all datasets on task 2 RWD Benchmark.

To run baselines:

run_rwd_baselines.sh - evaluation of all datasets on task 1 RWD Benchmark.
run_rwd_t2_baselines.sh - evaluation of all datasets on task 2 RWD Benchmark.
run_rwd_fs_baseline.sh - evaluation of the few-shot baseline on all datasets on task 1 RWD Benchmark.
run_rwd_t2_fs_baseline.sh - evaluation of the few-shot baseline on all datasets on task 2 RWD Benchmark.
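
The RWD scripts listed above follow a method-by-task pattern, which a small launcher can make explicit. The sketch below is illustrative only: RWD_SCRIPTS and run_rwd are hypothetical names, the script file names are copied from this README, and the assumptions that the scripts live in configs and can be run with bash are not confirmed by the repository.

```python
import subprocess
from pathlib import Path

# (method, task) -> script name, as listed in this README.
RWD_SCRIPTS = {
    ("fomo", 1): "run_rwd.sh",
    ("fomo", 2): "run_rwd_t2.sh",
    ("baselines", 1): "run_rwd_baselines.sh",
    ("baselines", 2): "run_rwd_t2_baselines.sh",
    ("few-shot", 1): "run_rwd_fs_baseline.sh",
    ("few-shot", 2): "run_rwd_t2_fs_baseline.sh",
}


def run_rwd(method, task, script_dir="configs", dry_run=False):
    """Build the command for one RWD evaluation and, unless dry_run, execute it."""
    script = Path(script_dir) / RWD_SCRIPTS[(method, task)]
    command = ["bash", str(script)]  # assumption: the scripts are bash-compatible
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        subprocess.run(command, check=True)
    return command
```

For example, run_rwd("fomo", 1, dry_run=True) only returns the command that would be executed, which is a cheap way to check paths before starting a long evaluation.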

Note: Please check the Deformable DETR repository for more evaluation details.

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.

@InProceedings{zohar2023open,
    author    = {Zohar, Orr and Lozano, Alejandro and Goel, Shelly and Yeung, Serena and Wang, Kuan-Chieh},
    title     = {Open World Object Detection in the Era of Foundation Models},
    booktitle = {arXiv preprint arXiv:2312.05745},
    year      = {2023},
}

📧 Contact

Should you have any questions, please contact 📧 orrzohar@stanford.edu

👍 Acknowledgements

FOMO builds on other code bases such as:

PROB - PROB: Probabilistic Objectness for Open World Object Detection codebase.
OWL-ViT - The Hugging Face Transformers library implementation of OWL-ViT.

If you found FOMO useful, please consider citing these works:

@InProceedings{Zohar_2023_CVPR,
    author    = {Zohar, Orr and Wang, Kuan-Chieh and Yeung, Serena},
    title     = {PROB: Probabilistic Objectness for Open World Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {11444-11453}
}

@article{minderer2022simple,
    title   = {Simple Open-Vocabulary Object Detection with Vision Transformers},
    author  = {Minderer, Matthias and Gritsenko, Alexey and Stone, Austin and Neumann, Maxim and Weissenborn, Dirk and Dosovitskiy, Alexey and Mahendran, Aravindh and Arnab, Anurag and Dehghani, Mostafa and Shen, Zhuoran and Wang, Xiao and Zhai, Xiaohua and Kipf, Thomas and Houlsby, Neil},
    journal = {ECCV},
    year    = {2022},
}
