This is the official implementation (PyTorch and PaddlePaddle) of the paper "DAC-DETR: Divide the Attention Layers and Conquer".
Authors: Zhengdong Hu, Yifan Sun, Jingdong Wang, Yi Yang
📧 📧 📧 Contact: huzhengdongcs@gmail.com
[Sep. 22, 2023] DAC-DETR: Divide the Attention Layers and Conquer has been accepted at NeurIPS 2023 as a poster.
This paper reveals a characteristic of DEtection Transformer (DETR) that negatively impacts its training efficacy: the cross-attention and self-attention layers in the DETR decoder have contrary impacts on the object queries (though both impacts are important). Specifically, we observe that the cross-attention tends to gather multiple queries around the same object, while the self-attention disperses these queries far away. To improve the training efficacy, we propose a Divide-And-Conquer DETR (DAC-DETR) that divides the cross-attention out from this conflict for better conquering. During training, DAC-DETR employs an auxiliary decoder that focuses on learning the cross-attention layers. The auxiliary decoder, while sharing all the other parameters, has NO self-attention layers and employs one-to-many label assignment to improve the gathering effect. Experiments show that DAC-DETR brings remarkable improvement over popular DETRs. For example, under the 12-epoch training scheme on MS-COCO, DAC-DETR improves Deformable DETR (ResNet-50) by +3.4 AP and, when built on some popular methods (i.e., DINO and an IoU-related loss), achieves 50.9 AP (ResNet-50) / 58.1 AP (Swin-Large).
We count the average number of queries that have a large affinity with each object. Compared with the baseline, DAC-DETR 1) has more queries for each object, and 2) improves the quality of the closest queries. The y-axis denotes "avg number of queries / object".
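To make the "one-to-many label assignment" and the "avg number of queries / object" statistic more concrete, here is a minimal pure-Python sketch. It is not the repository's implementation: box IoU is used as a stand-in for the affinity measure, and the function names and the 0.5 threshold are assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_many_assign(
    queries: List[Box], objects: List[Box], threshold: float = 0.5
) -> Dict[int, List[int]]:
    """Map each object index to all queries that match it (hypothetical rule).

    A query is assigned to its highest-IoU object if that IoU reaches the
    threshold, so one object may receive several queries.
    """
    assigned: Dict[int, List[int]] = {j: [] for j in range(len(objects))}
    if not objects:
        return assigned
    for i, query in enumerate(queries):
        ious = [box_iou(query, obj) for obj in objects]
        best = max(range(len(objects)), key=lambda j: ious[j])
        if ious[best] >= threshold:
            assigned[best].append(i)
    return assigned


def avg_queries_per_object(assigned: Dict[int, List[int]]) -> float:
    """Average number of assigned queries per object."""
    if not assigned:
        return 0.0
    return sum(len(q) for q in assigned.values()) / len(assigned)


# Toy data: three queries near the first object, one on the second, one stray.
queries = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 9, 10),
           (50, 50, 60, 60), (100, 100, 110, 110)]
objects = [(0, 0, 10, 10), (50, 50, 60, 60)]

assigned = one_to_many_assign(queries, objects)
print(assigned)                          # {0: [0, 1, 2], 1: [3]}
print(avg_queries_per_object(assigned))  # 2.0
```

In contrast, a one-to-one assignment would keep at most one query per object, so the same toy data would give an average of 1.0.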
We use python=3.7.10, pytorch=1.8.0, cuda=11.1.
Clone the repo
git clone https://github.com/huzhengdongcs/DAC-DETR.git
cd DAC-DETR
Prepare environments
sh env_run.sh
mkdir ./data/
Organize the COCO 2017 dataset under ./data/ as follows:
data/
└── coco/
    ├── train2017/
    ├── val2017/
    └── annotations/
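Before launching training, it can help to confirm that the directory layout above is in place. The following is a small standard-library sketch, not part of the repository; the helper name `missing_coco_dirs` is hypothetical, and it only checks that the three sub-directories exist, not their contents.

```python
from pathlib import Path
from typing import List

# Sub-directories expected under <data_root>/coco, taken from the layout above.
EXPECTED_SUBDIRS = ("train2017", "val2017", "annotations")


def missing_coco_dirs(data_root: str = "./data") -> List[str]:
    """Return the expected COCO sub-directories that are not present."""
    coco = Path(data_root) / "coco"
    return [name for name in EXPECTED_SUBDIRS if not (coco / name).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs()
    if missing:
        print("Missing under ./data/coco:", ", ".join(missing))
    else:
        print("COCO layout looks complete.")
```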
mkdir ./initmodel
You can download the ResNet-50 and Swin Transformer models and put them into ./initmodel.
Please note that our implementations are based on 8 A100 or 8 V100 GPUs.
For example, you can train dac_cdn_ice (12 epochs, Res50) by running
sh train.sh
The trained models are saved in output.
For example, you can test dac_cdn_ice (12 epochs, Res50) by running
sh test.sh
Name | Backbone | Epochs | AP | Model | Log |
---|---|---|---|---|---|
dac_cdn | Res50 | 12 | 50.0 | Google, Baidu | Google, Baidu |
dac_cdn | Res50 | 24 | 51.2 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Res50 | 12 | 50.9 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Res50 | 24 | 52.1 | Google, Baidu | Google, Baidu |
dac_cdn | Swin_Large | 12 | 57.3 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Swin_Large | 12 | 58.1 | Google, Baidu | Google, Baidu |
dac_cdn_ice | Swin_Large | 24 | 59.3 | Google, Baidu | Google, Baidu |
You can access the pytorch code of 'dac-detr + contrastive denoising (cdn)' and model from
If you find DAC-DETR useful to your research, please consider citing:
@inproceedings{
hu2023dacdetr,
title={{DAC}-{DETR}: Divide the Attention Layers and Conquer},
author={Zhengdong Hu and Yifan Sun and Jingdong Wang and Yi Yang},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=8JMexYVcXB}
}