Code and models for ChangeViT: Unleashing Plain Vision Transformers for Change Detection.
Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng
- [2024/6/24] All the code has been released, including training and inference. 😊
- [2024/6/19] The core components of this paper have been released, including the detail-capture module and the feature injector.
- [2024/6/18] The training code will be publicly available around 2024/7/5.
In this paper, our study uncovers ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to improve the detection of large-scale changes. The framework is supplemented by a detail-capture module that generates fine-grained spatial features and a feature injector that efficiently integrates this fine-grained spatial information into high-level semantic learning. This feature integration ensures that ChangeViT excels both at detecting large-scale changes and at capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules, solidifying the effectiveness of our approach.
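The feature injector described above can be illustrated with a minimal cross-attention sketch. This is a hypothetical simplification, not the paper's exact implementation: the module name, tensor shapes, and the choice of ViT tokens as queries over detail features as keys/values are all assumptions.

```python
import torch
import torch.nn as nn

class FeatureInjector(nn.Module):
    """Sketch of injecting fine-grained spatial features into ViT tokens.

    Hypothetical design: semantic ViT tokens attend (as queries) to the
    detail-capture features (as keys/values), then fuse via a residual.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vit_tokens: torch.Tensor, detail_feats: torch.Tensor) -> torch.Tensor:
        # vit_tokens: (B, N, C) high-level tokens; detail_feats: (B, M, C) spatial features
        injected, _ = self.attn(query=vit_tokens, key=detail_feats, value=detail_feats)
        return self.norm(vit_tokens + injected)  # residual fusion keeps semantics intact

tokens = torch.randn(2, 196, 256)   # e.g. 14x14 ViT tokens
details = torch.randn(2, 1024, 256) # e.g. 32x32 detail features
out = FeatureInjector(256)(tokens, details)
```

The output keeps the token shape `(2, 196, 256)`, so the injector can be dropped between transformer stages without changing downstream shapes.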
Figure 1. Overview of the proposed ChangeViT framework.
| Method | #Params (M) | FLOPs (G) | LEVIR-CD<br>F1 / IoU / OA | WHU-CD<br>F1 / IoU / OA | CLCD<br>F1 / IoU / OA |
| --- | --- | --- | --- | --- | --- |
| DTCDSCN | | | | | |
| SNUNet | | | | | |
| ChangeFormer | | | | | |
| BIT | | | | | |
| ICIFNet | | | | | |
| DMINet | | | | | |
| GASNet | | | | | |
| AMTNet | | | | | |
| EATDer | | | | | |
| ChangeViT-T (Ours) | | | | | |
| ChangeViT-S (Ours) | | | | | |
| Method | F1 (OSCD) | IoU (OSCD) | OA (OSCD) |
| --- | --- | --- | --- |
| DTCDSCN | | | |
| SNUNet | | | |
| ChangeFormer | | | |
| BIT | | | |
| ICIFNet | | | |
| DMINet | | | |
| GASNet | | | |
| AMTNet | | | |
| EATDer | | | |
| ChangeViT-T (Ours) | | | |
| ChangeViT-S (Ours) | | | |
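The F1, IoU, and OA columns in the tables above can be computed from a binary confusion matrix over the predicted and ground-truth change masks. A minimal sketch (the exact evaluation protocol, e.g. any ignore labels or per-image averaging, is an assumption):

```python
import numpy as np

def change_metrics(pred: np.ndarray, gt: np.ndarray):
    """Compute F1, IoU, and overall accuracy for binary change maps (1 = change)."""
    pred, gt = pred.astype(bool).ravel(), gt.astype(bool).ravel()
    tp = np.sum(pred & gt)    # correctly detected change pixels
    fp = np.sum(pred & ~gt)   # false alarms
    fn = np.sum(~pred & gt)   # missed changes
    tn = np.sum(~pred & ~gt)  # correct no-change pixels
    eps = 1e-10               # guard against division by zero
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    oa = (tp + tn) / (tp + tn + fp + fn)
    return f1, iou, oa

pred = np.array([[1, 0], [1, 1]])
gt = np.array([[1, 0], [0, 1]])
f1, iou, oa = change_metrics(pred, gt)
```

For this toy example the metrics come out to F1 = 0.8, IoU = 2/3, and OA = 0.75.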
- Download the LEVIR-CD, WHU-CD, CLCD, and OSCD datasets. (You can also download the processed WHU-CD dataset from here.)
- Crop each image in the dataset into 256x256 patches.
- Prepare the dataset into the following structure and set its path in the config file.

  ```
  ├─Train
  │  ├─A      jpg/png
  │  ├─B      jpg/png
  │  └─label  jpg/png
  ├─Val
  │  ├─A
  │  ├─B
  │  └─label
  └─Test
     ├─A
     ├─B
     └─label
  ```

- Download the pre-trained weights ViT-T and ViT-S, then put them into the `checkpoints` folder.
- Pre-trained models will come soon.
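The cropping step above (tiling each image into 256x256 patches) can be automated with a short script. This is a minimal sketch using Pillow; the file extensions, naming scheme, and directory layout are assumptions to adapt to your dataset:

```python
from pathlib import Path
from PIL import Image

def crop_to_patches(src_dir: str, dst_dir: str, patch: int = 256) -> None:
    """Crop every image in src_dir into non-overlapping patch x patch tiles.

    Tiles are named <stem>_<top>_<left>.png; edge remainders smaller than
    `patch` are dropped (a simplifying assumption).
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(img_path)
        w, h = img.size
        for top in range(0, h - patch + 1, patch):
            for left in range(0, w - patch + 1, patch):
                tile = img.crop((left, top, left + patch, top + patch))
                tile.save(dst / f"{img_path.stem}_{top}_{left}.png")
```

Run it once per folder (`A`, `B`, `label`) for each split so that corresponding tiles keep matching names across the three folders.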
Install dependencies:

```shell
pip install -r requirements.txt
```

Train:

```shell
python main.py --file_root LEVIR --max_steps 80000 --model_type small --batch_size 16 --lr 2e-4 --gpu_id 0
```

Evaluate:

```shell
python eval.py --file_root LEVIR --max_steps 80000 --model_type small --batch_size 16 --lr 2e-4 --gpu_id 0
```
ChangeViT is released under the CC BY-NC-SA 4.0 license.
This repository is built upon DINOv2 and A2Net. Thanks to the authors for these well-organized codebases.
@article{zhu2024changevit,
title={ChangeViT: Unleashing Plain Vision Transformers for Change Detection},
author={Zhu, Duowang and Huang, Xiaohu and Huang, Haiyan and Shao, Zhenfeng and Cheng, Qimin},
journal={arXiv preprint arXiv:2406.12847},
year={2024}
}