Code and models for ChangeViT: Unleashing Plain Vision Transformers for Change Detection.
Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng
- [2024/6/24] All the code has been released, including training and inference. 😊
- [2024/6/19] The core components of this paper have been released, including the detail-capture module and the feature injector.
- [2024/6/18] The training code will be publicly available around 2024/7/5.
In this paper, our study uncovers ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to improve the detection of large-scale changes. The framework is supplemented by a detail-capture module that generates fine-grained spatial features and a feature injector that efficiently integrates this fine-grained spatial information into high-level semantic learning. This feature integration ensures that ChangeViT excels both at detecting large-scale changes and at capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules, solidifying the effectiveness of our approach.
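The feature injector described above can be illustrated with a minimal cross-attention sketch. This is a hypothetical simplification, not the paper's exact implementation: the module name, tensor shapes, and the choice of ViT tokens as queries over detail features as keys/values are all assumptions.

```python
import torch
import torch.nn as nn

class FeatureInjector(nn.Module):
    """Sketch of injecting fine-grained spatial features into ViT tokens.

    Hypothetical design: semantic ViT tokens attend (as queries) to the
    detail-capture features (as keys/values), then fuse via a residual.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vit_tokens: torch.Tensor, detail_feats: torch.Tensor) -> torch.Tensor:
        # vit_tokens: (B, N, C) high-level tokens; detail_feats: (B, M, C) spatial features
        injected, _ = self.attn(query=vit_tokens, key=detail_feats, value=detail_feats)
        return self.norm(vit_tokens + injected)  # residual fusion keeps semantics intact

tokens = torch.randn(2, 196, 256)   # e.g. 14x14 ViT tokens
details = torch.randn(2, 1024, 256) # e.g. 32x32 detail features
out = FeatureInjector(256)(tokens, details)
```

The output keeps the token shape `(2, 196, 256)`, so the injector can be dropped between transformer stages without changing downstream shapes.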
Figure 1. Overview of the proposed ChangeViT framework.
| Method | #Params (M) | FLOPs (G) | LEVIR-CD<br>F1 / IoU / OA | WHU-CD<br>F1 / IoU / OA | CLCD<br>F1 / IoU / OA |
| --- | --- | --- | --- | --- | --- |
| DTCDSCN | | | | | |
| SNUNet | | | | | |
| ChangeFormer | | | | | |
| BIT | | | | | |
| ICIFNet | | | | | |
| DMINet | | | | | |
| GASNet | | | | | |
| AMTNet | | | | | |
| EATDer | | | | | |
| ChangeViT-T (Ours) | | | | | |
| ChangeViT-S (Ours) | | | | | |
| Method | F1 (OSCD) | IoU (OSCD) | OA (OSCD) |
| --- | --- | --- | --- |
| DTCDSCN | | | |
| SNUNet | | | |
| ChangeFormer | | | |
| BIT | | | |
| ICIFNet | | | |
| DMINet | | | |
| GASNet | | | |
| AMTNet | | | |
| EATDer | | | |
| ChangeViT-T (Ours) | | | |
| ChangeViT-S (Ours) | | | |
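The F1, IoU, and OA columns in the tables above can be computed from a binary confusion matrix over the predicted and ground-truth change masks. A minimal sketch (the exact evaluation protocol, e.g. any ignore labels or per-image averaging, is an assumption):

```python
import numpy as np

def change_metrics(pred: np.ndarray, gt: np.ndarray):
    """Compute F1, IoU, and overall accuracy for binary change maps (1 = change)."""
    pred, gt = pred.astype(bool).ravel(), gt.astype(bool).ravel()
    tp = np.sum(pred & gt)    # correctly detected change pixels
    fp = np.sum(pred & ~gt)   # false alarms
    fn = np.sum(~pred & gt)   # missed changes
    tn = np.sum(~pred & ~gt)  # correct no-change pixels
    eps = 1e-10               # guard against division by zero
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    oa = (tp + tn) / (tp + tn + fp + fn)
    return f1, iou, oa

pred = np.array([[1, 0], [1, 1]])
gt = np.array([[1, 0], [0, 1]])
f1, iou, oa = change_metrics(pred, gt)
```

For this toy example the metrics come out to F1 = 0.8, IoU = 2/3, and OA = 0.75.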
- Download the LEVIR-CD, WHU-CD, CLCD, and OSCD datasets. (You can also download the processed WHU-CD dataset from here.)
- Crop each image in the dataset into 256x256 patches.
- Prepare the dataset into the following structure and set its path in the config file.

  ```
  ├─Train
  │  ├─A      jpg/png
  │  ├─B      jpg/png
  │  └─label  jpg/png
  ├─Val
  │  ├─A
  │  ├─B
  │  └─label
  └─Test
     ├─A
     ├─B
     └─label
  ```

- Download the pre-trained weights ViT-T and ViT-S, then put them into the `checkpoints` folder.
- Pre-trained models will come soon.
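The cropping step above (tiling each image into 256x256 patches) can be automated with a short script. This is a minimal sketch using Pillow; the file extensions, naming scheme, and directory layout are assumptions to adapt to your dataset:

```python
from pathlib import Path
from PIL import Image

def crop_to_patches(src_dir: str, dst_dir: str, patch: int = 256) -> None:
    """Crop every image in src_dir into non-overlapping patch x patch tiles.

    Tiles are named <stem>_<top>_<left>.png; edge remainders smaller than
    `patch` are dropped (a simplifying assumption).
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(img_path)
        w, h = img.size
        for top in range(0, h - patch + 1, patch):
            for left in range(0, w - patch + 1, patch):
                tile = img.crop((left, top, left + patch, top + patch))
                tile.save(dst / f"{img_path.stem}_{top}_{left}.png")
```

Run it once per folder (`A`, `B`, `label`) for each split so that corresponding tiles keep matching names across the three folders.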
Install dependencies:

```shell
pip install -r requirements.txt
```

Train:

```shell
python main.py --file_root LEVIR --max_steps 80000 --model_type small --batch_size 16 --lr 2e-4 --gpu_id 0
```

Evaluate:

```shell
python eval.py --file_root LEVIR --max_steps 80000 --model_type small --batch_size 16 --lr 2e-4 --gpu_id 0
```
ChangeViT is released under the CC BY-NC-SA 4.0 license.
This repository is built upon DINOv2 and A2Net. Thanks to the authors for these well-organized codebases.
@article{zhu2024changevit,
title={ChangeViT: Unleashing Plain Vision Transformers for Change Detection},
author={Zhu, Duowang and Huang, Xiaohu and Huang, Haiyan and Shao, Zhenfeng and Cheng, Qimin},
journal={arXiv preprint arXiv:2406.12847},
year={2024}
}