
[AAAI-2024] Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception, Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai Shi, Jin Tang


ykshi/VehicleMAE

 
 


Official PyTorch implementation of Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception, Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai Shi, Jin Tang, AAAI-2024 [arXiv]

Abstract


Our Proposed Framework: VehicleMAE

[Figure: overview of the VehicleMAE framework]

Environment Setup

Dataset Download

[Figure: examples from the pre-training dataset]

Pre-trained Model Download

Backbone: ViT-Base
Checkpoint: pre-trained checkpoint download (extraction code: 6zkx)
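The layout of the checkpoint is not documented here. Assuming it follows the original MAE codebase (weights stored under a "model" key, decoder-only parameters prefixed with "decoder", plus a "mask_token"), a downstream ViT-Base encoder could be initialized roughly as sketched below. The helper name extract_encoder_weights and the key prefixes are assumptions for illustration, not taken from this repository.

```python
from typing import Any, Dict

# ASSUMPTION: MAE-style key naming; decoder-only entries start with these prefixes.
DECODER_PREFIXES = ("decoder", "mask_token")


def extract_encoder_weights(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the encoder weights of a pre-training checkpoint (hypothetical helper)."""
    # MAE-style checkpoints wrap the state dict under "model"; otherwise use the dict as-is.
    state = checkpoint.get("model", checkpoint)
    encoder: Dict[str, Any] = {}
    for key, value in state.items():
        # Weights saved from a DistributedDataParallel wrapper carry a "module." prefix.
        if key.startswith("module."):
            key = key[len("module."):]
        # Drop the reconstruction decoder and mask token; downstream tasks only need the encoder.
        if key.startswith(DECODER_PREFIXES):
            continue
        encoder[key] = value
    return encoder
```

In practice the dictionary would come from `torch.load(path, map_location="cpu")` and be passed to `model.load_state_dict(..., strict=False)`; `strict=False` tolerates task-specific heads that are absent from the pre-training checkpoint.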

Training

# To pre-train VehicleMAE on a single GPU, run:
CUDA_VISIBLE_DEVICES=0 python main.py
# To pre-train VehicleMAE on multiple GPUs (four in this example), run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main.py
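With the multi-GPU command, `torch.distributed.launch` starts one process per GPU and tells each process its position through environment variables such as RANK and WORLD_SIZE, together with a local-rank value whose exact delivery (command-line argument or LOCAL_RANK variable, and the argument's spelling) differs across PyTorch versions. The sketch below shows one way a training script can resolve these values for both launch modes; `resolve_dist_config` is a hypothetical helper, not the logic of this repository's main.py.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_dist_config(argv: Optional[Sequence[str]] = None,
                        env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve rank information for single- or multi-GPU launches (hypothetical helper)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    # Accept both spellings of the local-rank argument used by different launcher versions.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    args, _ = parser.parse_known_args(argv)

    # Without a launcher these variables are absent, which means a single process.
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", "0"))
    # Prefer the CLI value, then the LOCAL_RANK environment variable, then 0.
    if args.local_rank is not None:
        local_rank = args.local_rank
    else:
        local_rank = int(env.get("LOCAL_RANK", "0"))
    return {
        "distributed": world_size > 1,
        "world_size": world_size,
        "rank": rank,
        "local_rank": local_rank,
    }
```

When `distributed` is true, a script would typically follow this with `torch.cuda.set_device(local_rank)` and `torch.distributed.init_process_group(backend="nccl")` before wrapping the model in `DistributedDataParallel`.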

Experimental Results

We evaluate the pre-trained model on four downstream tasks with full fine-tuning. The results are shown in the table below.

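| Method | Pre-training Dataset | VAR mA | VAR Acc | VAR F1 | V-Reid mAP | V-Reid R1 | VFR Acc | VPS mIoU | VPS mAcc |
|---|---|---|---|---|---|---|---|---|---|
| Scratch | - | 84.67 | 80.86 | 84.90 | 35.3 | 57.3 | 24.8 | 49.36 | 59.22 |
| MoCo v3 | ImageNet-1K | 90.38 | 93.88 | 95.33 | 75.5 | 94.4 | 91.3 | 73.17 | 78.60 |
| DINO | ImageNet-1K | 89.92 | 91.09 | 93.11 | 64.3 | 91.5 | - | 68.43 | 73.37 |
| iBOT | ImageNet-1K | 89.51 | 90.17 | 92.37 | 68.9 | 92.6 | 81.1 | 66.03 | 71.06 |
| MAE | ImageNet-1K | 89.69 | 93.60 | 95.08 | 76.7 | 95.8 | 91.2 | 69.54 | 75.36 |
| MAE | Autobot1M | 90.19 | 94.06 | 95.43 | 75.5 | 95.4 | 91.3 | 69.00 | 75.36 |
| VehicleMAE | Autobot1M | 92.21 | 94.91 | 96.17 | 85.6 | 97.9 | 94.5 | 73.29 | 80.22 |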

The four downstream tasks are vehicle attribute recognition (VAR), vehicle re-identification (V-Reid), vehicle fine-grained recognition (VFR), and vehicle part segmentation (VPS). R1 denotes Rank-1 accuracy.
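To read the table at a glance, the point differences can be computed directly from the reported numbers. The sketch below copies the table into a dictionary (the names RESULTS, gains and gain_over_best_baseline are illustrative only) and compares VehicleMAE both with MAE pre-trained on the same Autobot1M data, which holds the pre-training data fixed, and with the strongest other entry in each column.

```python
from typing import Dict, List, Optional, Tuple

# Column order of the results table (full fine-tuning).
METRICS: List[str] = [
    "VAR mA", "VAR Acc", "VAR F1", "V-Reid mAP", "V-Reid R1",
    "VFR Acc", "VPS mIoU", "VPS mAcc",
]

# (method, pre-training dataset) -> scores copied from the table; None = not reported.
RESULTS: Dict[Tuple[str, str], List[Optional[float]]] = {
    ("Scratch", "-"): [84.67, 80.86, 84.90, 35.3, 57.3, 24.8, 49.36, 59.22],
    ("MoCo v3", "ImageNet-1K"): [90.38, 93.88, 95.33, 75.5, 94.4, 91.3, 73.17, 78.60],
    ("DINO", "ImageNet-1K"): [89.92, 91.09, 93.11, 64.3, 91.5, None, 68.43, 73.37],
    ("iBOT", "ImageNet-1K"): [89.51, 90.17, 92.37, 68.9, 92.6, 81.1, 66.03, 71.06],
    ("MAE", "ImageNet-1K"): [89.69, 93.60, 95.08, 76.7, 95.8, 91.2, 69.54, 75.36],
    ("MAE", "Autobot1M"): [90.19, 94.06, 95.43, 75.5, 95.4, 91.3, 69.00, 75.36],
    ("VehicleMAE", "Autobot1M"): [92.21, 94.91, 96.17, 85.6, 97.9, 94.5, 73.29, 80.22],
}


def gains(method: Tuple[str, str], baseline: Tuple[str, str]) -> Dict[str, float]:
    """Absolute point difference (method - baseline) for every metric both rows report."""
    return {
        name: round(a - b, 2)
        for name, a, b in zip(METRICS, RESULTS[method], RESULTS[baseline])
        if a is not None and b is not None
    }


def gain_over_best_baseline(method: Tuple[str, str]) -> Dict[str, float]:
    """Difference between `method` and the strongest other row, per metric."""
    out: Dict[str, float] = {}
    for i, name in enumerate(METRICS):
        score = RESULTS[method][i]
        if score is None:
            continue
        others = [row[i] for key, row in RESULTS.items()
                  if key != method and row[i] is not None]
        out[name] = round(score - max(others), 2)
    return out


if __name__ == "__main__":
    vehicle_mae = ("VehicleMAE", "Autobot1M")
    print(gains(vehicle_mae, ("MAE", "Autobot1M")))
    print(gain_over_best_baseline(vehicle_mae))
```

From the table values, VehicleMAE improves V-Reid mAP by 10.1 points over MAE on Autobot1M and by 8.9 points over the best other entry (MAE on ImageNet-1K); its smallest margin is 0.12 mIoU on VPS over MoCo v3.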

Visual Results

[Figure: reconstruction visualization]

[Figure: attention maps]

Acknowledgement

Citation

If you find this work helpful for your research, please cite the following paper and give us a star.

@misc{wang2023structural,
      title={Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception}, 
      author={Xiao Wang and Wentao Wu and Chenglong Li and Zhicheng Zhao and Zhe Chen and Yukai Shi and Jin Tang},
      year={2023},
      eprint={2312.09812},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

If you have any problems with this work, please open an issue.
