GitHub - ykshi/VehicleMAE: [AAAI-2024] Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception, Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai Shi, Jin Tang

Official PyTorch implementation of "Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception" by Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai Shi, and Jin Tang (AAAI-2024) [arXiv]

Abstract

Our Proposed Framework VehicleMAE

Environment Setting

Dataset Download

Pre-trained Model Download

Pre-trained Model	ViT-Base
Pre-trained checkpoint	download
Extraction code	6zkx

Training

# To pre-train VehicleMAE on a single GPU, run:
CUDA_VISIBLE_DEVICES=0 python main.py
# To pre-train VehicleMAE on multiple GPUs (four in this example), run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main.py
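
Both commands start the same script, so the script has to work out for itself whether it is running as one process or as one of several. The sketch below is an assumption about how that is commonly done, not a description of what main.py actually does: launchers such as torch.distributed.launch export the RANK, WORLD_SIZE, and LOCAL_RANK environment variables, and a script can read them and fall back to single-process defaults when they are missing. The names DistEnv and read_dist_env are hypothetical.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistEnv(NamedTuple):
    rank: int          # global index of this process
    world_size: int    # total number of processes
    local_rank: int    # index of this process on its node (often the GPU index)
    distributed: bool  # True when more than one process is running


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Hypothetical helper: parse the variables a distributed launcher exports."""
    env = os.environ if env is None else env
    if "RANK" in env and "WORLD_SIZE" in env:
        rank = int(env["RANK"])
        world_size = int(env["WORLD_SIZE"])
        local_rank = int(env.get("LOCAL_RANK", 0))
        return DistEnv(rank, world_size, local_rank, world_size > 1)
    # No launcher variables found: assume a plain single-process run.
    return DistEnv(0, 1, 0, False)


# Single-GPU run: nothing is exported, so the defaults apply.
single = read_dist_env({})
# Process 2 of 4, as a launcher with --nproc_per_node=4 would set it up.
multi = read_dist_env({"RANK": "2", "WORLD_SIZE": "4", "LOCAL_RANK": "2"})
print(single)
print(multi)
```

Recent PyTorch releases deprecate torch.distributed.launch in favor of torchrun, which exports the same variables; whether main.py runs unchanged under torchrun depends on how it parses its arguments and has not been verified here.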

Experimental Results

We used full fine-tuning to test the pre-trained model on four downstream tasks. The results are shown in the table below.

Method	Pre-training dataset	VAR mA	VAR Acc	VAR F1	V-Reid mAP	V-Reid R1	VFR Acc	VPS mIoU	VPS mAcc
Scratch	-	84.67	80.86	84.90	35.3	57.3	24.8	49.36	59.22
MoCo v3	ImageNet-1K	90.38	93.88	95.33	75.5	94.4	91.3	73.17	78.60
DINO	ImageNet-1K	89.92	91.09	93.11	64.3	91.5	-	68.43	73.37
iBOT	ImageNet-1K	89.51	90.17	92.37	68.9	92.6	81.1	66.03	71.06
MAE	ImageNet-1K	89.69	93.60	95.08	76.7	95.8	91.2	69.54	75.36
MAE	Autobot1M	90.19	94.06	95.43	75.5	95.4	91.3	69.00	75.36
VehicleMAE	Autobot1M	92.21	94.91	96.17	85.6	97.9	94.5	73.29	80.22

The four downstream tasks are vehicle attribute recognition (VAR), vehicle re-identification (V-Reid), vehicle fine-grained recognition (VFR), and vehicle part segmentation (VPS).
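
To read the table at a glance, it can help to compute the gain of VehicleMAE over the MAE baseline pre-trained on the same Autobot1M data, metric by metric. The short sketch below does only that arithmetic; the numbers are copied from the two Autobot1M rows of the table, and the variable and function names are illustrative.

```python
# Metric order follows the table columns, left to right.
METRICS = [
    "VAR mA", "VAR Acc", "VAR F1",
    "V-Reid mAP", "V-Reid R1",
    "VFR Acc",
    "VPS mIoU", "VPS mAcc",
]

# Rows copied from the results table (both pre-trained on Autobot1M).
MAE_AUTOBOT1M = [90.19, 94.06, 95.43, 75.5, 95.4, 91.3, 69.00, 75.36]
VEHICLEMAE = [92.21, 94.91, 96.17, 85.6, 97.9, 94.5, 73.29, 80.22]


def gains(baseline, ours):
    """Return the per-metric difference (ours - baseline), rounded to 2 decimals."""
    return {name: round(o - b, 2) for name, b, o in zip(METRICS, baseline, ours)}


improvement = gains(MAE_AUTOBOT1M, VEHICLEMAE)
for name, delta in improvement.items():
    print(f"{name:<11} +{delta:.2f}")
```

The largest gap is on V-Reid mAP (85.6 vs. 75.5), while the VAR metrics move by roughly one to two points.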

Visual Results

Acknowledgement

Citation

If you find this work helpful for your research, please cite the following paper and give us a star.

@misc{wang2023structural,
      title={Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception}, 
      author={Xiao Wang and Wentao Wu and Chenglong Li and Zhicheng Zhao and Zhe Chen and Yukai Shi and Jin Tang},
      year={2023},
      eprint={2312.09812},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

If you have any problems with this work, please open an issue.

