This repository is an official implementation of the paper Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.
By Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai.
Code will be released in this repository.
Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training), initially described in arXiv:2211.09807, is a simple yet effective one-stage pre-training paradigm. It integrates existing pre-training methods (supervised, weakly-supervised, and self-supervised pre-training) under a unified mutual information perspective, and maintains all of their desired properties within a single pre-training stage. Notably, we successfully pre-train a 1B-parameter model (InternImage-H) with M3I Pre-training and achieve new records of 65.4 mAP on COCO detection test-dev, 62.5 mAP on LVIS detection minival, and 62.9 mIoU on ADE20K.
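To give a concrete feel for the mutual information perspective before the code is released, the sketch below shows a generic InfoNCE-style lower bound on the mutual information between two paired representations (e.g. an image view and a text, teacher, or label-embedding view). This is only an illustrative assumption of how such a bound is typically implemented, not the objective or training code from the paper; all function and variable names here are hypothetical.

import torch
import torch.nn.functional as F

def info_nce_lower_bound(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Loss whose minimization maximizes an InfoNCE lower bound on I(z_a; z_b).

    z_a, z_b: (batch, dim) paired representations of the same sample from two
    modalities/views; the other samples in the batch serve as negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                      # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)    # each sample matches its own pair
    # Symmetric cross-entropy over the pairing, as in common contrastive setups.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    z_img = torch.randn(8, 256)   # e.g. features from an image backbone
    z_tgt = torch.randn(8, 256)   # e.g. features from a text / teacher / label encoder
    print(info_nce_lower_bound(z_img, z_tgt).item())

Under this view, different choices of the second representation (an augmented image view, paired text, or a label embedding) roughly correspond to self-supervised, weakly-supervised, and supervised pre-training signals, which is the kind of unification the paper formalizes.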
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{su2022towards,
  title={Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information},
  author={Su, Weijie and Zhu, Xizhou and Tao, Chenxin and Lu, Lewei and Li, Bin and Huang, Gao and Qiao, Yu and Wang, Xiaogang and Zhou, Jie and Dai, Jifeng},
  journal={arXiv preprint arXiv:2211.09807},
  year={2022}
}