https://arxiv.org/abs/2209.06794
PaLI: A Jointly-Scaled Multilingual Language-Image Model (Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut)
The Pathways LM side has built another large model: a 3.9B ViT plus a 13B mT5. They pretrained it on a large volume of multilingual data with a variety of tasks (captioning, OCR, and so on).
#vision-language #pretraining