We introduce Manually vAlidated Vq2a Examples fRom Image/Caption datasetS (MAVERICS), a suite of test-only visual question answering datasets. The datasets are created from image captions via Visual Question Generation with Question Answering validation (VQ^2A; see the figure below), followed by manual verification. See our paper for further details.
COCO minival2014 (193KB), generated from COCO Captions.
CC3M dev (177KB), generated from Conceptual Captions.
Format (.json)
- dataset str: dataset name
- split str: dataset split
- annotations List of image-question-answers triplets, each of which is
  - image_id str: image ID
  - caption str: image caption
  - qa_pairs List of question-answer pairs, each of which is
    - question_id str: question ID
    - raw_question str: raw question
    - question str: processed question
    - answers List of str: 10 ground-truth answers
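As a minimal sketch of how a file in this format can be read (the file name below is illustrative; the field names follow the format described above):

```python
import json

# Load one of the MAVERICS JSON files (file name here is just an example).
with open("maverics_coco_minival2014.json", "r") as f:
    data = json.load(f)

print(data["dataset"], data["split"])

# Each annotation pairs an image (and its caption) with one or more QA pairs.
for ann in data["annotations"]:
    image_id = ann["image_id"]
    caption = ann["caption"]
    for qa in ann["qa_pairs"]:
        question = qa["question"]  # processed question
        answers = qa["answers"]    # list of 10 ground-truth answers
        print(image_id, question, answers)
```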
If you use this dataset in your research, please cite the original image caption datasets and our paper:
Soravit Changpinyo*, Doron Kukliansky*, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut. All You May Need for VQA are Image Captions. NAACL 2022.
@inproceedings{changpinyo2022vq2a,
  title     = {All You May Need for VQA are Image Captions},
  author    = {Changpinyo, Soravit and Kukliansky, Doron and Szpektor, Idan and Chen, Xi and Ding, Nan and Soricut, Radu},
  booktitle = {NAACL},
  year      = {2022},
}
A multilingual extension of this approach and its accompanying dataset, MaXM, can be found on this page.
For questions, please create an issue in this repository. If you would like to share feedback or report concerns, please email schangpi@google.com.