This is the official repository for the paper *Multimodal Question Answering for Unified Information Extraction*.
Multimodal information extraction (MIE) aims to extract structured information from unstructured multimedia content. Due to the diversity of tasks and settings, most current MIE models are task-specific and data-intensive, which limits their generalization to real-world scenarios with diverse task requirements and limited labeled data. To address these issues, we propose a novel multimodal question answering (MQA) framework that unifies three MIE tasks by reformulating them into a single span extraction and multi-choice QA pipeline. Extensive experiments on six datasets show that: 1) our MQA framework consistently and significantly improves the performance of various off-the-shelf large multimodal models (LMMs) on MIE tasks, compared to vanilla prompting; 2) in the zero-shot setting, MQA outperforms previous state-of-the-art baselines by a large margin. In addition, the effectiveness of our framework transfers to the few-shot setting, enabling LMMs at the 10B-parameter scale to be competitive with, or outperform, much larger language models such as ChatGPT and GPT-4. Our MQA framework can serve as a general principle for utilizing LMMs to better solve MIE and potentially other downstream multimodal tasks.
With the vanilla prompting strategy, LMMs identify entities directly. This complicated and error-prone process may lead to inferior results (e.g., the same span can be classified as two different entity types). In contrast, our MQA framework decomposes an MIE task into two cascaded phases: span extraction and multi-choice QA. Spans are extracted as candidates for the subsequent multi-choice QA. Each candidate span is then classified into one of the pre-defined categories or an additional none-of-the-above option (E), which discards false positives from the span extraction stage.
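For illustration, here is a minimal sketch of the two-phase pipeline. The function `query_lmm` and the prompt templates below are hypothetical placeholders, not the repo's actual API:

```python
# A minimal sketch of the two-phase MQA pipeline (not the repo's actual API).
# `query_lmm(image, prompt)` stands in for any off-the-shelf LMM call, and
# the prompt templates below are illustrative assumptions.

def mqa_extract(image, text, entity_types, query_lmm):
    # Phase 1: span extraction -- ask the LMM for candidate entity spans.
    span_prompt = f"Text: {text}\nList the candidate entity spans in the text."
    candidates = query_lmm(image, span_prompt).split(", ")

    results = []
    for span in candidates:
        # Phase 2: multi-choice QA -- classify each candidate span.
        options = [f"({chr(65 + i)}) {t}" for i, t in enumerate(entity_types)]
        # The extra option (e.g., (E) when there are four entity types) lets
        # the model discard false positives from the span-extraction phase.
        options.append(f"({chr(65 + len(entity_types))}) none of the above")
        qa_prompt = (
            f'Text: {text}\nWhich type does "{span}" belong to?\n'
            + "\n".join(options)
        )
        answer = query_lmm(image, qa_prompt)
        if "none of the above" not in answer:
            results.append((span, answer))
    return results
```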
```bash
conda create -n mqa python=3.8
conda activate mqa
pip install salesforce-lavis
```
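To verify the installation, the following snippet loads BLIP2-Flan-T5 XL through LAVIS. The model name and type follow the LAVIS model zoo; note that the first run downloads a multi-gigabyte checkpoint:

```python
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load BLIP2 with a Flan-T5 XL language model from the LAVIS model zoo.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=True, device=device
)
print("BLIP2-Flan-T5 XL loaded successfully.")
```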
Twitter2015 & Twitter2017
The text data follows the CoNLL format. You can download the Twitter2015 data via this link and the Twitter2017 data via this link. Please place them in data/.
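For reference, CoNLL-format files for these Twitter MNER datasets typically contain one BIO-tagged token per line (token and tag separated by whitespace), with sentences separated by blank lines and, in common releases, an IMGID line linking each sentence to its image. The sample below is illustrative only; the exact tag set and conventions may differ per release:

```
IMGID:1234
Chuck   B-PER
Bass    I-PER
visits  O
New     B-LOC
York    I-LOC
```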
MNRE-V1 & MNRE-V2
For MNRE-V1, please download the data from https://github.com/thecharm/MNRE/tree/main/Version-1.
For MNRE-V2, please refer to https://github.com/thecharm/MNRE.
MEE
The images and text articles are in m2e2_rawdata, and annotations are in m2e2_annotation.
The final data directory should be structured as below:
```
MQA
|-- data
|   |-- mner
|   |   |-- twitter2015            # text data
|   |   |   |-- train.txt
|   |   |   |-- valid.txt
|   |   |   |-- test.txt
|   |   |-- twitter2015_images     # raw image data
|   |   |-- twitter2017
|   |   |-- twitter2017_images
|   |-- mre
|   |   |-- mre_v1
|   |   |-- mre_v2
|   |-- mee
|   |   |-- annotations
|   |   |-- raw_data
```
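Optionally, a quick sanity check (a hypothetical helper, not part of the repo) to confirm the layout before running any scripts:

```python
import os

# Expected data layout from the tree above; extend as needed for your setup.
expected = [
    "data/mner/twitter2015/train.txt",
    "data/mner/twitter2015/valid.txt",
    "data/mner/twitter2015/test.txt",
    "data/mner/twitter2015_images",
    "data/mner/twitter2017",
    "data/mner/twitter2017_images",
    "data/mre/mre_v1",
    "data/mre/mre_v2",
    "data/mee/annotations",
    "data/mee/raw_data",
]
missing = [p for p in expected if not os.path.exists(p)]
print("All paths found." if not missing else f"Missing: {missing}")
```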
Twitter2017, taking BLIP2-Flan-T5 XL as an example (run span extraction first, then multi-choice entity typing on the extracted candidates):

```bash
bash scripts/blip2_flant5xl/ner_span_17.sh
bash scripts/blip2_flant5xl/ner_et_17.sh
```
Twitter2015, taking BLIP2-Flan-T5 XL as an example:

```bash
bash scripts/blip2_flant5xl/ner_span_15.sh
bash scripts/blip2_flant5xl/ner_et_15.sh
```
MNRE-V1, taking BLIP2-Flan-T5 XL as an example:

```bash
bash scripts/blip2_flant5xl/re.sh
```

MEE, taking BLIP2-Flan-T5 XL as an example:

```bash
bash scripts/blip2_flant5xl/iee.sh
bash scripts/blip2_flant5xl/ee_span.sh
bash scripts/blip2_flant5xl/ee_et.sh
```
If you find MQA useful for your work, please cite using the following BibTeX:
```bibtex
@article{sun2023multimodal,
  title={Multimodal Question Answering for Unified Information Extraction},
  author={Sun, Yuxuan and Zhang, Kai and Su, Yu},
  journal={arXiv preprint arXiv:2310.03017},
  year={2023}
}
```