This repository is an implementation of Bilinear Attention Networks for the visual question answering task using the KVQA dataset.
The validation scores (accuracy, %) over 5 runs are shown as follows:
Embedding | Dimension | All | Yes/No | Number | Other | Unanswerable |
---|---|---|---|---|---|---|
Word2vec | 200 | 29.75 ± 0.28 | 72.59 | 16.94 | 17.16 | 78.74 |
GloVe | 100 | 30.93 ± 0.19 | 71.91 | 17.65 | 18.93 | 78.26 |
fastText | 200 | 30.94 ± 0.09 | 72.48 | 17.74 | 18.96 | 77.92 |
BERT | 768 | 30.56 ± 0.12 | 69.28 | 17.48 | 18.65 | 78.28 |
This repository is based on and inspired by @hengyuan-hu's work. We sincerely thank them for sharing their code.
You may need a machine with a Titan-grade GPU, 64 GB of memory, and PyTorch v1.1.0 for Python 3. We highly recommend using this Docker image.
```bash
pip install -r requirements.txt
```
Install MeCab:

```bash
sudo apt-get install default-jre curl
bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)
```
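To confirm that MeCab is wired up correctly, a minimal check through konlpy (the sample sentence is arbitrary) looks like this:

```python
# Quick sanity check that the MeCab tokenizer is usable through konlpy.
from konlpy.tag import Mecab

tokenizer = Mecab()
print(tokenizer.morphs("한국어 시각 질의응답"))  # expect a list of morphemes
```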
You can download the KVQA dataset via this link. Please be aware that it is licensed under the Korean VQA License.
Our implementation uses the pretrained image features from bottom-up-attention, i.e., the adaptive 10-100 features per image for the detected objects, as well as the pretrained Korean word vectors: Word2vec, GloVe, fastText, and BERT.
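As a rough sketch of how such word vectors can be inspected, the snippet below loads them with gensim; the file name is an assumption for illustration, not a file shipped with this repository:

```python
# Sketch: loading pretrained Korean word vectors with gensim.
# The file path below is hypothetical; point it at wherever you place the vectors.
from gensim.models import KeyedVectors

# Works for vectors saved in the word2vec text format (header line, then "word v1 v2 ...").
vectors = KeyedVectors.load_word2vec_format("data/ko_fasttext_200d.vec", binary=False)
print(vectors.vector_size)               # e.g. 200 for the fastText vectors above
print(vectors.most_similar("서울")[:3])  # nearest neighbours of a sample word
```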
For simplicity, you can prepare the KVQA data as follows and use the scripts below to avoid the hassle:
- Place the downloaded files from the KVQA Dataset as follows:
```
data
├── KVQA_annotations_train.json
├── KVQA_annotations_val.json
├── KVQA_annotations_test.json
└── features
    ├── KVQA_resnet101_faster_rcnn_genome.tsv
    └── VizWiz_resnet101_faster_rcnn_genome.tsv
```
Note that if you download the preprocessed features (the TSV files), you do not need to download the image sources; a sketch of how a TSV row is laid out follows the scripts below.
- Run the two scripts, `download.sh` and `process.sh`:
```bash
./tools/download.sh
./tools/process.sh
```
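For reference, each row of a bottom-up-attention TSV stores the detected boxes and their features as base64-encoded arrays. The sketch below decodes one row; the column names follow the usual bottom-up-attention convention and are an assumption here, so they may not match exactly what `process.sh` expects:

```python
# Sketch: decoding one row of a bottom-up-attention feature TSV.
# The column names follow the common bottom-up-attention convention (an assumption here).
import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

with open("data/features/KVQA_resnet101_faster_rcnn_genome.tsv") as f:
    reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
    row = next(reader)
    num_boxes = int(row["num_boxes"])
    # boxes: (num_boxes, 4) corner coordinates; features: (num_boxes, 2048) for ResNet-101.
    boxes = np.frombuffer(base64.b64decode(row["boxes"]), dtype=np.float32).reshape(num_boxes, 4)
    feats = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32).reshape(num_boxes, -1)
    print(row["image_id"], boxes.shape, feats.shape)
```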
Run `python3 main.py` to start training. The training and validation scores will be printed at every epoch, and the best model will be saved under the directory `saved_models`.
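Once training finishes, the best checkpoint can be reloaded for inspection; this is only a sketch, and the file name under `saved_models` is hypothetical:

```python
# Sketch: reloading the best checkpoint after training.
# The file name under saved_models/ is hypothetical; check the directory for the actual name.
import torch

checkpoint = torch.load("saved_models/model.pth", map_location="cpu")
print(type(checkpoint))  # typically a state_dict or a dict wrapping one
```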
You can train a model with a different question embedding by running, for example:

```bash
python3 main.py --q_emb glove-rg
```
If you use this code as part of any published research, please consider citing the following papers:
```bibtex
@inproceedings{Kim_Lim2019,
  author = {Kim, Jin-hwa and Lim, Soohyun and Park, Jaesun and Cho, Hansu},
  booktitle = {AI for Social Good workshop at NeurIPS},
  title = {{Korean Localization of Visual Question Answering for Blind People}},
  year = {2019}
}

@inproceedings{Kim2018,
  author = {Kim, Jin-Hwa and Jun, Jaehyun and Zhang, Byoung-Tak},
  booktitle = {Advances in Neural Information Processing Systems 31},
  title = {{Bilinear Attention Networks}},
  pages = {1571--1581},
  year = {2018}
}
```
- Korean VQA License for the KVQA Dataset
- Creative Commons License Deed (CC BY 4.0) for the VizWiz subset
- GNU GPL v3.0 for the Code
We sincerely thank the collaborators from TestWorks for helping with the data collection.