Bilinear attention networks for KVQA

Python 3.6 PyTorch 1.1.0 cuDNN 7.5

This repository is an implementation of Bilinear Attention Networks for the visual question answering task using the KVQA dataset.

[Figures: examples of KVQA and an overview of bilinear attention networks]
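For intuition, the following is a minimal, single-glimpse PyTorch sketch of the bilinear attention idea: every question-token/object pair interacts through a low-rank bilinear form, and the attention map is normalized over all pairs. The module and dimension names are hypothetical; the actual model in this repository uses multi-glimpse low-rank bilinear pooling (see the code for details).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttentionSketch(nn.Module):
    """Hypothetical single-glimpse sketch of bilinear attention (not the repo's exact module)."""
    def __init__(self, q_dim, v_dim, h_dim):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, h_dim)  # projects question token features
        self.v_proj = nn.Linear(v_dim, h_dim)  # projects object (visual) features
        self.h_proj = nn.Linear(h_dim, 1)      # scores each token-object pair

    def forward(self, q, v):
        # q: (B, N, q_dim) question tokens, v: (B, M, v_dim) detected objects
        q_ = torch.relu(self.q_proj(q))                      # (B, N, h)
        v_ = torch.relu(self.v_proj(v))                      # (B, M, h)
        joint = q_.unsqueeze(2) * v_.unsqueeze(1)            # (B, N, M, h) low-rank bilinear interaction
        logits = self.h_proj(joint).squeeze(-1)              # (B, N, M)
        att = F.softmax(logits.view(q.size(0), -1), dim=1).view_as(logits)
        # joint feature: sum over all token-object pairs, weighted by the attention map
        fused = torch.einsum('bnm,bnh,bmh->bh', att, q_, v_)
        return fused, att
```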

Validation scores, averaged over five runs, are as follows:

| Embedding | Dimension | All          | Yes/No | Number | Other | Unanswerable |
|-----------|-----------|--------------|--------|--------|-------|--------------|
| Word2vec  | 200       | 29.75 ± 0.28 | 72.59  | 16.94  | 17.16 | 78.74        |
| GloVe     | 100       | 30.93 ± 0.19 | 71.91  | 17.65  | 18.93 | 78.26        |
| fastText  | 200       | 30.94 ± 0.09 | 72.48  | 17.74  | 18.96 | 77.92        |
| BERT      | 768       | 30.56 ± 0.12 | 69.28  | 17.48  | 18.65 | 78.28        |

This repository is based on and inspired by @hengyuan-hu's work. We sincerely thank them for sharing their code.

Prerequisites

You will need a machine with a Titan-grade GPU, 64 GB of memory, and PyTorch v1.1.0 for Python 3. We highly recommend using this Docker image.

pip install -r requirements.txt
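Optionally, you can run a quick sanity check (not part of the repository) to confirm that the expected PyTorch and cuDNN versions are visible before training:

```python
import torch

print(torch.__version__)               # expected: 1.1.0
print(torch.cuda.is_available())       # expected: True on a GPU machine
print(torch.backends.cudnn.version())  # expected: a cuDNN 7.5 build, e.g. 7500 or higher
```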

Install MeCab:

sudo apt-get install default-jre curl
bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)
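To verify the MeCab installation (the script above sets up the KoNLPy binding), you can tokenize an arbitrary Korean sentence; this check is optional:

```python
from konlpy.tag import Mecab

mecab = Mecab()
# Split an arbitrary Korean sentence into morphemes to confirm MeCab works.
print(mecab.morphs("한국어 시각 질의응답"))
```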

KVQA Dataset

You can download the KVQA dataset via this link. Please be aware that it is licensed under the Korean VQA License.

Preprocessing

Our implementation uses the pretrained image features from bottom-up-attention, i.e., the adaptive 10-100 features per image for the detected objects. It also uses pretrained Korean word vectors: Word2vec, GloVe, fastText, and BERT.
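The provided scripts handle the feature files for you, but if you want to inspect the TSVs directly, bottom-up-attention features are typically stored as base64-encoded arrays per image. The reader below is a sketch under that assumption; the field names follow the bottom-up-attention convention and are not taken from this repository:

```python
import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)
# Conventional bottom-up-attention TSV columns (assumed, not defined by this repo).
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

with open("data/features/KVQA_resnet101_faster_rcnn_genome.tsv") as f:
    reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
    for item in reader:
        num_boxes = int(item["num_boxes"])
        boxes = np.frombuffer(base64.b64decode(item["boxes"]),
                              dtype=np.float32).reshape(num_boxes, 4)
        features = np.frombuffer(base64.b64decode(item["features"]),
                                 dtype=np.float32).reshape(num_boxes, -1)
        print(item["image_id"], boxes.shape, features.shape)
        break  # only inspect the first image
```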

For simplicity, you can prepare the KVQA data as follows and use the scripts below to avoid any hassle:

  1. Place the downloaded files from KVQA Dataset as follows:
data
├── KVQA_annotations_train.json
├── KVQA_annotations_val.json
├── KVQA_annotations_test.json
└── features
    ├── KVQA_resnet101_faster_rcnn_genome.tsv
    └── VizWiz_resnet101_faster_rcnn_genome.tsv

Note that if you download the preprocessed features (the TSV files), you do not need to download the image sources.

  2. Run the two scripts, download.sh and process.sh:
./tools/download.sh
./tools/process.sh

Training

Run

python3 main.py

to start training. The training and validation scores will be printed at every epoch, and the best model will be saved under the directory saved_models.

You can train a model with a different question embedding, for example:

python3 main.py --q_emb glove-rg

Citation

If you use this code as part of any published research, please consider citing the following papers:

@inproceedings{Kim_Lim2019,
  author    = {Kim, Jin-hwa and Lim, Soohyun and Park, Jaesun and Cho, Hansu},
  booktitle = {AI for Social Good workshop at NeurIPS},
  title     = {{Korean Localization of Visual Question Answering for Blind People}},
  year      = {2019}
}

@inproceedings{Kim2018,
  author    = {Kim, Jin-Hwa and Jun, Jaehyun and Zhang, Byoung-Tak},
  booktitle = {Advances in Neural Information Processing Systems 31},
  title     = {{Bilinear Attention Networks}},
  pages     = {1571--1581},
  year      = {2018}
}

License

  • Korean VQA License for the KVQA Dataset
  • Creative Commons License Deed (CC BY 4.0) for the VizWiz subset
  • GNU GPL v3.0 for the Code

Acknowledgments

We sincerely thank the collaborators from TestWorks for helping with the data collection.