# Interactive Multimodal eXplanations for Easy Visual Question Answering (IMX-EVQA)

Hi👋! This is a PyTorch implementation of the explainability system proposed in A Study on Multimodal and Interactive Explanations for Visual Question Answering (Alipour et al., 2020). We adapted their experiments to obtain interactive multimodal explanations based on text and images on the Easy VQA dataset.

Our adaptation is shown below. We use a Conditioned U-Net to combine the visual input with textual information from the question words. Its output is a mask that is applied to the original input image, and the masked image is fed to a convolutional classifier with text conditioning (see the sketch below).
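The following is a minimal sketch of that pipeline, assuming a U-Net and a classifier that both accept the question embedding as conditioning input. All module names, interfaces and dimensions here are illustrative assumptions, not the actual definitions in `model.py` / `modules/`:

```python
# Illustrative sketch (not the actual model.py code): a text-conditioned U-Net
# predicts a mask, the mask is applied to the input image, and a convolutional
# classifier (also text-conditioned) predicts the answer from the masked image.
import torch
import torch.nn as nn

class MaskedVQA(nn.Module):
    def __init__(self, unet: nn.Module, classifier: nn.Module,
                 vocab_size: int, txt_dim: int = 32):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, txt_dim)  # question-word embeddings (mean-pooled)
        self.unet = unet              # conditioned U-Net: (image, text) -> 1-channel mask logits
        self.classifier = classifier  # conv classifier with text conditioning: (image, text) -> answer logits

    def forward(self, image: torch.Tensor, question: torch.Tensor):
        text = self.embed(question)                    # (batch, txt_dim)
        mask = torch.sigmoid(self.unet(image, text))   # (batch, 1, H, W), values in [0, 1]
        masked = image * mask                          # apply the mask to the original input image
        logits = self.classifier(masked, text)         # answer prediction from the masked image
        return logits, mask                            # the mask doubles as the visual explanation
```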

## Installation

Check `requirements.txt` for the libraries needed to train and deploy our system:

```sh
pip3 install -r requirements.txt
```

## Loading data

Before training or predicting with our system, the Easy VQA dataset must be downloaded. The folder easy_vqa/data must be placed at the root of this repository and renamed easy-vqa. Alternatively, run the following terminal commands:

```sh
wget https://github.com/vzhou842/easy-VQA/archive/refs/heads/master.zip
unzip master.zip
mv easy-VQA-master/easy_vqa/data/ easy-vqa
rm -r easy-VQA-master/
rm master.zip
```

The folder structure should look like this:

```
easy-vqa/
    test/
    train/
    answers.txt
modules/
utils/
model.py
system.py
```

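To verify the download, a quick sanity check like the one below can be run. It assumes the standard Easy-VQA layout, where each split contains an images/ folder of PNG files and a questions.json file; adjust the paths if your copy differs:

```python
# Quick sanity check of the downloaded data (assumes the standard Easy-VQA layout).
import json
from pathlib import Path

root = Path('easy-vqa')
for split in ('train', 'test'):
    n_images = len(list((root / split / 'images').glob('*.png')))
    questions = json.loads((root / split / 'questions.json').read_text())
    print(f'{split}: {n_images} images, {len(questions)} questions')
print('answers:', (root / 'answers.txt').read_text().split())
```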
## Training

Before deploying the system, the neural network must be trained. We suggest running the system.py script directly to train the model (some parameters, such as the batch size or the number of epochs, can be configured). With the default configuration the model should reach an F-score of ~90%. At the end of training, the system parameters are stored in the results/ folder.
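For example, a run with the default configuration only needs the script itself; the batch size and number of epochs are adjusted through the options defined in system.py (check the script for their exact names):

```sh
python3 system.py
```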

## Deployment

See the prepared demo.ipynb for documented examples of how to work with interactive multimodal explanations.
