
Exploring model architectures for Visual Question Answering

This repository contains the training code for the project.

We address the problem of Visual Question Answering (VQA), which requires both image and language understanding to answer a question about a given photograph. We describe two models for this task: a simple bag-of-words (BoW) baseline and an improved Long Short-Term Memory (LSTM)-based approach.
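As a rough illustration of the baseline, a bag-of-words VQA model can be sketched as a single linear classifier over the concatenation of a BoW question vector and a pre-extracted image feature vector. This is a minimal sketch, not the repository's implementation: the dimensions, the random weights, and the stand-in image features are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the README does not specify them.
vocab_size = 1000      # question vocabulary size
img_dim = 512          # pre-extracted image feature size (e.g. CNN features)
num_answers = 100      # answer classes

def bow_features(token_ids, vocab_size):
    """Sum one-hot word vectors into a bag-of-words question vector."""
    v = np.zeros(vocab_size)
    for t in token_ids:
        v[t] += 1.0
    return v

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One linear layer over the concatenated [question BoW; image] vector.
W = rng.normal(scale=0.01, size=(num_answers, vocab_size + img_dim))
b = np.zeros(num_answers)

question = [3, 17, 17, 42]             # token ids of a toy question
image_feat = rng.normal(size=img_dim)  # stand-in for CNN image features

x = np.concatenate([bow_features(question, vocab_size), image_feat])
probs = softmax(W @ x + b)             # distribution over answer classes
answer = int(np.argmax(probs))         # predicted answer index
```

The LSTM variant replaces the BoW question vector with the final hidden state of an LSTM run over the question tokens; the classifier on top stays the same.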

Bag of Words model

LSTM model

Training

  1. Download the data and place it in the data folder.
  2. Check the available parser options.
  3. Train a network using the provided entry point: python application.py --model_type bow|lstm
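The command-line interface above can be reconstructed roughly as follows. Only the --model_type option with the choices bow and lstm is confirmed by the README; everything else (the description string, the use of argparse) is an assumption about how application.py is likely wired up.

```python
import argparse

# Hypothetical sketch of the parser -- only --model_type bow|lstm
# is documented; check the repository's actual parser options.
parser = argparse.ArgumentParser(description="Train a VQA model")
parser.add_argument(
    "--model_type",
    choices=["bow", "lstm"],
    required=True,
    help="which architecture to train: the BoW baseline or the LSTM model",
)

# Equivalent to: python application.py --model_type lstm
args = parser.parse_args(["--model_type", "lstm"])
print(args.model_type)  # -> lstm
```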

Results

Both models are correct

Both models are wrong

Only BoW is correct

Only LSTM is correct

About

Natural Language Processing 2017/2018 - MSc Artificial Intelligence @ UvA
