This repository contains the corresponding training code for the project.
We address the problem of Visual Question Answering, which requires both image and language understanding to answer a question about a given photograph. We describe 2 models for this task: a simple bag-of-words baseline and an improved Long Short Term Memory-based approach.
- Download the data and place it in the data folder.
- Check the available parser options.
- Train the networks using the provided file:
python application.py --model_type bow|lstm