Exploring model architectures for Visual Question Answering

This repository contains the training code for the project.

We address the problem of Visual Question Answering (VQA), which requires both image and language understanding to answer a question about a given photograph. We describe two models for this task: a simple bag-of-words (BoW) baseline and an improved approach based on Long Short-Term Memory (LSTM).

Bag of Words model
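
As a rough illustration of the bag-of-words idea, the sketch below turns a question into a vector of word counts. It is not taken from bow.py or dictionary.py; the function names (tokenize, build_vocab, bow_vector) and the simple tokenization rule are assumptions made for this example only.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text, drop question marks, and split on whitespace."""
    return text.lower().replace("?", " ").split()


def build_vocab(questions, min_count=1):
    """Map every word seen at least min_count times to an integer index."""
    counts = Counter(word for q in questions for word in tokenize(q))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {word: index for index, word in enumerate(words)}


def bow_vector(question, vocab):
    """Count how often each vocabulary word occurs; word order is discarded."""
    vector = [0] * len(vocab)
    for word in tokenize(question):
        index = vocab.get(word)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] += 1
    return vector


if __name__ == "__main__":
    vocab = build_vocab(["What color is the cat?", "Is the cat sleeping?"])
    print(bow_vector("Is the cat the cat?", vocab))  # [2, 0, 1, 0, 2, 0]
```

Because only counts are kept, "is the cat" and "cat the is" map to the same vector. In BoW baselines for VQA, a vector like this is commonly combined with image features before classification; whether this repository does exactly that is not stated here.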

LSTM model
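
To show what distinguishes an LSTM encoder from a bag of words, here is a minimal pure-Python sketch of the standard LSTM cell update applied over a sequence. It is not the code in lstm.py; the names lstm_step and encode, and the weight layout (one matrix per gate acting on the concatenated input and hidden state), are assumptions chosen for the illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """Apply one LSTM step.

    x: input vector (list), h: hidden state (list), c: cell state (list).
    W[gate]: one weight row per hidden unit over the concatenation of x and h.
    b[gate]: one bias per hidden unit. Gates: "i", "f", "o", and candidate "g".
    """
    z = x + h  # list concatenation of input and previous hidden state

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell values

    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(sequence, W, b, hidden_size):
    """Run the cell over a sequence of vectors and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    return h
```

Unlike the count vector of a bag-of-words model, the final hidden state depends on the order of the inputs, which is one reason a recurrent encoder can capture more of a question's structure.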

Training

Download the data and place it in the data folder.
Check the available parser options.
Train a network with the provided script, choosing one model type per run: python application.py --model_type bow or python application.py --model_type lstm
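
For readers unfamiliar with this kind of command-line switch, the sketch below shows one way a --model_type option restricted to bow and lstm could be declared with Python's argparse module. It is an assumption for illustration, not the parser in application.py, which may define the flag differently and expose further options; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser with a single --model_type option (illustrative only)."""
    parser = argparse.ArgumentParser(description="Train a VQA model.")
    parser.add_argument(
        "--model_type",
        choices=["bow", "lstm"],  # the two values named in the training step
        required=True,            # assumption: no default model is implied
        help="which model to train",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Selected model type:", args.model_type)
```

With a parser built this way, passing any value other than bow or lstm makes argparse print an error and exit, and --help lists the accepted choices.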

Results

Both models are correct

Both models are wrong

Only BoW is correct

Only LSTM is correct