Strong baseline for visual question answering

This is a PyTorch re-implementation of Vahid Kazemi and Ali Elqursh's paper "Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering".

The paper shows that a relatively simple model, built only from common deep learning building blocks, can achieve higher accuracy than the majority of previously published work on the popular VQA v1 dataset.
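For readers unfamiliar with how accuracy is usually scored on VQA, here is a minimal sketch of the commonly used simplified form of the metric: a predicted answer gets full credit if at least three of the human annotators gave the same answer, and partial credit otherwise. The function names `vqa_accuracy` and `mean_vqa_accuracy` are hypothetical and are not taken from this repository. The official evaluation script also normalizes answers more thoroughly and averages over annotator subsets, which this sketch omits.

```python
from collections import Counter


def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy for one question: min(#matching humans / 3, 1).

    Assumption: answers are compared after lowercasing and stripping
    whitespace only; the official evaluation applies further normalization.
    """
    counts = Counter(a.strip().lower() for a in human_answers)
    return min(counts[predicted.strip().lower()] / 3.0, 1.0)


def mean_vqa_accuracy(predictions, annotations):
    """Average the per-question score over a set of questions."""
    scores = [vqa_accuracy(p, a) for p, a in zip(predictions, annotations)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Ten hypothetical human answers for a single question.
    humans = ["yes"] * 2 + ["no"] * 8
    print(vqa_accuracy("no", humans))   # 8 matches, capped at full credit
    print(vqa_accuracy("yes", humans))  # 2 matches, partial credit
```

Because of the cap, a model is not penalized when annotators disagree, as long as its answer matches a reasonably common human response.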

A fully trained model (convergence shown below) is available for download.

Note that the model in my other VQA repo performs better than the model implemented here.

This project uses the code provided here

Repository contents:

logs
resnet
README.md
config.py
data.py
model.py
preprocess-images.py
preprocess-vocab.py
train.py
utils.py
view-log.py

nishitmehta1/Deep-Image-Understanding-Visual-Question-Answering
