
Strong baseline for visual question answering

This is a re-implementation of Vahid Kazemi and Ali Elqursh's paper Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering in PyTorch.

The paper shows that a relatively simple model, built only from common deep-learning building blocks, can achieve higher accuracy than the majority of previously published work on the popular VQA v1 dataset.
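To make those "common building blocks" concrete, below is a minimal PyTorch sketch of this style of model: an LSTM question encoder, a single soft-attention glimpse over pre-extracted CNN feature maps, and an MLP classifier over the concatenated features. All names, layer sizes, and the single-glimpse attention are illustrative assumptions and do not mirror this repository's actual code, which follows the paper more closely (the paper uses multiple attention glimpses).

```python
# Minimal sketch, assuming pre-extracted 14x14 image feature maps (e.g. from a
# ResNet) and tokenised questions. Illustrative only; not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleVQABaseline(nn.Module):
    def __init__(self, vocab_size, num_answers,
                 embed_dim=300, lstm_dim=1024, img_dim=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, lstm_dim, batch_first=True)
        # attention: combine image features with the question representation
        self.att_img = nn.Conv2d(img_dim, hidden, 1)
        self.att_q = nn.Linear(lstm_dim, hidden)
        self.att_out = nn.Conv2d(hidden, 1, 1)
        # classifier over the attended image vector + question vector
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + lstm_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_answers),
        )

    def forward(self, img_feats, questions):
        # img_feats: (B, img_dim, H, W); questions: (B, T) token ids
        _, (h, _) = self.lstm(self.embed(questions))
        q = h[-1]                                     # (B, lstm_dim)
        # spatial attention weights conditioned on the question
        a = self.att_img(img_feats) + self.att_q(q)[:, :, None, None]
        a = self.att_out(torch.tanh(a))               # (B, 1, H, W)
        a = F.softmax(a.flatten(2), dim=-1).view_as(a)
        # attention-weighted sum of image features -> one attended vector
        v = (img_feats * a).sum(dim=(2, 3))           # (B, img_dim)
        return self.classifier(torch.cat([v, q], dim=1))


# quick shape check with random data
model = SimpleVQABaseline(vocab_size=10000, num_answers=3000)
logits = model(torch.randn(2, 2048, 14, 14), torch.randint(1, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 3000])
```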

A fully trained model (convergence shown below) is available for download.

[Figure: convergence of this implementation compared with the results reported in the paper]

Note that the model in my other VQA repo performs better than the model implemented here.

This project uses the code provided here.