Attention-Modeling

PyTorch implementation of Attention Modeling for Image Captioning

Implementation

  • A standard CNN-RNN (encoder-decoder) architecture for image captioning
  • Implementation of the visual attention mechanism for image captioning described in Show, Attend and Tell (a minimal sketch of the soft-attention step follows this list)
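
The core of the second item is the soft (deterministic) attention of Show, Attend and Tell. Below is a minimal PyTorch sketch of that attention step, assuming encoder features flattened to (batch, num_pixels, encoder_dim) and an LSTM decoder hidden state of shape (batch, decoder_dim); the module and variable names are illustrative and are not taken from this repository's code.

```python
import torch
import torch.nn as nn


class SoftAttention(nn.Module):
    """Additive (Bahdanau-style) soft attention over CNN feature-map locations."""

    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        self.encoder_att = nn.Linear(encoder_dim, attention_dim)  # project image features
        self.decoder_att = nn.Linear(decoder_dim, attention_dim)  # project decoder state
        self.full_att = nn.Linear(attention_dim, 1)               # scalar score per location
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, encoder_out, decoder_hidden):
        # encoder_out:    (batch, num_pixels, encoder_dim)
        # decoder_hidden: (batch, decoder_dim)
        att1 = self.encoder_att(encoder_out)                        # (batch, num_pixels, attention_dim)
        att2 = self.decoder_att(decoder_hidden).unsqueeze(1)        # (batch, 1, attention_dim)
        scores = self.full_att(self.relu(att1 + att2)).squeeze(2)   # (batch, num_pixels)
        alpha = self.softmax(scores)                                # attention weights over locations
        context = (encoder_out * alpha.unsqueeze(2)).sum(dim=1)     # (batch, encoder_dim)
        return context, alpha
```

At each decoding step the returned context vector would typically be concatenated with the current word embedding before being fed to the LSTM cell, and the weights `alpha` are what an attention visualization would overlay on the image.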

Results

I have validated both methods on the standard Flickr8k and MSCOCO datasets, where they achieve state-of-the-art accuracy. Results are as follows (a sketch of how such BLEU scores can be computed is given after the lists):

Normal CNN-RNN architecture
  • For MSCOCO dataset
    • BLEU-1 : 0.705
    • BLEU-4 : 0.265
  • For Flickr8k dataset
    • BLEU-1 : 0.630
    • BLEU-4 : 0.177

Visual Attention architecture
  • For MSCOCO dataset
    • BLEU-1 : 0.731
    • BLEU-4 : 0.320
  • For Flickr8k dataset
    • BLEU-1 : 0.655
    • BLEU-4 : 0.218
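
Corpus-level BLEU-1/BLEU-4 numbers of the kind reported above could be computed with NLTK as sketched below. This is only an assumption about the evaluation setup, not necessarily the script that produced these numbers; `references` and `hypotheses` are placeholder names.

```python
from nltk.translate.bleu_score import corpus_bleu


def bleu_scores(references, hypotheses):
    # references: one list of tokenized reference captions per test image,
    #   e.g. [[["a", "dog", "runs"], ["dog", "running"]], ...]
    # hypotheses: one tokenized generated caption per test image,
    #   e.g. [["a", "dog", "is", "running"], ...]
    bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))
    bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))
    return bleu1, bleu4
```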

TO DO

  • Add attention visualization utility (a rough sketch of one possible approach is given below)
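
Since the visualization utility is not implemented yet, the following is only a hypothetical sketch of one common approach: upsample the per-word attention weights returned by the decoder to the image resolution and overlay them on the input image. The function name, the 14x14 grid size, and the skimage/matplotlib choices are all assumptions.

```python
import matplotlib.pyplot as plt
from skimage.transform import resize


def show_attention(image, words, alphas, grid_size=14):
    # image:  (H, W, 3) float array in [0, 1]
    # words:  list of generated tokens
    # alphas: (len(words), grid_size * grid_size) attention weights per word
    for t, word in enumerate(words):
        alpha = alphas[t].reshape(grid_size, grid_size)
        alpha = resize(alpha, image.shape[:2])      # upsample weights to image resolution
        plt.subplot(1, len(words), t + 1)
        plt.imshow(image)
        plt.imshow(alpha, alpha=0.6, cmap="gray")   # overlay attention map on the image
        plt.title(word)
        plt.axis("off")
    plt.show()
```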
