Markov Captioner

A Markov Chain that can describe an image based on extracted features, in this case the object categories and their locations.

This is my code for the course project of 10-701 Introduction to Machine Learning titled "Markov vs Neural Network: A Comparative Study of Classic and Modern Models for Image Captioning". You can find the project report here.

Core Mathematics

Select the next best word (or sequence of words when the beam width is greater than 1) by maximizing the product of two conditional probabilities: the probability of the word given its previous few words (the Markov assumption), and the probability of the word given each of the provided features.
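The combination above can be sketched in log space. This is a minimal illustration, not the project's actual code: `ngram_prob` and `feature_prob` are hypothetical lookup tables standing in for the trained model's counts.

```python
import math

def candidate_score(word, prev_words, features, ngram_prob, feature_prob, ngram_n=4):
    """Score a candidate next word by combining the Markov (n-gram) term
    with one term per image feature, all in log space.

    Hypothetical lookup tables:
      ngram_prob[(context, word)]  -> P(word | previous ngram_n - 1 words)
      feature_prob[(feat, word)]   -> P(word | feat)
    Unseen pairs get a tiny floor probability to avoid log(0).
    """
    context = tuple(prev_words[-(ngram_n - 1):])
    score = math.log(ngram_prob.get((context, word), 1e-12))
    for feat in features:
        score += math.log(feature_prob.get((feat, word), 1e-12))
    return score
```

With a beam width above 1, this score would be accumulated along each partial sentence kept in the beam.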

Codebase

Dependencies

File Organization

  • DataLoader.py: helper functions to load training captions and encode object category and location information to train and test the Markov-based model
  • gen_test_captions.py: automation script to run a trained Markov-based model on the Karpathy offline test or validation split, with sentence generation parameters provided through the command line. Stores the captions in a JSON file for scoring
  • Heatmap.py: borrowed script to show object-word location heatmap for a trained Markov-based model
  • MarkovCaptioner.py: the core encapsulated MarkovCaptioner module for training and testing the Markov-based model. It also defines the BeamSearchCandidate class for beam search during sentence generation
  • train_markov.py: automation script to train a Markov-based model with training parameters provided through the command line and serialize the trained model to a file on disk
  • Utility.py: utility functions and constants

Parameters

  • training
    • ngram_n: n-gram size used to train Markov Chain
    • grid_size: object location encoding is based on an n×n grid. This controls n
  • sentence generation
    • sentence_length_limit: sentence cutoff length
    • beam_width: beam width for beam search
    • decay_factor: incremental penalty for generating the same word again. This helps reduce the model's rambling.
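One plausible reading of the decay penalty is sketched below: each prior occurrence of a word in the partial sentence scales its probability down by decay_factor, which in log space is a subtractive penalty. This is an assumption about how MarkovCaptioner.py applies the factor, not a transcription of it.

```python
import math

def decayed_score(log_prob, word, generated, decay_factor=1e-2):
    """Hypothetical sketch: multiply a candidate word's probability by
    decay_factor once per prior occurrence in the partial sentence,
    discouraging the chain from repeating itself during beam search."""
    count = generated.count(word)
    return log_prob + count * math.log(decay_factor)
```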

Results

ngram_n = 4, grid_size = 2, sentence_length_limit = 16, beam_width = 20, decay_factor = 1e-2

| COCO ID | Caption |
| --- | --- |
| 184613 | a group of people are standing in the grass near trees |
| 272991 | a hot dog with ketchup and mustard on top of it |
| 403013 | modern kitchen with stainless steel appliances and granite counter tops and stainless steel refrigerator microwave toaster |
| 562150 | a cat laying on top of the steering wheel |

With better features extracted from the images, the model might perform better than the results shown here.
