A flexible text generator and query likelihood estimator based on Markov chains.

vekoada/markov-query-model

Assignment 5: Open-ended IR Technique

Westmont College Fall 2023

CS 128 Information Retrieval and Big Data

Assistant Professor Mike Ryu (mryu@westmont.edu)

Author Information

Guide

MarkovModel

MarkovModel is a Python class that implements a query likelihood language model based on Markov chains. This class allows you to create language models for generating text and estimating the probability of a query given a document. The model supports both character-based and word-based representations with customizable order.
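To make the idea concrete, here is a minimal, standalone sketch of a Markov-chain text generator. It is not the code in src/models.py; the helper names (tokenize, build_chain, generate) are hypothetical and chosen for illustration only. Each state is the last n tokens, and the next token is sampled from the tokens that followed that state in the training text.

```python
import random
from collections import defaultdict


def tokenize(text, mode="word"):
    """Split text into words or characters (hypothetical helper)."""
    return text.split() if mode == "word" else list(text)


def build_chain(docs, mode="word", n=2):
    """Map each n-token state to the list of tokens observed right after it."""
    chain = defaultdict(list)
    for doc in docs:
        tokens = tokenize(doc, mode)
        for i in range(len(tokens) - n):
            state = tuple(tokens[i:i + n])
            chain[state].append(tokens[i + n])
    return chain


def generate(chain, start, mode="word", n=2, max_len=20, seed=None):
    """Extend `start` one token at a time until max_len tokens or a dead end."""
    rng = random.Random(seed)
    out = tokenize(start, mode)
    while len(out) < max_len:
        state = tuple(out[-n:])
        followers = chain.get(state)
        if not followers:
            break  # this state was never seen in training
        out.append(rng.choice(followers))
    joiner = " " if mode == "word" else ""
    return joiner.join(out)
```

Storing repeated followers in a list keeps sampling proportional to observed frequency without computing explicit probabilities.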

Project Structure

  • src/models.py: Contains the implementation of the MarkovModel class. It also includes main(), which demonstrates example usage of the class.
  • data/: An empty directory where you can store your training data.
  • test/: An empty directory intended for future testing.

Installation

No installation is required beyond the libraries listed in requirements.txt. Once those are available, include the MarkovModel class in your project.

Usage

  1. Import the MarkovModel class:
    from src.models import MarkovModel
  2. Create an instance of the MarkovModel class by providing the mode ('char' or 'word'), the training text, and the order n:
    text_data = [...]  # List of training documents
    markov_model = MarkovModel(mode='word', text=text_data, n=3)
  3. Train the model with the training data:
    markov_model.train(text=text_data)
  4. Generate text using the trained model:
    generated_text = markov_model.generate(start='The quick brown fox', max_len=200)
    print(generated_text)
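  5. Estimate the probability of a query given a document (note that _most_probable_doc is underscore-prefixed, which by Python convention marks it as internal):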
    query = 'natural language processing'
    result = markov_model._most_probable_doc(query=query, l=0.7, corpus_percentage=1.0)
    print(result)
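For readers unfamiliar with query likelihood, the sketch below shows one common way to score a query against each document and pick the best match. It is an illustration under stated assumptions, not the implementation in src/models.py: it uses a unigram word model with linear interpolation (Jelinek-Mercer smoothing), and it assumes a weight lam that plays a role similar to the l argument above, which the source does not define. The function names are hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, corpus, lam=0.7):
    """Log P(query | doc) under a unigram model with linear interpolation.

    Assumption: lam weights the document model and (1 - lam) the corpus model.
    """
    doc_counts = Counter(doc.split())
    corpus_counts = Counter(w for d in corpus for w in d.split())
    doc_len = sum(doc_counts.values())
    corpus_len = sum(corpus_counts.values())
    score = 0.0
    for word in query.split():
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_corpus = corpus_counts[word] / corpus_len if corpus_len else 0.0
        p = lam * p_doc + (1 - lam) * p_corpus
        if p == 0.0:
            return float("-inf")  # word appears nowhere in the corpus
        score += math.log(p)
    return score


def most_probable_doc(query, corpus, lam=0.7):
    """Return (index, log-likelihood) of the best-scoring document."""
    scores = [query_log_likelihood(query, d, corpus, lam) for d in corpus]
    best = max(range(len(corpus)), key=scores.__getitem__)
    return best, scores[best]
```

Working in log space avoids underflow on long queries, and the corpus term keeps a document from scoring zero just because it lacks one query word.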

Data and Testing

The data/ directory is intended for storing your training data. Feel free to populate this directory with text documents to train your model!
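As a starting point, the following sketch reads every text file in data/ into a list of documents that could be passed as the training text. The function name load_documents, the *.txt pattern, and the UTF-8 encoding are assumptions for illustration; the repository does not prescribe a file format.

```python
from pathlib import Path


def load_documents(data_dir="data", pattern="*.txt"):
    """Read matching files in data_dir, one document per file (hypothetical helper)."""
    paths = sorted(Path(data_dir).glob(pattern))  # sorted for a stable order
    return [p.read_text(encoding="utf-8") for p in paths]
```

The returned list has the same shape as text_data in the usage steps above: one string per document.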

The test/ directory is currently empty, and more thorough testing should be implemented in the future to ensure the reliability of the MarkovModel class.

Feel free to contribute by adding your own test cases or improving the model based on your specific use case!

Acknowledgements and Sources

While working on this assignment, I used the following resources:
