N-Gram Language Model in Python

I used this Stanford University text to understand the theory.
This is my summer vacation (June 2024) ML practice project, in which I aimed to build a word auto-complete program similar to how a mobile keyboard suggests the next word based on the already typed part of a message.

How it works

This is based on conditional probability: the probability of each candidate next word, given the already typed words, is calculated, and the word with the highest probability is suggested. For example:


p(time | a person who thinks all the) = probability of 'time' given 'a person who thinks all the'

We know the data contains this absurd quote only once, so 'time' has the highest probability given the existing string.
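The sketch below illustrates the idea; it is not the repository's actual code, and the function names and toy corpus are made up for illustration. It counts how often each word follows each n-word context, then suggests the follower with the highest count:

```python
from collections import defaultdict, Counter

def build_counts(words, n):
    """Map each n-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for i in range(len(words) - n):
        context = tuple(words[i:i + n])
        counts[context][words[i + n]] += 1
    return counts

def predict(counts, context):
    """Return the most probable next word for the last n typed words."""
    followers = counts.get(tuple(context))
    if not followers:
        return None  # unseen context: the model has no suggestion
    total = sum(followers.values())
    word, count = followers.most_common(1)[0]
    # p(word | context) = count(context followed by word) / count(context)
    print(f"p({word} | {' '.join(context)}) = {count}/{total}")
    return word

# Toy corpus containing the quote exactly once.
corpus = "a person who thinks all the time has nothing to think about".split()
counts = build_counts(corpus, n=6)
predict(counts, "a person who thinks all the".split())  # -> 'time'
```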

Usage

  • n is the number of trailing words the model considers. If n is 3, it predicts the next word from only the last 3 typed words (i.e., it predicts the 4th word after those 3).
  • I rebuild the entire model on every iteration, which makes prediction slow. If you build the model only once and reuse it (see the sketch after this list), it is extremely fast, though I believe it would be less accurate.
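A hypothetical driver under that build-once approach, reusing build_counts and predict from the sketch above (the repo's actual entry point may differ, and corpus.txt is a placeholder file name):

```python
# Build the counts once, then reuse them for every prediction,
# instead of rebuilding the model per keystroke.
with open("corpus.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

n = 3
counts = build_counts(words, n)           # built once, reused below

typed = "to be or".split()
suggestion = predict(counts, typed[-n:])  # only the last n words matter
```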

Personalizing it

To personalize it for your friend group, extract all of your group-chat messages into a .txt file and feed that file to the model. It currently uses Shakespeare and Alice in Wonderland text, so it gives weird results.
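A minimal preprocessing sketch for a chat export (hypothetical: the file name is a placeholder, and the regex assumes a "[timestamp] name: message" export format, so adjust it to your messenger's format):

```python
import re

# Strip "[2024-06-01 12:00] alice: " prefixes so only the message text
# trains the model.
with open("group_chat.txt", encoding="utf-8") as f:
    lines = f.read().splitlines()

messages = [re.sub(r"^\[[^\]]*\]\s*[^:]+:\s*", "", line) for line in lines]
words = " ".join(messages).lower().split()
counts = build_counts(words, n=3)  # reuses build_counts from the sketch above
```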
