SureshAthanti/Probabilistic_Classification_Model

Probabilistic_Classification_Model

Brief description of the data:

The data consists of tweets (text data), each labeled with one of two target classes: a real disaster or not a disaster. Along with the text, each tweet has a location and a keyword.

Data attributes used for classification:

Text: the text of the tweet. Target: the class label, either disaster or not disaster.

Classification task:

Predict whether a given tweet is about a real disaster or not. We applied several classification techniques (probabilistic, linear, and non-linear) together with probability measures to carry out this task.
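As an illustration of the probabilistic side of this task, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. It is not the repository's implementation: the toy tweets, the whitespace tokenizer, and the function names `train_naive_bayes` and `predict` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy examples (text, target); 1 = disaster, 0 = not a disaster.
train = [
    ("forest fire near the town", 1),
    ("flood warning issued for the town", 1),
    ("earthquake damages homes", 1),
    ("i love this song", 0),
    ("what a great summer day", 0),
    ("this movie was fire", 0),
]

def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab

def predict(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: share of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothed conditional probability P(word | label).
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_naive_bayes(train)
```

Calling `predict("earthquake near the town", model)` scores both classes and returns the label with the larger log posterior. Working in log space avoids numerical underflow when many small probabilities are multiplied.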

List of steps:

  1. Probabilities and Zipf's Law

    a) Rank, Frequency, and Probability distribution

    b) Probability vs. Rank Plot

    c) Regression line fit

  2. Text Vectorization

  3. Terms and Conditional Probabilities distribution

  4. Classification

    a) Probabilistic Naive Bayes Model

    b) Linear Model

    c) Non-linear Classification

  5. Conclusions
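Step 1 of the list can be sketched as follows: build a rank, frequency, and probability table for the terms, then fit a straight line to log(probability) against log(rank), which is the usual way to check Zipf's law. This is a hedged illustration in plain Python, not the repository's code; the toy corpus, the whitespace tokenizer, and the names `rank_frequency_table` and `fit_line` are assumptions.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the real input would be the tweet text column.
tweets = [
    "forest fire near the town",
    "the fire spread to the forest",
    "i love the summer heat",
    "flood warning for the town",
]

def rank_frequency_table(texts):
    """Return (rank, term, frequency, probability) rows, most frequent first."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    # Sort by descending frequency; break ties alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, freq, freq / total)
            for rank, (term, freq) in enumerate(ordered, start=1)]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

table = rank_frequency_table(tweets)
log_rank = [math.log(row[0]) for row in table]
log_prob = [math.log(row[3]) for row in table]
slope, intercept = fit_line(log_rank, log_prob)
```

Under Zipf's law the probability of a term is roughly proportional to 1/rank, so on a large corpus the fitted slope in log-log space is expected to be close to -1. On a corpus this small the slope is only negative, not necessarily near -1.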