The data consists of tweets (text data), each labeled in the target attribute with one of two classes: real disaster or not a disaster. Along with the text, each record has a location and a keyword.
Data attributes used for classification:
Text: contains the text of the tweet.
Target: consists of two classes, disaster / not disaster.
The task is to predict whether a given tweet is about a real disaster or not. We applied various classification techniques (probabilistic, linear, and non-linear) and probability measures to this task.
-
Probabilities and Zipf's Law
a) Rank, Frequency, and Probability distribution
b) Probability vs. Rank Plot
c) Regression line fit
-
Text Vectorization
-
Terms and Conditional Probabilities distribution
-
Classification
a) Probabilistic Naive Bayes Model
b) Linear Model
c) Non-linear Classification
-
Conclusions
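
As a rough illustration of the "Probabilities and Zipf's Law" steps in the outline above (rank, frequency, probability, then a regression line fit), here is a minimal sketch using only the Python standard library. The function names zipf_table and fit_log_log_line, the whitespace tokenization, and the toy tweets are assumptions made for this example; they are not taken from the actual notebook or dataset.

```python
import math
from collections import Counter


def zipf_table(texts):
    """Return (rank, word, frequency, probability) rows, most frequent first."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    rows = []
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, freq / total))
    return rows


def fit_log_log_line(rows):
    """Least-squares fit of log(probability) = slope * log(rank) + intercept."""
    xs = [math.log(rank) for rank, _, _, _ in rows]
    ys = [math.log(prob) for _, _, _, prob in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy tweets, made up for illustration only (not from the dataset).
tweets = [
    "fire in the forest near the town",
    "the flood hit the town",
    "love the new album fire",
]

rows = zipf_table(tweets)
slope, intercept = fit_log_log_line(rows)

for rank, word, freq, prob in rows[:3]:
    print(rank, word, freq, round(prob, 3))
print("slope of log-log fit:", round(slope, 3))
```

Under Zipf's law the probability of a word falls off roughly as a power of its rank, so the fitted slope on the log-log scale is expected to be negative; on a real corpus it is often compared against -1. On a tiny sample like this one the value itself is not meaningful.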