Abstract
Classifying social media content has attracted sustained research interest over the last decade. This paper proposes a probabilistic representation of topic-related keywords on social media. We aim to estimate the conditional likelihood of a class given a short text such as a tweet. We use Reddit as a case study, with a focus on identifying political content. We report the model's performance and compare it to machine learning methods. Our model achieves a precision of 97% and takes only a few seconds to fit over 500,000 data points.
For the full report, see: A probabilistic model for text categorization
For the code structure and how to use the library, see: code structure
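
To make the idea in the abstract concrete, here is a minimal sketch of estimating the conditional probability of a class given a short text from per-class keyword counts. It is not this library's API or the exact model from the report: it assumes a naive-Bayes-style formulation with Laplace smoothing, and the names `fit`, `predict_proba`, and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def fit(texts, labels, alpha=1.0):
    """Count keyword frequencies per class (hypothetical helper, not the library's API)."""
    word_counts = {}                 # class label -> Counter of tokens seen in that class
    class_counts = Counter(labels)   # class label -> number of training texts
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "class_counts": class_counts,
            "vocab": vocab, "alpha": alpha}


def predict_proba(model, text):
    """Return an estimate of P(class | text) for every class seen during fitting."""
    # Assumption: tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    total_docs = sum(model["class_counts"].values())
    log_scores = {}
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * len(model["vocab"])
        score = math.log(n_docs / total_docs)  # log prior P(class)
        for t in tokens:
            # Laplace-smoothed log likelihood P(token | class)
            score += math.log((counts[t] + model["alpha"]) / denom)
        log_scores[label] = score
    # Normalize in log space so the probabilities sum to 1.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {label: math.exp(s - top) / z for label, s in log_scores.items()}


# Toy data, invented for illustration only.
texts = ["senate passes new election bill", "president vetoes tax bill",
         "my cat learned a new trick", "best pizza recipe ever"]
labels = ["political", "political", "other", "other"]

model = fit(texts, labels)
probs = predict_proba(model, "election bill vote")
print(probs)
```

Fitting here is a single counting pass over the data, which is one plausible reason a count-based probabilistic model can be fast to fit on large datasets; the report describes the actual model and its evaluation.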