A probabilistic model for text categorization

Classifying political content on Reddit

Abstract

Classifying social media content has attracted sustained research interest over the last decade. This paper proposes a probabilistic representation of topic-related keywords on social media. We aim to estimate the conditional likelihood of a class given a short text such as a tweet. We use Reddit as a case study, with a focus on identifying political content. We report the model's performance and compare it to machine learning methods. Our model achieves a precision of 97% and takes only a few seconds to fit over 500,000 data points.
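To make the idea of estimating the probability of a class given a short text concrete, here is a minimal sketch in Python. It is not the model from the report: it swaps in a plain multinomial naive Bayes estimator with Laplace smoothing as a stand-in. The function names (`fit`, `predict_proba`), the smoothing parameter `alpha`, and the toy texts and labels are all assumptions made up for illustration.

```python
import math
from collections import Counter


def fit(texts, labels, alpha=1.0):
    """Count word occurrences per class (hypothetical helper, not the repo's API)."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for c in classes for w in word_counts[c]}
    return {
        "classes": classes,
        "word_counts": word_counts,
        "doc_counts": doc_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict_proba(model, text):
    """Return P(class | text) for every class using Bayes' rule."""
    n_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    log_scores = {}
    for c in model["classes"]:
        total = sum(model["word_counts"][c].values())
        # Start from the log prior P(class).
        score = math.log(model["doc_counts"][c] / n_docs)
        for w in text.lower().split():
            if w in model["vocab"]:  # words never seen in training are ignored
                count = model["word_counts"][c][w]
                score += math.log((count + alpha) / (total + alpha * vocab_size))
        log_scores[c] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


# Toy data, invented for this example only.
texts = [
    "senate passes new election bill",
    "president signs tax bill",
    "my cat loves this toy",
    "best pizza recipe ever",
]
labels = ["political", "political", "other", "other"]

model = fit(texts, labels)
probs = predict_proba(model, "new tax bill")
print(probs)
```

On this toy data the short text "new tax bill" receives a higher probability for the `political` class than for `other`, because all three words appear only in the political training texts. Fitting needs just one counting pass over the data, which is one reason count-based models of this kind tend to be fast to train.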

For the full report, see A probabilistic model for text categorization (A_probabilistic_model_for_text_categorization.pdf).

For the code structure and instructions on using the library, see code structure (code_structure.md).

Repository contents (44 commits):

data
scripts
.gitignore
A_probabilistic_model_for_text_categorization.pdf
README.md
code_structure.md

khaledfouda/A-probabilistic-model-for-text-categorization
