Correlated Topic Modeling

Background

Like Latent Dirichlet Allocation (LDA), CTM is a probabilistic approach to inferring the latent topics of a document. Unlike LDA, which assumes that topics are (nearly) independent of one another, CTM allows latent topics to be correlated. I am using topic modeling to split a collection of retail products (unstructured data) into groups of similar items. I run CTM on the product name and on the first 50 words of the product description (I assume that anything past 50 words is mostly noise).
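The text preparation described above can be sketched in base R. This is a minimal illustration, not the code in ctm.R: the helper `first_n_words`, the `products` data frame, and its column names are all hypothetical.

```r
# Keep only the first n whitespace-separated words of each string.
first_n_words <- function(text, n = 50) {
  vapply(strsplit(trimws(text), "\\s+"), function(words) {
    paste(head(words, n), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical product table; the column names are illustrative only.
products <- data.frame(
  name = c("Steel Water Bottle", "Yoga Mat"),
  description = c(
    paste(rep("insulated", 60), collapse = " "),  # 60 words, gets truncated
    "Non-slip mat for yoga and pilates"           # 6 words, kept whole
  ),
  stringsAsFactors = FALSE
)

# One text field per product: the name followed by the truncated description.
products$text <- paste(products$name, first_n_words(products$description, 50))

# Number of words that would reach the topic model for each product.
word_counts <- lengths(strsplit(products$text, "\\s+"))
```

From here the `text` column could be converted to a document-term matrix (for example with the tm package) and passed to `topicmodels::CTM()`. The number of topics `k` is a modeling choice that is not stated here.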

I run CTM alongside K-Means and compare the results. So far K-Means has been more accurate, and I plan to investigate why.
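How accuracy is measured is not stated above. One common option, assumed here purely for illustration, is cluster purity against hand-assigned category labels. The toy matrix, the labels, and the `purity` helper below are all made up.

```r
# Purity: share of documents that carry the majority label of their cluster.
purity <- function(clusters, labels) {
  tab <- table(clusters, labels)
  sum(apply(tab, 1, max)) / length(labels)
}

set.seed(1)
# Toy document-term counts: rows are products, columns are terms (made up).
dtm <- rbind(
  c(3, 2, 0, 0), c(2, 3, 0, 0), c(4, 1, 0, 0),
  c(0, 0, 3, 2), c(0, 0, 2, 4), c(0, 0, 1, 3)
)
labels <- c("kitchen", "kitchen", "kitchen", "fitness", "fitness", "fitness")

# K-Means on the raw counts; nstart reruns it from several random starts.
km <- kmeans(dtm, centers = 2, nstart = 10)
km_purity <- purity(km$cluster, labels)

# For a fitted CTM, the hard assignment would be each document's most likely
# topic, e.g. topicmodels::topics(fit), scored with the same purity function.
```

Scoring both methods with the same function keeps the comparison like for like, although purity tends to rise as the number of clusters grows, so both models should use the same number of groups.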

Assumptions of CTM

Each document is composed of multiple topics, and each topic is made up of multiple words
Order of words does not matter
The topic distribution per document follows a logistic normal distribution, whose covariance matrix captures the correlation between topics; the words within each topic follow a multinomial distribution
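To make the logistic normal assumption concrete, the sketch below draws topic proportions by sampling from a multivariate normal and applying a softmax. The number of topics, `mu`, and `Sigma` are illustrative values, not estimates from the product data.

```r
set.seed(42)
K <- 3                                  # number of topics (illustrative)
mu <- rep(0, K)                         # mean of the latent Gaussian
Sigma <- matrix(c(1.0, 0.8, 0.0,        # topics 1 and 2 are positively
                  0.8, 1.0, 0.0,        # correlated; topic 3 is independent
                  0.0, 0.0, 1.0), nrow = K, byrow = TRUE)

# Draw n samples from N(mu, Sigma) using the Cholesky factor of Sigma.
draw_eta <- function(n, mu, Sigma) {
  z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(z %*% chol(Sigma), 2, mu, "+")
}

# Softmax maps a Gaussian draw to topic proportions that sum to 1.
softmax <- function(eta) {
  e <- exp(eta - max(eta))              # subtract the max for stability
  e / sum(e)
}

eta <- draw_eta(2000, mu, Sigma)        # one row per simulated document
theta <- t(apply(eta, 1, softmax))      # per-document topic proportions
eta_cor_12 <- cor(eta[, 1], eta[, 2])   # sample correlation, topics 1 and 2
```

The off-diagonal entries of `Sigma` are what let two topics tend to appear together in a document. A Dirichlet prior, as used in LDA, has no comparable parameter.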

Resources

R package topicmodels; paper "Correlated Topic Models" (Blei and Lafferty)

akokotis/CTM-in-R
