A script that automatically infers the topics discussed in a collection of documents.

vmvargas/topic-modeling-using-LDA

Overview

Topic models automatically infer the topics discussed in a collection of documents. These topics can be used to summarize and organize documents, or used for featurization and dimensionality reduction in later stages of the data analysis.

LDA (Latent Dirichlet Allocation) is a topic model. I used LDA in this project to derive ‘topics’ from the dataset provided. The code is written in Python.

Dataset

The dataset was obtained from Yelp’s website.

Script Steps

  1. Prepare the data:
  • Tokenizing
  • Stopping (removing stop words)
  • Stemming
  2. Construct a Document-term Matrix
  3. Apply the LDA Model
  4. Examine the results
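
The steps above can be sketched in plain Python. This is a minimal illustration, not the repository's actual script: the stop list, the `naive_stem` helper, the collapsed Gibbs sampler in `fit_lda`, the hyperparameters `alpha` and `beta`, and the sample reviews are all assumptions made for the example. A real run would more likely use a full stop list, a proper stemmer, and a library LDA implementation.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop list (assumption); real scripts use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "to", "of", "in", "it", "i"}


def naive_stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def prepare(text):
    """Step 1: tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Step 2: build a sorted vocabulary and one row of term counts per document."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


def fit_lda(matrix, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Step 3: collapsed Gibbs sampling for LDA; returns topic-by-word counts."""
    rng = random.Random(seed)
    vocab_size = len(matrix[0])  # assumes at least one document
    # Expand each row of counts back into a sequence of word ids.
    docs = [[w for w, count in enumerate(row) for _ in range(count)] for row in matrix]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Start from a random topic assignment for every token.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assignments[d]):
            update(d, w, z, +1)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, then resample its topic given all the others.
                update(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                assignments[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, assignments[d][i], +1)
    return topic_word


def top_terms(topic_word, vocab, n=3):
    """Step 4: list the highest-count terms of each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -counts[w])[:n]]
        for counts in topic_word
    ]


if __name__ == "__main__":
    # Made-up reviews for illustration only; not taken from the Yelp dataset.
    reviews = [
        "The pizza was tasty and the pasta was tasty",
        "Waiters were friendly and the service was quick",
        "Tasty pizza, friendly service",
    ]
    vocab, matrix = document_term_matrix([prepare(r) for r in reviews])
    for topic_id, terms in enumerate(top_terms(fit_lda(matrix, num_topics=2), vocab)):
        print(topic_id, terms)
```

With a corpus this small the inferred topics are not meaningful; the point is only to show how each step feeds the next.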

License

This project is licensed under the GNU 2.0 License. See the LICENSE.md file for details.
