Topic Modelling and Recommendation System for News Articles using Non-Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA).
The project also includes an article recommendation engine built on TF-IDF: given a keyword, the engine ranks the documents in the pool by cosine similarity to that keyword and suggests the top matches.
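The keyword-based recommendation step could look roughly like the sketch below. It assumes scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The toy `documents` list and the `recommend` helper are hypothetical names used only for illustration, not the project's actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the pool of news articles.
documents = [
    "The central bank raised interest rates to curb inflation.",
    "The football team won the championship after a dramatic final.",
    "Stock markets fell as investors worried about interest rates.",
    "The new smartphone features a faster processor and better camera.",
]


def recommend(keyword, docs, top_n=2):
    """Return (index, score) pairs for the top_n docs most similar to keyword."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(docs)   # one TF-IDF row per document
    query_vec = vectorizer.transform([keyword])   # keyword in the same vector space
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_n]       # highest similarity first
    # Drop documents that share no terms with the keyword (score of zero).
    return [(int(i), float(scores[i])) for i in ranked if scores[i] > 0]


for idx, score in recommend("interest rates", documents):
    print(f"{score:.3f}  {documents[idx]}")
```

Fitting the vectorizer on the document pool and only transforming the keyword keeps both in the same vocabulary, so a keyword with no known terms simply yields no recommendations.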
Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents.
LDA is an example of a topic model and is used to assign the text in a document to particular topics. It builds a topics-per-document model and a words-per-topic model, both of which are given Dirichlet priors.
NMF is an unsupervised technique, so there are no topic labels for the model to train on. It decomposes (or factorizes) a high-dimensional non-negative matrix, such as a document-term matrix, into two lower-dimensional matrices whose entries are all non-negative.
- Topic Modelling Using LDA.
- Topic Modelling Using NMF.
- Cosine Similarity as a means for recommending articles.
- A document recommender that, given a keyword, suggests the best-matching documents from the pool.
- Gensim
- NLTK
- Scikit-learn
- NumPy