The objective of this project is to extract information and value from large volumes of text using Natural Language Processing (NLP). The notebook first uses the word2vec algorithm to represent words as vectors and to study the similarities between the words of several documents. It then combines word2vec with Latent Dirichlet Allocation (LDA), an unsupervised learning algorithm, to perform topic modeling: grouping the documents by topic and listing the keywords of each document.
- Python version 3.9.7
- nlp-topic-modeling
  - The .ipynb notebook file that contains the code.
- data
  - The folder that contains the data.
Here is the project structure:
- project
  > nlp-topic-modeling
    - nlp-topic-modeling.ipynb
  > data
    - papers.csv
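
Given this layout, the data file can be located and read along the following lines. This is a minimal sketch using only the standard library: the helper names `papers_csv_path` and `load_rows` are hypothetical, the `data` folder is assumed to sit directly under the project folder, and UTF-8 encoding is assumed. No column names are assumed, since the README does not list them.

```python
import csv
from pathlib import Path


def papers_csv_path(project_root):
    """Return the assumed location of papers.csv: <project_root>/data/papers.csv."""
    return Path(project_root) / "data" / "papers.csv"


def load_rows(csv_path):
    """Read the CSV into a list of dicts keyed by whatever header row the file has."""
    # UTF-8 is an assumption; adjust the encoding if the file uses another one.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    path = papers_csv_path(".")
    if path.exists():
        rows = load_rows(path)
        print(f"Loaded {len(rows)} rows from {path}")
    else:
        print(f"{path} not found; run this from the project folder")
```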