This project aims to detect topics in negative and positive hotel reviews from Tripadvisor.
The first part of the project consists of implementing BERTopic, first introduced in this paper.
Most of the corresponding work lies in the notebooks section, as it is quite visual.
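BERTopic embeds the documents with a sentence-transformer model, reduces the embeddings with UMAP, clusters them with HDBSCAN, and then describes each cluster with a class-based TF-IDF (c-TF-IDF). The last step is what turns clusters of reviews into readable topic words. The following is a minimal, simplified sketch of that weighting only. It assumes the reviews have already been grouped into topics (here by hand), uses naive whitespace tokenization, and weights each word $t$ in class $c$ as $tf_{t,c} \cdot \log(1 + A / tf_t)$, where $tf_t$ is the word's count over all classes and $A$ the average number of words per class. `c_tf_idf` is a hypothetical helper, not the `bertopic` library's API, and the library's own vectorizer and normalization will give different scores.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic: dict[str, list[str]], top_n: int = 3) -> dict[str, list[tuple[str, float]]]:
    """Rank words per topic with a simplified class-based TF-IDF (sketch only)."""
    # Treat all documents of a topic as one "class document" and count its words.
    class_counts = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_per_topic.items()
    }
    # tf_t: total count of each word across all classes.
    total_counts: Counter = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    ranked = {}
    for topic, counts in class_counts.items():
        # Words frequent in this class but rare overall get the highest weight.
        scores = {
            word: tf * math.log(1 + avg_words / total_counts[word])
            for word, tf in counts.items()
        }
        ranked[topic] = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return ranked


if __name__ == "__main__":
    # Toy, hand-made reviews; not taken from the Tripadvisor dataset.
    reviews = {
        "room": ["the room was dirty", "dirty room and broken shower"],
        "staff": ["the staff was rude", "rude staff at the desk"],
    }
    for topic, words in c_tf_idf(reviews, top_n=2).items():
        print(topic, [w for w, _ in words])
```

On this toy input the shared word "the" is pushed below "room"/"dirty" and "staff"/"rude", which is the behaviour c-TF-IDF is designed for: words that distinguish one cluster of reviews from the others rise to the top.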
The second part of the project [WIP] will use transformer models for zero-shot learning.
The last part [WIP] will attempt few-shot learning by manually labelling a small number of instances.
First, set up your machine to run this project locally, or create a virtual machine (VM) instance to run it in the cloud.
Next, set up your preferred IDE to use the installed Poetry environment, or activate it in your terminal:
```
cd /path/to/tripadvisor-hotel-reviews-topic-modeling
poetry shell
```
You can find more information about using this project in the user guide.
Aside from a clone of this repository, you will need the following requirements to run this project locally:
- Python 3.9 or later installed
- Poetry installed on your machine for virtual environment and dependency management
- Python packages installed using Poetry:
  ```
  cd /path/to/tripadvisor-hotel-reviews-topic-modeling
  poetry install
  ```
- A Kaggle API token saved at ~/.kaggle/kaggle.json
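Two of these requirements are easy to get subtly wrong. The sketch below is a hypothetical helper (`check_requirements` is not part of this repository) that checks the Python version and that the Kaggle token exists, is valid JSON with the `username` and `key` fields a token downloaded from Kaggle contains, and is not readable by other users, which the Kaggle client warns about. The permission check is assumed to matter only on POSIX systems and is skipped elsewhere.

```python
import json
import os
import stat
import sys
from pathlib import Path


def check_requirements(token_path: Path = Path.home() / ".kaggle" / "kaggle.json") -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # The project requires Python 3.9 or later.
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    # The Kaggle API reads its token from ~/.kaggle/kaggle.json by default.
    if not token_path.is_file():
        problems.append(f"Kaggle token not found at {token_path}")
        return problems
    try:
        token = json.loads(token_path.read_text())
    except json.JSONDecodeError:
        problems.append(f"{token_path} is not valid JSON")
        return problems
    # A token downloaded from Kaggle holds "username" and "key" fields.
    missing = {"username", "key"} - set(token if isinstance(token, dict) else {})
    if missing:
        problems.append(f"{token_path} is missing fields: {sorted(missing)}")
    # On POSIX systems, the file should be private to the user (chmod 600).
    if os.name == "posix" and token_path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{token_path} is readable by other users; run chmod 600 on it")
    return problems


if __name__ == "__main__":
    print(check_requirements() or "All checks passed")
```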
To use a specific Python version in your Poetry environment, please refer to this guidance.
We love contributions! If you want to help build and improve our project, please read our contributing guidelines beforehand.