Welcome to my Hotel Sentiment Analysis project! This repository contains everything needed to scrape hotel reviews and then analyze, predict, and summarise their sentiment.
Our project focuses on predicting positive and negative sentiment in hotel reviews by combining Natural Language Processing (NLP) techniques, such as TF-IDF text vectorization, with classical Machine Learning models. The aim is to help hotels understand guest satisfaction through automated sentiment analysis.

Artifacts:
- NPN_Logistic_Regression_Model.pkl: Logistic Regression model for comparison.
- NPN_Random_Forest_Model.pkl: Random Forest model for advanced predictions.
- NPN_Naive_Bayes_Model.pkl: Naive Bayes model used for baseline performance.
- NPN_XGBoost_Model.pkl: XGBoost model for high-performance predictions.
- NPN_LightGBM_Model.pkl: LightGBM model trained for sentiment analysis.
- NPN_Label_Encoder.pkl: Fitted label encoder for categorical variables.
- NPN_TF_IDF_Vectorizer.pkl: TF-IDF vectorizer that transforms review text into numeric features.

Dataset:
- Scraped_Dataset.csv: The dataset scraped from various hotel review sites.
- Single_Hotel_Dataset.csv: Dataset focusing on a single hotel's reviews.

notebooks:
- Hotel_Sentiment_Analysis.ipynb: The Jupyter notebook detailing model training and evaluation.

src:
- __init__.py: Initialization for the source module.
- prediction.py: Contains functions for making sentiment predictions.
- summariser.py: Script for summarizing reviews and key sentiments.
- utils.py: Utility functions used throughout the project.

templates:
- img/: Images and media files used in the project.

Web_Scraping:
- scraper.py: The web scraping script that extracts reviews from online sources.
- test.py: Test script to validate the scraper's performance.

Top-level files:
- .gitignore: Files and folders to be ignored by Git.
- requirements.txt: Python packages required to run the project.
- .streamlit/: Streamlit configuration files for deploying the web app.
- streamlit_app.py: The main Streamlit application file. It launches the web interface, letting users interact with the sentiment analysis model and visualize the results.
- setup.py: Setup script for easy installation of the project.

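The prediction code in src/prediction.py isn't documented here, so the following is only a minimal sketch of how the pickled artifacts could fit together. It assumes the .pkl files sit in an "Artifacts" folder at the repository root, were saved with Python's pickle module (use joblib.load instead if they were saved with joblib), and expose scikit-learn-style transform, predict, and inverse_transform methods. It also assumes the label encoder maps the sentiment classes to integers. ARTIFACT_DIR, load_artifact, and predict_sentiment are hypothetical names, not the project's API.

```python
import pickle
from pathlib import Path

# Assumption: the .pkl files live in an "Artifacts" folder at the repository root.
ARTIFACT_DIR = Path("Artifacts")


def load_artifact(name, artifact_dir=ARTIFACT_DIR):
    """Unpickle one artifact file, e.g. "NPN_TF_IDF_Vectorizer.pkl".

    pickle can run arbitrary code while loading, so only load trusted files.
    """
    with open(Path(artifact_dir) / name, "rb") as fh:
        return pickle.load(fh)


def predict_sentiment(reviews, vectorizer, model, label_encoder):
    """Turn raw review strings into decoded sentiment labels.

    Assumes scikit-learn-style objects: vectorizer.transform, model.predict
    and label_encoder.inverse_transform (hypothetical wiring, not the project's code).
    """
    features = vectorizer.transform(reviews)  # one TF-IDF row per review
    encoded = model.predict(features)  # encoded class ids
    return list(label_encoder.inverse_transform(encoded))  # original label names
```

For example, load NPN_TF_IDF_Vectorizer.pkl, one of the model files (such as NPN_Logistic_Regression_Model.pkl), and NPN_Label_Encoder.pkl with load_artifact, then pass a list of review strings to predict_sentiment. Keeping loading and prediction separate means the artifacts are unpickled once and reused for every batch of reviews.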
Make sure you have Python installed. Clone this repository and install the required packages:
git clone https://github.com/your-repo/NPN-Cognizant-Hackathon.git
cd NPN-Cognizant-Hackathon
pip install -r requirements.txt

Scrape Data: Use the web scraper to collect hotel reviews.
python Web_Scraping/test.py
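The sites and page structure that scraper.py targets aren't documented here. As a hedged illustration of the extraction step only, the sketch below uses the standard library's html.parser to pull review text out of a page that has already been downloaded. The "review-text" class name, ReviewExtractor, and extract_reviews are hypothetical placeholders; you would replace the class name after inspecting the real page. Fetching pages, and respecting each site's terms of use and robots.txt, is left out.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ReviewExtractor(HTMLParser):
    """Collect the text inside every element whose class list contains review_class."""

    def __init__(self, review_class="review-text"):  # hypothetical class name
        super().__init__()
        self.review_class = review_class
        self.reviews = []
        self._depth = 0  # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested tag inside a review, e.g. <b>
        elif self.review_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the markup.
            self.reviews.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_reviews(html, review_class="review-text"):
    """Return the cleaned text of each matching review element in the HTML string."""
    parser = ReviewExtractor(review_class)
    parser.feed(html)
    parser.close()
    return parser.reviews
```

The depth counter lets a review contain nested tags without ending early. For messy real-world markup, a dedicated parser such as BeautifulSoup is usually more forgiving than this sketch.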
Run Analysis: Execute the Jupyter notebook to train models and analyze sentiments.
jupyter notebook notebooks/Hotel_Sentiment_Analysis.ipynb
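The notebook's exact preprocessing isn't described in this README. To illustrate what a TF-IDF vectorizer such as NPN_TF_IDF_Vectorizer.pkl computes, here is a plain-Python sketch that follows scikit-learn's TfidfVectorizer defaults: lowercasing, the token pattern `(?u)\b\w\w+\b`, raw term counts, smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2-normalised rows. Whether the project kept those defaults is an assumption, and tfidf and tokenize are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn: words of two or more characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def tfidf(docs):
    """Return (vocabulary, rows): one L2-normalised {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for toks in tokenized:
        weights = {t: tf * idf[t] for t, tf in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return sorted(idf), rows
```

For instance, in the two reviews "Great room" and "great staff!", "great" appears in both and gets a lower weight than "room" or "staff". With TfidfVectorizer left at its defaults, the non-zero weights should match these values.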

Deploy the App: Launch the Streamlit web app to showcase your results.
streamlit run streamlit_app.py
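The README doesn't say exactly what the app or src/summariser.py displays. As a hedged sketch of one simple sentiment summary such an app could show (not necessarily what the project does), the hypothetical summarise_sentiments function below tallies predicted labels and reports each label's share of the total.

```python
from collections import Counter


def summarise_sentiments(labels):
    """Return (label, count, percent) tuples, most frequent label first.

    labels: predicted sentiment labels, e.g. ["positive", "negative", ...].
    An empty input returns an empty list.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, round(100 * n / total, 1)) for label, n in counts.most_common()]
```

For example, four predictions of which three are "positive" yield [("positive", 3, 75.0), ("negative", 1, 25.0)]. Returning plain tuples keeps the function independent of any particular display library.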
- Logistic Regression: Baseline model for comparison.
- Random Forest: Ensemble method to capture complex patterns.
- Naive Bayes: Quick and interpretable model.
- LightGBM and XGBoost: Gradient boosting models for high accuracy.
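To show why Naive Bayes is quick and interpretable, here is a minimal from-scratch multinomial Naive Bayes with add-alpha (Laplace) smoothing in plain Python. It is a teaching sketch, not the project's NPN_Naive_Bayes_Model.pkl. TinyNaiveBayes is a hypothetical name, and it works on raw token lists rather than TF-IDF features. Training is a single counting pass, prediction sums per-word log-probabilities, and the per-class word counts can be inspected directly.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes on token lists with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        """docs: list of token lists; labels: one sentiment label per document."""
        self.classes = sorted(set(labels))
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, tokens, c):
        """log P(c) + sum of log P(t | c): the unnormalised log posterior."""
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.word_counts[c][t] + self.alpha) / denom)
            for t in tokens
            if t in self.vocab  # words never seen in training are skipped
        )

    def predict(self, docs):
        return [max(self.classes, key=lambda c: self._log_score(tokens, c))
                for tokens in docs]

    def top_words(self, c, k=5):
        """Most frequent training words for class c, i.e. its highest-probability words."""
        return [t for t, _ in self.word_counts[c].most_common(k)]
```

The library implementations used by the project add vectorised maths and many options, but the scoring rule is the same idea.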
Our models have been tuned and evaluated on how accurately they predict sentiment from hotel reviews; the detailed results are in notebooks/Hotel_Sentiment_Analysis.ipynb.
This project is licensed under the MIT License - see the LICENSE file for details.