- UW FinTech Boot Camp - Project 3 Submission - May 2021
- Monique T.
- Tony H.
- April A.
- Nick N.
- After a 1-year pandemic, we all need a vacation...
- Our group wanted to conduct a project using the tools learned in both Module 1 & Module 2 of the boot camp (python + machine learning). We set out looking for data sets and questions related to supply chain management and demand forecasting, which led us to a very interesting prompt related to forecasting hotel reservation cancellations. Given we all need a good vacation, we pursued the the following project...
- Using data and machine learning models, can hotel reservation cancellations be predicted?
- If yes to the above question, what models and methods most accurately predict hotel reservation cancellations?
- Build a ML model that can predict whether a hotel reservation will be cancelled
- Analyze and understand data via organization, visualization, and dashboards
- Data Cleaning & Shaping
- Data comes from/affiliated with an article: Hotel Booking Demand Datasets
- Data was cleaned by Thomas Mock and Antoine Bichat (additional cleaning and shaping conducted by our team)
- Is there further noise/info we want to weed out? Label encoding?
- Machine Learning Model
- Which model to use (try multiple models)
- Ensemble/Classifier/Decision Tree/Regression? Pick several and also apply resampling techniques if needed. We predict classifier models will be the most effective given we will be classifying a binary outcome (cancelled vs not cancelled)
- Which parameters/inputs produce the best outcomes (train/test split; different inputs for each ML model type aka reference documentation; which models are most efficient; what features in the dataset can we eliminate)
- Look at data in different ways? Is the model/data better for predicting in the summer/winter/fall/etc? Should we try forecasting for specific date ranges, like spring break, holiday breaks, etc. This will be a reach if we have time.
- Data Visualization
- Visualize by city hotel & resort hotel
- Visualize by season
- Visualize different demographics
- Visualize different ML model outcomes?
- Explore other means and methods of visualization that may give unique insight into the data set
- ML Models & Workbook - April + Nick
- Data Visualizations & Dashboard - Monique + Tony
- Slide Show - Whole Team
- Jupyter lab
- Python
- Pandas
- Numpy
- Sklearn
- Pyviz
- More to be imported and utilized in our python files
- Analysis Folder - ML python files
- Visualizations Folder - visualization python files
- Data Folder- original data set
- Final presentation deck
- Using machine learning models, our Team was able to predict hotel cancellations with confidence (particularly using the SMOTEENN Resampling + BalancedRandomForestClassifier model, which rendered a ~90% accuracy score). Vizualizations of our final accuracy score outcomes are below. Please see our uploaded slide show for more information on the outcomes.
- Intro & Hypothesis: Tony
- Visualizations & Intro to Data: Monique
- Data Preparation & Model Selection: Nick
- Model Outcomes & Takeaways: April