Predicting Airbnb Listing Prices in New York City #725

Merged
merged 4 commits into from
Aug 2, 2024
49,081 changes: 49,081 additions & 0 deletions New York City Airbnb Price Detection/Dataset/AB_NYC_2019.csv

Large diffs are not rendered by default.

123 changes: 123 additions & 0 deletions New York City Airbnb Price Detection/Model/README.md
@@ -0,0 +1,123 @@
# New York City Airbnb Price Prediction: Models

## Models Implemented
- Linear Regression (LR)
- Ridge Regression (Ridge)
- Lasso Regression (Lasso)
- ElasticNet Regression (ElasticNet)
- K-Nearest Neighbors Regression (KNN)
- Decision Tree Regression (CART)
- Random Forest Regression (RF)
- Gradient Boosting Machine (GBM)
- XGBoost
- LightGBM
- CatBoost
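The tuning-and-timing loop behind a list like this can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the synthetic data, the search grids, and the `searches` and `results` names are assumptions, and only three scikit-learn models are shown. XGBoost, LightGBM, and CatBoost expose scikit-learn-compatible regressors and could be added to the same dictionary.

```python
import time

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the preprocessed Airbnb features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical search grids; the README only reports the winning values.
searches = {
    "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "KNN": (KNeighborsRegressor(), {"n_neighbors": [3, 5, 7]}),
    "CART": (
        DecisionTreeRegressor(random_state=0),
        {"max_depth": [None, 5, 10], "min_samples_leaf": [1, 5]},
    ),
}

results = {}
for name, (model, grid) in searches.items():
    start = time.time()
    # Cross-validated grid search, selecting the parameters with the lowest RMSE.
    search = GridSearchCV(model, grid, cv=3, scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        # R² of the refitted best model on the held-out test split.
        "test_r2": search.best_estimator_.score(X_test, y_test),
        # Wall-clock time for the whole search, including refitting.
        "seconds": time.time() - start,
    }

for name, row in results.items():
    print(name, row["best_params"], round(row["test_r2"], 4), round(row["seconds"], 2))
```

Timing the whole search rather than a single fit would explain why a tuned model reports a longer execution time than an untuned one, but whether the README's timings were measured this way is not stated.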

## Performance of the Models Based on Evaluation Metrics
- **Linear Regression (LR):**
  - RMSE: 70.0431
  - R² Score: 0.6656
  - MAE: 42.088
  - MSE: 4906.0328
  - Execution Time: 0.04 seconds

- **Ridge Regression (Ridge):**
  - Best parameters: {'alpha': 1.0}
  - RMSE: 70.0438
  - R² Score: 0.6656
  - MAE: 42.0872
  - MSE: 4906.1288
  - Execution Time: 2.1 seconds

- **Lasso Regression (Lasso):**
  - Best parameters: {'alpha': 0.1}
  - RMSE: 70.1052
  - R² Score: 0.665
  - MAE: 42.0402
  - MSE: 4914.7403
  - Execution Time: 1.76 seconds

- **ElasticNet Regression (ElasticNet):**
  - Best parameters: {'alpha': 0.1, 'l1_ratio': 0.9}
  - RMSE: 70.3563
  - R² Score: 0.6626
  - MAE: 42.0211
  - MSE: 4950.0056
  - Execution Time: 3.94 seconds

- **K-Nearest Neighbors Regression (KNN):**
  - Best parameters: {'n_neighbors': 5}
  - RMSE: 39.7241
  - R² Score: 0.8924
  - MAE: 22.0858
  - MSE: 1578.0056
  - Execution Time: 6.23 seconds

- **Decision Tree Regression (CART):**
  - Best parameters: {'max_depth': None, 'min_samples_leaf': 1}
  - RMSE: 10.2621
  - R² Score: 0.9928
  - MAE: 1.1928
  - MSE: 105.3113
  - Execution Time: 3.15 seconds

- **Random Forest Regression (RF):**
  - Best parameters: {'max_depth': None, 'n_estimators': 50}
  - RMSE: 6.9945
  - R² Score: 0.9967
  - MAE: 0.915
  - MSE: 48.9226
  - Execution Time: 65.45 seconds

- **Gradient Boosting Machine (GBM):**
  - Best parameters: {'learning_rate': 0.1, 'n_estimators': 50}
  - RMSE: 34.4356
  - R² Score: 0.9192
  - MAE: 19.4025
  - MSE: 1185.8113
  - Execution Time: 25.74 seconds

- **XGBoost:**
  - Best parameters: {'learning_rate': 0.1, 'n_estimators': 50}
  - RMSE: 8.4594
  - R² Score: 0.9951
  - MAE: 4.6483
  - MSE: 71.5611
  - Execution Time: 3.74 seconds

- **LightGBM:**
  - Best parameters: {'learning_rate': 0.1, 'n_estimators': 50}
  - RMSE: 8.9302
  - R² Score: 0.9946
  - MAE: 4.7429
  - MSE: 79.7482
  - Execution Time: 9.23 seconds

- **CatBoost:**
  - Best parameters: {'depth': 6, 'iterations': 50, 'learning_rate': 0.1}
  - RMSE: 22.0192
  - R² Score: 0.967
  - MAE: 13.5157
  - MSE: 484.847
  - Execution Time: 11.29 seconds

![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___102_1.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___102_2.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___102_3.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___102_4.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___102_5.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___104_1.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___104_2.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___104_3.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___104_4.png?raw=true)
![RESULT](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___104_5.png?raw=true)

## Conclusion
From the results, Random Forest Regression (RF) performed best on every reported metric, with an RMSE of 6.9945, an R² score of 0.9967, an MAE of 0.915, and an MSE of 48.9226. It was also the slowest model, at 65.45 seconds. XGBoost came close (RMSE 8.4594, R² 0.9951) in only 3.74 seconds, which makes it a good trade-off between accuracy and execution time. K-Nearest Neighbors (KNN) clearly outperformed the linear models (R² of 0.8924 versus about 0.66) but trailed the tree-based ensembles.

## Signature
- **Name:** Aditya D
- **GitHub:** [https://www.github.com/adi271001](https://www.github.com/adi271001)
- **LinkedIn:** [https://www.linkedin.com/in/aditya-d-23453a179/](https://www.linkedin.com/in/aditya-d-23453a179/)
- **Topmate:** [https://topmate.io/aditya_d/](https://topmate.io/aditya_d/)
- **Twitter:** [https://x.com/ADITYAD29257528](https://x.com/ADITYAD29257528)

164 changes: 164 additions & 0 deletions New York City Airbnb Price Detection/README.md
@@ -0,0 +1,164 @@
# New York City Airbnb Price Prediction

## Goal
The goal of this project is to predict the prices of Airbnb listings in New York City using various regression models. We will evaluate the performance of these models using metrics such as RMSE, R² score, MAE, and MSE.
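For reference, the four metrics named above can be computed from their definitions as in the sketch below. The `regression_metrics` helper and the sample prices are made up for illustration; the project itself may well rely on `sklearn.metrics` instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),  # RMSE is the square root of MSE
        "MAE": mae,
        "R2": 1 - ss_res / ss_tot,
    }


# Made-up nightly prices in dollars, for illustration only.
metrics = regression_metrics([100, 150, 200], [110, 140, 200])
print(metrics)
```

Because RMSE is the square root of MSE, the two columns of any results table can be cross-checked against each other. RMSE and MAE are in the same unit as the price, while R² is unitless.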

## Dataset
The dataset for this project is sourced from the [New York City Airbnb Open Data on Kaggle](https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data/data).

## Description
The dataset contains information about Airbnb listings in New York City, including features such as the name, host ID, neighbourhood, latitude, longitude, room type, price, minimum nights, number of reviews, last review date, reviews per month, calculated host listings count, availability, and more.

## What I Did
1. **Exploratory Data Analysis (EDA):**
   - Analyzed the distribution of prices.
   - Visualized the relationship between different features and the target variable (price).

2. **Data Preprocessing:**
   - Handled missing values.
   - Encoded categorical variables.
   - Split the data into training and testing sets.

3. **Model Implementation:**
   - Implemented multiple regression models.
   - Performed hyperparameter tuning for each model.
   - Evaluated the performance of the models.
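The preprocessing steps listed above might look roughly like the sketch below. It runs on a tiny made-up sample rather than `AB_NYC_2019.csv`, and the specific choices (filling missing `reviews_per_month` with 0, one-hot encoding, a 20% test split) are assumptions, since the README does not say how each step was carried out.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny made-up sample using column names from the Kaggle dataset.
df = pd.DataFrame(
    {
        "neighbourhood_group": [
            "Manhattan", "Brooklyn", "Queens", "Brooklyn", "Manhattan", "Bronx",
        ],
        "room_type": [
            "Entire home/apt", "Private room", "Private room",
            "Entire home/apt", "Shared room", "Private room",
        ],
        "minimum_nights": [1, 3, 2, 30, 1, 2],
        "reviews_per_month": [0.21, None, 1.5, None, 4.64, 0.1],
        "price": [225, 89, 60, 150, 40, 55],
    }
)

# Assumption: a listing with no reviews has zero reviews per month.
df["reviews_per_month"] = df["reviews_per_month"].fillna(0)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
X = pd.get_dummies(
    df.drop(columns="price"), columns=["neighbourhood_group", "room_type"]
)
y = df["price"]

# Hold out part of the data for evaluation; the split ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

On the full dataset, a high-cardinality column such as `neighbourhood` would produce many one-hot columns, so a different encoding may be preferable there.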

## Models Implemented
- Linear Regression (LR)
- Ridge Regression (Ridge)
- Lasso Regression (Lasso)
- ElasticNet Regression (ElasticNet)
- K-Nearest Neighbors Regression (KNN)
- Decision Tree Regression (CART)
- Random Forest Regression (RF)
- Gradient Boosting Machine (GBM)
- XGBoost
- LightGBM
- CatBoost

## Libraries Needed
- pandas
- numpy
- scikit-learn
- xgboost
- lightgbm
- catboost
- matplotlib
- seaborn

## EDA Results
- The price distribution is skewed to the right.
- Room type and neighbourhood have a significant impact on the price.
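Right skew of the kind noted above can be quantified with pandas, as in this small sketch. The prices are made up, and the `log1p` transform is shown only as a common way to reduce skew; the README does not say whether the target was transformed before modelling.

```python
import numpy as np
import pandas as pd

# Made-up nightly prices with one expensive outlier, for illustration only.
prices = pd.Series([40, 55, 60, 89, 150, 225, 1200], name="price")

raw_skew = prices.skew()            # positive value means a long right tail
log_skew = np.log1p(prices).skew()  # skewness after a log(1 + x) transform

print(f"skew of price:        {raw_skew:.2f}")
print(f"skew of log1p(price): {log_skew:.2f}")
```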

![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___31_1.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___31_3.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___33_3.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___33_5.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___33_7.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___33_9.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___46_1.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___46_2.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___47_0.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___48_0.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___49_0.png?raw=true)
![EDA](https://github.com/adi271001/ML-Crate/blob/airbnb-price/New%20York%20City%20Airbnb%20Price%20Detection/Images/__results___50_0.png?raw=true)


## Performance of the Models Based on Evaluation Metrics
- **Linear Regression (LR):**
  - RMSE: 70.0431
  - R² Score: 0.6656
  - MAE: 42.088
  - MSE: 4906.0328
  - Execution Time: 0.04 seconds

- **Ridge Regression (Ridge):**
  - Best parameters: {'alpha': 1.0}
  - RMSE: 70.0438
  - R² Score: 0.6656
  - MAE: 42.0872
  - MSE: 4906.1288
  - Execution Time: 2.1 seconds

- **Lasso Regression (Lasso):**
  - Best parameters: {'alpha': 0.1}
  - RMSE: 70.1052
  - R² Score: 0.665
  - MAE: 42.0402
  - MSE: 4914.7403
  - Execution Time: 1.76 seconds

- **ElasticNet Regression (ElasticNet):**
  - Best parameters: {'alpha': 0.1, 'l1_ratio': 0.9}
  - RMSE: 70.3563
  - R² Score: 0.6626
  - MAE: 42.0211
  - MSE: 4950.0056
  - Execution Time: 3.94 seconds

- **K-Nearest Neighbors Regression (KNN):**
  - Best parameters: {'n_neighbors': 5}
  - RMSE: 39.7241
  - R² Score: 0.8924
  - MAE: 22.0858
  - MSE: 1578.0056
  - Execution Time: 6.23 seconds

- **Decision Tree Regression (CART):**
  - Best parameters: {'max_depth': None, 'min_samples_leaf': 1}
  - RMSE: 10.2621
  - R² Score: 0.9928
  - MAE: 1.1928
  - MSE: 105.3113
  - Execution Time: 3.15 seconds

- **Random Forest Regression (RF):**
  - Best parameters: {'max_depth': None, 'n_estimators': 50}
  - RMSE: 6.9945
  - R² Score: 0.9967
  - MAE: 0.915
  - MSE: 48.9226
  - Execution Time: 65.45 seconds

- **Gradient Boosting Machine (GBM):**
  - Best parameters: {'learning_rate': 0.1, 'n_estimators': 50}
  - RMSE: 34.4356
  - R² Score: 0.9192
  - MAE: 19.4025
  - MSE: 1185.8113
  - Execution Time: 25.74 seconds

- **XGBoost:**
  - Best parameters: {'learning_rate': 0.1, 'n_estimators': 50}
  - RMSE: 8.4594
  - R² Score: 0.9951
  - MAE: 4.6483
  - MSE: 71.5611
  - Execution Time: 3.74 seconds

- **LightGBM:**
  - Best parameters: {'learning_rate': 0.1, 'n_estimators': 50}
  - RMSE: 8.9302
  - R² Score: 0.9946
  - MAE: 4.7429
  - MSE: 79.7482
  - Execution Time: 9.23 seconds

- **CatBoost:**
  - Best parameters: {'depth': 6, 'iterations': 50, 'learning_rate': 0.1}
  - RMSE: 22.0192
  - R² Score: 0.967
  - MAE: 13.5157
  - MSE: 484.847
  - Execution Time: 11.29 seconds

## Conclusion
From the results, Random Forest Regression (RF) performed best on every reported metric, with an RMSE of 6.9945, an R² score of 0.9967, an MAE of 0.915, and an MSE of 48.9226. It was also the slowest model, at 65.45 seconds. XGBoost came close (RMSE 8.4594, R² 0.9951) in only 3.74 seconds, which makes it a good trade-off between accuracy and execution time. K-Nearest Neighbors (KNN) clearly outperformed the linear models (R² of 0.8924 versus about 0.66) but trailed the tree-based ensembles.
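The ranking behind this conclusion can be reproduced from the figures reported in this README. The sketch below embeds those rounded values inline; the trimmed column set and the `RESULTS` and `ranked` names are illustrative, and a real script would more likely read `Results/model_results.csv`.

```python
import csv
import io

# RMSE, R² and execution time (seconds) as reported in this README.
RESULTS = """Model,RMSE,R2,Seconds
LR,70.0431,0.6656,0.04
Ridge,70.0438,0.6656,2.1
Lasso,70.1052,0.665,1.76
ElasticNet,70.3563,0.6626,3.94
KNN,39.7241,0.8924,6.23
CART,10.2621,0.9928,3.15
RF,6.9945,0.9967,65.45
GBM,34.4356,0.9192,25.74
XGBoost,8.4594,0.9951,3.74
LightGBM,8.9302,0.9946,9.23
CatBoost,22.0192,0.967,11.29
"""

rows = list(csv.DictReader(io.StringIO(RESULTS)))

# Lower RMSE is better, so sort ascending.
ranked = sorted(rows, key=lambda row: float(row["RMSE"]))

for row in ranked[:3]:
    print(f'{row["Model"]:<10} RMSE={row["RMSE"]:<8} time={row["Seconds"]}s')
# The top three by RMSE are RF, XGBoost, and LightGBM.
```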

## Signature
- **Name:** Aditya D
- **GitHub:** [https://www.github.com/adi271001](https://www.github.com/adi271001)
- **LinkedIn:** [https://www.linkedin.com/in/aditya-d-23453a179/](https://www.linkedin.com/in/aditya-d-23453a179/)
- **Topmate:** [https://topmate.io/aditya_d/](https://topmate.io/aditya_d/)
- **Twitter:** [https://x.com/ADITYAD29257528](https://x.com/ADITYAD29257528)
12 changes: 12 additions & 0 deletions New York City Airbnb Price Detection/Results/model_results.csv
@@ -0,0 +1,12 @@
Model,RMSE,R^2 Score,MAE,MSE,Execution Time
LR,71.53198315955038,0.6655853098384882,42.08796683221192,4906.032824442506,0.47637128829956055
Ridge,71.53195975977346,0.665578766303157,42.08723818206346,4906.128821418915,0.28203392028808594
Lasso,72.01450239637934,0.6568096535173081,42.09192543435565,5034.776145936374,0.8430032730102539
ElasticNet,79.13825818380568,0.5858840839391268,47.80563759675594,6075.290162455154,0.8507664203643799
KNN,44.59045947757754,0.8924368678281834,22.085775641681153,1578.0056099805709,5.567161321640015
CART,13.87112619469831,0.9947811552830802,1.1475611003170059,76.56309438592903,2.856947660446167
RF,9.887532170296272,0.9966269733412237,0.8669613457408732,49.48400890428468,184.95560932159424
GBM,20.74019935300425,0.9725947775705186,11.353273683981142,402.0484887647979,61.66828656196594
XGBoost,8.454398383271927,0.9968488083506803,2.92793343415569,46.22957699676574,3.062408208847046
LightGBM,8.443987474496604,0.9965805295793418,3.2156694766593765,50.16536240634922,3.9357712268829346
CatBoost,5.676228015988899,0.9989088680589349,2.1128429165287175,16.007458033849126,36.4131178855896
@@ -0,0 +1,12 @@
Model,RMSE,R^2 Score,MAE,MSE,Execution Time
LR,70.04307834784609,0.6655853098384882,42.08796683221192,4906.032824442506,0.03542757034301758
Ridge,70.0437636154634,0.665578766303157,42.08723818206346,4906.128821418915,2.2162649631500244
Lasso,70.1052084599152,0.6649917768971136,42.040219767475165,4914.740253208166,1.7024812698364258
ElasticNet,70.35627628470488,0.6625879498798284,42.02107787330175,4950.005612649725,3.715897798538208
KNN,39.72411874391389,0.8924368678281834,22.085775641681153,1578.0056099805709,6.37768816947937
CART,8.44225405583764,0.9951418409150795,1.1033848041722059,71.27165354330708,3.3038580417633057
RF,6.968251629355629,0.9966901939359489,0.9063523877697105,48.55653077001739,64.83306241035461
GBM,34.435610438766005,0.9191703925764881,19.40251582210346,1185.8112662904505,25.130359888076782
XGBoost,8.459378672215337,0.995122111945242,4.648281218730327,71.56108751993172,4.136552095413208
LightGBM,8.930182616872319,0.9945640484487296,4.742892009779708,79.74816157068852,9.654861450195312
CatBoost,22.01924241725614,0.9669508995695035,13.515727180136386,484.847036629892,10.565703630447388
9 changes: 9 additions & 0 deletions New York City Airbnb Price Detection/requirements.txt
@@ -0,0 +1,9 @@
numpy==1.24.3
pandas==1.5.3
matplotlib==3.7.2
seaborn==0.12.2
catboost==1.1
lightgbm==3.3.5
scikit-learn==1.3.0
xgboost==1.7.6
folium==0.14.0