This repository contains the California House Prices Prediction project, built with Machine Learning. The project is implemented end to end: a front end served by a Flask server consumes the Machine Learning model, and the app is deployed on Azure, Google Cloud Platform, and Heroku. Refer to README.md for the demo and application links.

Tejas-TA/Machine-Learning-Prediction-of-California-House-Prices

Machine Learning Prediction of California House Prices

Heroku - https://california-house-price.herokuapp.com/
Google Cloud Platform - https://ai-california-house-prices.ue.r.appspot.com/
Azure - https://ml-california-house-price-predictions.azurewebsites.net/ [Free tier limit exceeded; the application might be shut down]

The California House Price app predicts house prices from features such as longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, households, median_income, and ocean_proximity, with median_house_value as the value being predicted. The data comes from the 1990 California census.


Dataset

https://www.kaggle.com/camnugent/california-housing-prices

Libraries Used

1. Flask
2. gunicorn
3. itsdangerous
4. Jinja2
5. MarkupSafe
6. Werkzeug
7. Pillow
8. Numpy
9. Scikit-learn
10. Pandas
11. Seaborn
12. Joblib
13. Matplotlib
14. HTML
15. CSS
16. Bootstrap
17. JavaScript

Project Walkthrough

1. Exploratory Data Analysis (EDA)
2. Data Visualization and Cleaning
3. Filled the 207 missing values in the total_bedrooms feature with the mean of that feature
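
The mean imputation of total_bedrooms can be sketched with pandas roughly as follows. This is a minimal illustration on a toy DataFrame, not the project's actual notebook code; the sample values and the variable names (`df`, `bedrooms_mean`) are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; the values are illustrative only.
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
})

# Count the gaps before filling (the walkthrough reports 207 in the real data).
missing_before = int(df["total_bedrooms"].isna().sum())

# pandas skips NaN when computing the mean, so this is the mean of the
# observed values only.
bedrooms_mean = df["total_bedrooms"].mean()
df["total_bedrooms"] = df["total_bedrooms"].fillna(bedrooms_mean)

missing_after = int(df["total_bedrooms"].isna().sum())
print(missing_before, missing_after, bedrooms_mean)
```

Filling with the mean keeps every row in the dataset, at the cost of slightly shrinking the variance of the filled feature.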

An outlier in the median house value feature was removed.
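
The walkthrough does not say how the outlier was identified. As one possible approach, the sketch below assumes the common interquartile range (IQR) rule; the helper name `drop_iqr_outliers`, the multiplier `k=1.5`, and the sample values are all hypothetical and may differ from what the project actually did.

```python
import pandas as pd


def drop_iqr_outliers(frame, column, k=1.5):
    """Return a copy of frame without rows whose value in column lies
    outside [Q1 - k*IQR, Q3 + k*IQR]. Hypothetical helper for illustration."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    keep = frame[column].between(lower, upper)
    return frame[keep].reset_index(drop=True)


# Toy values: five typical prices and one extreme value.
df = pd.DataFrame(
    {"median_house_value": [100000, 120000, 150000, 180000, 200000, 5000000]}
)
cleaned = drop_iqr_outliers(df, "median_house_value")
print(len(df), len(cleaned))
```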

4. Feature Engineering
5. Feature Selection
6. Trained many Machine Learning algorithms on the dataset
7. Generated predictions on the test dataset with all the trained models
8. Model Evaluation (calculated R2, Adjusted R2, MSE, RMSE, MAE, and Accuracy)
9. Plotted the accuracies of all the trained models
10. Hyperparameter Tuning (GridSearchCV, RandomizedSearchCV) on the top-performing base ML algorithms (CatBoost, RandomForest, LightGBM)
11. Exported the model
12. Developed a web-based front end and created a Flask server
13. Deployed the app on Google Cloud Platform, Azure, and Heroku
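
The regression metrics named in the Model Evaluation step (R2, Adjusted R2, MSE, RMSE, MAE) can be computed along the following lines. This is a plain-Python sketch using the standard definitions, not the project's own evaluation code; the function name `regression_metrics`, the `n_features` parameter, and the sample numbers are assumptions for illustration. "Accuracy" is left out because the walkthrough does not define it for a regression task.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Standard regression metrics; requires len(y_true) > n_features + 1."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n          # mean squared error
    rmse = math.sqrt(mse)                         # root mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    # Adjusted R2 penalizes R2 for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": rmse, "mae": mae}


# Toy targets and predictions (illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 7.0]
metrics = regression_metrics(y_true, y_pred, n_features=1)
print(metrics)
```

Adjusted R2 is never higher than R2 when R2 is below 1, which makes it the more conservative figure when comparing models trained on different numbers of features.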

Email - tejasta@gmail.com
LinkedIn - https://www.linkedin.com/in/tejas-ta/
Blogs - https://tejasta.medium.com/