Applying what was learned in the Machine Learning course, this project carries out a detailed exploratory data analysis and fits several advanced regression models: XGBoost, Ridge, Lasso, ElasticNet, Gradient Boosting, and SVR.
Steps 2 to 5 of the workflow cover exploratory data analysis.
The goal is to predict the sale price of a given house in the real estate market using various statistical analysis tools. Machine learning is used to estimate the price closest to what a client might sell their house for. The work is presented as IPython notebooks.
The data is taken from Kaggle. The runnable code file can be found Here.
Data is read from the files, and SalesPrice in the training data and the house ID in the test data are located.
Many categorical features hold ratings, which can be converted into numbers, with missing values treated as 0.
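The rating conversion described above might look roughly like the sketch below. The rating labels, the scale, and the column names are assumptions made for illustration and are not taken from the notebooks.

```python
import pandas as pd

# Hypothetical rating scale; the labels in the real data may differ.
RATING_SCALE = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def encode_ratings(df, columns, scale=RATING_SCALE):
    """Return a copy of df with the rating columns mapped to integers.

    Missing values (and any label not present in the scale) become 0.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(scale).fillna(0).astype(int)
    return out


# Toy data with hypothetical column names.
houses = pd.DataFrame({
    "KitchenQual": ["Gd", "TA", "Ex"],
    "GarageQual": ["TA", None, "Fa"],
})
encoded = encode_ratings(houses, ["KitchenQual", "GarageQual"])
```

Using 0 for a missing rating reads naturally when "missing" means the house does not have the feature at all, which is the assumption made here.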
Handling missing values manually and eliminating features in which the majority of values are missing.
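One way to eliminate features with mostly missing values is to measure the share of missing entries per column and drop anything above a cutoff. The 50% threshold and the column names below are assumptions; the notebooks may use a different rule.

```python
import numpy as np
import pandas as pd


def drop_mostly_missing(df, threshold=0.5):
    """Drop columns whose share of missing values exceeds the threshold.

    Returns the reduced frame and the list of dropped column names.
    """
    missing_share = df.isna().mean()  # fraction of NaN per column
    to_drop = missing_share[missing_share > threshold].index.tolist()
    return df.drop(columns=to_drop), to_drop


# Toy data with hypothetical column names.
houses = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],     # 75% missing
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],    # 25% missing
})
reduced, dropped = drop_mostly_missing(houses)
```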
Handling missing values manually by looking at closely correlated features, checking for possible collinearity, and building new features from the available ones that correlate better with the target.
Imputing any remaining missing values, looking for outliers and skewness, observing feature importances, and analysing correlations among each category of features.
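The skewness and outlier checks could be sketched as follows. The log transform, the 0.75 skewness cutoff, and the 1.5 x IQR outlier rule are common conventions assumed here, not settings confirmed by the notebooks, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def reduce_skew(df, columns, max_abs_skew=0.75):
    """Apply log1p to columns whose absolute skewness exceeds the cutoff.

    Assumes the columns hold non-negative values.
    """
    out = df.copy()
    transformed = []
    for col in columns:
        if abs(out[col].skew()) > max_abs_skew:
            out[col] = np.log1p(out[col])
            transformed.append(col)
    return out, transformed


def iqr_outliers(series, k=1.5):
    """Boolean mask of values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile([0.25, 0.75])
    spread = q3 - q1
    return (series < q1 - k * spread) | (series > q3 + k * spread)


# Toy data with hypothetical column names.
houses = pd.DataFrame({
    "LotArea": [8000, 8500, 9000, 9500, 10000, 215000],
    "OverallQual": [5, 6, 7, 6, 5, 7],
})
outlier_mask = iqr_outliers(houses["LotArea"])
deskewed, transformed = reduce_skew(houses, ["LotArea", "OverallQual"])
```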
Ridge regression, Lasso regression, ElasticNet, XGBoost, SVR, and Gradient Boosting regression models were built, each with a grid search for the best hyperparameters. Stacking all the models and using XGBoost for second-level training improved the accuracy of the predictions.
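A minimal sketch of grid search followed by stacking, using scikit-learn only. Two simplifications are assumed: just two base models are tuned, and scikit-learn's GradientBoostingRegressor stands in for XGBoost as the second-level learner so the example has no extra dependency. The data is synthetic and the parameter grids are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared housing features and target.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def tune(model, grid):
    """Grid-search the model on the training split and return the best fit."""
    search = GridSearchCV(model, grid, cv=3, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    return search.best_estimator_


# First level: tuned base models (illustrative grids).
base_models = [
    ("ridge", tune(Ridge(), {"alpha": [0.1, 1.0, 10.0]})),
    ("lasso", tune(Lasso(max_iter=10000), {"alpha": [0.01, 0.1, 1.0]})),
]

# Second level: a boosted model trained on out-of-fold base predictions.
# GradientBoostingRegressor is a stand-in for XGBoost here.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=GradientBoostingRegressor(random_state=0),
    cv=3,
)
stack.fit(X_train, y_train)
stacked_predictions = stack.predict(X_test)
```

StackingRegressor trains the second-level model on cross-validated predictions of the base models, which keeps the second level from seeing predictions made on data the base models were fitted on.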
To improve accuracy further, the predictions of all trained models are combined in a weighted average, with the largest weight given to the stacked model, to obtain an ensembled prediction of the sale price.
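The weighted averaging can be sketched as below. The model names, predictions, and weights are made-up values for illustration; the only detail taken from the description above is that the stacked model receives the largest weight.

```python
import numpy as np


def weighted_average(predictions, weights):
    """Combine per-model prediction arrays into one weighted average.

    predictions and weights are dicts keyed by model name; the weights
    are normalised by np.average, so they need not sum to 1.
    """
    names = list(predictions)
    w = np.array([weights[name] for name in names], dtype=float)
    stacked = np.vstack([predictions[name] for name in names])
    return np.average(stacked, axis=0, weights=w)


# Hypothetical predictions for two houses from three models.
predictions = {
    "ridge": np.array([200000.0, 150000.0]),
    "lasso": np.array([210000.0, 140000.0]),
    "stack": np.array([205000.0, 152000.0]),
}
# Hypothetical weights, with the stacked model weighted most heavily.
weights = {"ridge": 0.25, "lasso": 0.25, "stack": 0.5}
ensembled = weighted_average(predictions, weights)
```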