In this project, we will use the publicly available LendingClub dataset, a popular one on Kaggle, to train Linear Regression and Extreme Gradient Boosting (XGBoost) decision tree models that predict the interest rates assigned to loans.
First, we will clean and prepare the data. This includes feature removal, feature engineering, and string processing. Several entries have had values deleted to simulate dirty data, so the cleaning step must also handle missing values.
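As a rough illustration of these cleaning steps, here is a minimal sketch using pandas. The column names (`id`, `int_rate`, `term`, `emp_length`, `loan_amnt`, `annual_inc`), the toy rows, the `loan_to_income` feature, and the choice of median imputation are all assumptions made for the example, not necessarily what this project uses.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real data; columns and values are assumed.
raw = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "int_rate": ["13.56%", "7.90%", None, "18.25%"],
    "term": [" 36 months", " 60 months", " 36 months", " 60 months"],
    "emp_length": ["10+ years", "< 1 year", "3 years", None],
    "loan_amnt": [10000.0, 25000.0, 8000.0, 15000.0],
    "annual_inc": [60000.0, np.nan, 52000.0, 45000.0],
})


def clean_loans(df):
    """Hypothetical helper: one possible cleaning pass, not the project's own."""
    df = df.copy()

    # Feature removal: an identifier carries no predictive signal.
    df = df.drop(columns=["id"])

    # String processing: "13.56%" -> 13.56, " 36 months" -> 36.
    df["int_rate"] = pd.to_numeric(df["int_rate"].str.rstrip("%"), errors="coerce")
    df["term_months"] = pd.to_numeric(df["term"].str.extract(r"(\d+)", expand=False))
    df = df.drop(columns=["term"])

    # "< 1 year" is mapped to 0 before pulling out the digits.
    emp = df["emp_length"].replace("< 1 year", "0 years")
    df["emp_length"] = pd.to_numeric(emp.str.extract(r"(\d+)", expand=False))

    # Missing values: drop rows without a target, impute features with the median.
    df = df.dropna(subset=["int_rate"]).reset_index(drop=True)
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Feature engineering: a simple ratio feature (assumed for illustration).
    df["loan_to_income"] = df["loan_amnt"] / df["annual_inc"]
    return df


clean = clean_loans(raw)
print(clean)
```

Dropping rows with a missing target while imputing missing features is only one reasonable policy; the right choice depends on how many values were deleted and in which columns.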
Then, we will build the machine learning models in Python, evaluate each model's performance using the root mean squared error (RMSE) metric, and compare the results.
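To make the evaluation step concrete, the sketch below computes RMSE, the square root of the mean squared difference between predicted and actual values, and uses it to compare two sets of predictions. It runs on synthetic data, fits the linear model with NumPy's least-squares solver rather than any particular modeling library, and compares it against a mean-value baseline; the `rmse` helper and all variable names are assumptions for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in for cleaned loan features and interest rates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 12.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Simple hold-out split: first 150 rows for training, the rest for testing.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Ordinary least squares with an intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
A_test = np.column_stack([np.ones(len(X_test)), X_test])
linear_pred = A_test @ coef

# Baseline: always predict the mean training interest rate.
baseline_pred = np.full(len(y_test), y_train.mean())

scores = {
    "mean baseline": rmse(y_test, baseline_pred),
    "linear regression": rmse(y_test, linear_pred),
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

Predictions from a boosted tree model could be added to the `scores` dictionary in the same way, so that every model is judged on the same held-out rows. Because RMSE is expressed in the units of the target, a lower value here means predictions that are closer to the actual interest rate in percentage points.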