Step-by-Step Procedure
- Understand the business / real-world problem
- Load the data
- Preprocess the data according to its type (categorical, text, numerical)
- Preprocessing includes removing outliers, imputing missing values, cleaning the data, etc.
- Split the data into train, cross-validation (CV), and test sets
- Vectorize the categorical data (one-hot encoding)
- Vectorize the text data
- Normalize the numerical data
- Concatenate all feature types (categorical + text + numerical)
- Tune hyperparameters to find the best estimator (GridSearch)
- Plot the model's performance using heatmaps
- Train the Random Forest model with the best hyperparameters and plot the ROC curve with its AUC
- Plot the confusion matrix
- Tune hyperparameters to find the best estimator (RandomizedSearch)
- Plot the model's performance using heatmaps
- Train the XGBoost model with the best hyperparameters and plot the ROC curve with its AUC
- Plot the confusion matrix
- Record observations on overall model performance
- Present the performance results in table format.
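
For the preprocessing step, a minimal sketch of imputing missing values and removing outliers might look like the following. The toy data frame, its column names, the median/placeholder imputation, and the 1.5 * IQR rule are all assumptions chosen for illustration; the procedure itself does not prescribe them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 11.0, 500.0, 9.0],
    "category": ["a", "b", None, "a", "b", "a"],
})

# Impute missing values: median for the numerical column,
# an explicit placeholder label for the categorical column.
df["price"] = df["price"].fillna(df["price"].median())
df["category"] = df["category"].fillna("missing")

# Remove outliers from the numerical column with the 1.5 * IQR rule.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[in_range].reset_index(drop=True)
```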
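
The train / CV / test split can be sketched with two calls to scikit-learn's `train_test_split`. The synthetic data, the 60/20/20 proportions, and the stratification are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, imbalanced binary labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = np.array([0] * 70 + [1] * 30)

# First hold out 20% as the test set, keeping the class ratio (stratify).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Then split the remaining 80% into train (60% overall) and CV (20% overall).
X_train, X_cv, y_train, y_cv = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42
)
```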
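
One-hot encoding of a categorical feature could look like this sketch, which fits the encoder on the training split only and reuses it for the other splits. The sample values are hypothetical, and `handle_unknown="ignore"` is an assumed choice so that categories unseen during fitting encode as all zeros.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column (one feature, so shape is n_samples x 1).
train_cat = np.array([["ca"], ["ny"], ["tx"], ["ca"]])
test_cat = np.array([["ny"], ["fl"]])  # "fl" never appears in training

# Fit on the training data only, then apply the same mapping elsewhere.
encoder = OneHotEncoder(handle_unknown="ignore")
train_ohe = encoder.fit_transform(train_cat)  # sparse matrix
test_ohe = encoder.transform(test_cat)
```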
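
The outline does not say which text vectorizer is used. As one possibility, the sketch below uses TF-IDF via scikit-learn's `TfidfVectorizer`; a bag-of-words `CountVectorizer` would follow the same fit/transform pattern. The sample sentences are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical text feature.
train_text = [
    "students need books",
    "students need laptops for class",
    "books for the library",
]
test_text = ["laptops for students", "unseen tablets"]

# Learn the vocabulary and IDF weights from the training text only.
vectorizer = TfidfVectorizer()
train_tfidf = vectorizer.fit_transform(train_text)
# Words outside the learned vocabulary are dropped at transform time.
test_tfidf = vectorizer.transform(test_text)
```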
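
"Normalizing" can mean several things; the sketch below assumes min-max scaling to [0, 1] with `MinMaxScaler`, fitted on the training split. `StandardScaler` is a common alternative with the same interface. The numbers are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numerical feature as a column vector.
train_num = np.array([[10.0], [20.0], [30.0]])
test_num = np.array([[15.0], [40.0]])

# The min and max come from the training data only, so test values
# outside the training range can fall outside [0, 1].
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_num)
test_scaled = scaler.transform(test_num)
```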
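
Concatenating the categorical, text, and numerical features is typically a horizontal stack of matrices that share the same row order. A minimal sketch with `scipy.sparse.hstack`, using small made-up matrices in place of the real encoded features:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Stand-ins for the encoded feature blocks (3 rows each).
cat_features = csr_matrix(np.array([[1, 0], [0, 1], [1, 0]]))
text_features = csr_matrix(
    np.array([[0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.3, 0.3, 0.4]])
)
num_features = csr_matrix(np.array([[0.1], [0.9], [0.5]]))

# Stack column-wise: 2 + 3 + 1 = 6 columns, still 3 rows.
X_all = hstack([cat_features, text_features, num_features]).tocsr()
```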
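
The grid-search step for the Random Forest might be sketched as follows, assuming scikit-learn's `GridSearchCV` with ROC AUC as the scoring metric. The synthetic data and the two-parameter grid (`n_estimators`, `max_depth`) are assumptions; the outline does not list the actual grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data as a stand-in.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Every combination in the grid is evaluated with 3-fold cross-validation.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
    return_train_score=True,
)
search.fit(X, y)

best_rf = search.best_estimator_  # refit on all of X, y with the best params
print(search.best_params_, round(search.best_score_, 3))
```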
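
For the heatmap steps, one approach is to arrange the search scores into a 2-D grid indexed by two hyperparameters and then draw that grid. The helper names `scores_to_grid` and `plot_heatmap` are hypothetical, and the input is assumed to have the shape of scikit-learn's `cv_results_` dictionary. With a randomized search, combinations that were never sampled stay as NaN.

```python
import numpy as np

def scores_to_grid(cv_results, row_param, col_param, score_key="mean_test_score"):
    """Arrange search scores into a 2-D grid indexed by two hyperparameters."""
    params = cv_results["params"]
    rows = sorted({p[row_param] for p in params})
    cols = sorted({p[col_param] for p in params})
    grid = np.full((len(rows), len(cols)), np.nan)
    for p, score in zip(params, cv_results[score_key]):
        grid[rows.index(p[row_param]), cols.index(p[col_param])] = score
    return rows, cols, grid

def plot_heatmap(rows, cols, grid, row_label, col_label, title):
    """Draw the grid as an annotated heatmap (needs matplotlib installed)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(grid, cmap="viridis")
    ax.set_xticks(range(len(cols)))
    ax.set_xticklabels(cols)
    ax.set_yticks(range(len(rows)))
    ax.set_yticklabels(rows)
    ax.set_xlabel(col_label)
    ax.set_ylabel(row_label)
    ax.set_title(title)
    # Write each score inside its cell.
    for i in range(len(rows)):
        for j in range(len(cols)):
            ax.text(j, i, f"{grid[i, j]:.3f}", ha="center", va="center", color="w")
    fig.colorbar(image, ax=ax)
    return fig
```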
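
Training the final model and computing its ROC curve and AUC could look like the sketch below. The data is synthetic, the hyperparameter values are placeholders for whatever the search selected, and `plot_roc` is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Placeholder values; in practice use the best hyperparameters from the search.
model = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# ROC curves need scores, so take the predicted probability of class 1.
train_scores = model.predict_proba(X_train)[:, 1]
test_scores = model.predict_proba(X_test)[:, 1]
train_fpr, train_tpr, _ = roc_curve(y_train, train_scores)
test_fpr, test_tpr, _ = roc_curve(y_test, test_scores)
train_auc = roc_auc_score(y_train, train_scores)
test_auc = roc_auc_score(y_test, test_scores)

def plot_roc(curves):
    """Plot (label, fpr, tpr, auc) tuples on one axis (needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, fpr, tpr, auc in curves:
        ax.plot(fpr, tpr, label=f"{label} AUC = {auc:.3f}")
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance line
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend()
    return fig

# Usage: plot_roc([("train", train_fpr, train_tpr, train_auc),
#                  ("test", test_fpr, test_tpr, test_auc)])
```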
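
The confusion matrix is computed from hard class predictions, so predicted probabilities must first be thresholded. In this sketch the labels, the probabilities, and the 0.5 threshold are all illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted probabilities for class 1.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.6, 0.8, 0.3, 0.9, 0.7, 0.2])

# Convert probabilities to class labels with an assumed threshold.
threshold = 0.5
y_pred = (y_prob >= threshold).astype(int)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
```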
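
The randomized-search step for XGBoost might be sketched as below, assuming scikit-learn's `RandomizedSearchCV` and the `XGBClassifier` wrapper from the `xgboost` package. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier` purely as a stand-in so it still runs. The search ranges and `n_iter=5` are assumptions, not values from the original procedure.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Stand-in only, so the sketch runs without xgboost installed.
    from sklearn.ensemble import GradientBoostingClassifier as Booster

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Distributions are sampled n_iter times instead of trying every combination.
param_distributions = {
    "n_estimators": randint(10, 60),         # integers in [10, 59]
    "max_depth": randint(2, 6),              # integers in [2, 5]
    "learning_rate": uniform(0.01, 0.29),    # floats in [0.01, 0.30]
}
search = RandomizedSearchCV(
    Booster(),
    param_distributions,
    n_iter=5,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```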
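
For the final summary table, a small plain-Python formatter is enough. The function name `format_table` and the column headings are hypothetical, and the cells in the usage example are placeholders to be replaced with the values actually measured.

```python
def format_table(headers, rows):
    """Render rows as a plain-text table with aligned columns."""
    cells = [[str(h) for h in headers]] + [[str(c) for c in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(row):
        return "|" + "|".join(f" {c.ljust(w)} " for c, w in zip(row, widths)) + "|"

    lines = [border, fmt(cells[0]), border]
    lines += [fmt(r) for r in cells[1:]]
    lines.append(border)
    return "\n".join(lines)

# Placeholder cells; fill in the measured hyperparameters and AUC values.
print(format_table(
    ["Model", "Best hyperparameters", "Test AUC"],
    [
        ["Random Forest", "<best params>", "<auc>"],
        ["XGBoost", "<best params>", "<auc>"],
    ],
))
```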