Step-by-Step Procedure
-
Understanding the business / real-world problem
-
Loading the data
-
Preprocessing the data (based on the type of data: categorical, text, numerical)
-
Preprocessing includes removing outliers, imputing missing values, cleaning the data, removing special characters, etc.
-
Splitting the data into train, CV, and test sets (random splitting)
-
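A minimal sketch of a random three-way split, assuming scikit-learn's `train_test_split` and toy stand-in data. The 60/20/20 proportions, the `stratify` argument, and all variable names are illustrative assumptions, not settings taken from this outline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in data (assumption): 100 rows, 3 features, imbalanced binary labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = np.array([1] * 80 + [0] * 20)

# Carve out the test set first, then split the remainder into train and CV.
# Stratifying keeps the class ratio the same in every split.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_cv, y_train, y_cv = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)

print(X_train.shape, X_cv.shape, X_test.shape)
```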
Vectorizing categorical data (one-hot encoding)
-
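One possible way to one-hot encode a categorical column, sketched with scikit-learn's `OneHotEncoder`. The state codes are made-up sample values. The encoder is fitted on the train split only, so a category seen only in CV or test becomes an all-zero row instead of leaking into the vocabulary.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column, one value per row.
train_cat = [["CA"], ["NY"], ["CA"], ["TX"]]
cv_cat = [["NY"], ["WA"]]  # "WA" never appears in the train split

# Fit on train only; unknown categories are encoded as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train_cat = encoder.fit_transform(train_cat)  # sparse matrix
X_cv_cat = encoder.transform(cv_cat)

print(encoder.categories_)
print(X_cv_cat.toarray())
```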
Vectorizing text data (BoW, TF-IDF, average Word2Vec, TF-IDF-weighted Word2Vec)
-
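The four text representations can be sketched as follows. BoW and TF-IDF use scikit-learn vectorizers fitted on the train essays only. The Word2Vec variants assume a lookup of pretrained word vectors; here `toy_w2v` is a hypothetical 2-dimensional stand-in, and `avg_w2v` and `tfidf_w2v` are illustrative helper names rather than library functions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

train_essays = ["students need books", "students need new laptops"]
cv_essays = ["new books for students"]

# Bag of words and TF-IDF: fit on train, transform the other splits.
bow = CountVectorizer()
X_train_bow = bow.fit_transform(train_essays)
X_cv_bow = bow.transform(cv_essays)

tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(train_essays)
X_cv_tfidf = tfidf.transform(cv_essays)

# Hypothetical word vectors standing in for a pretrained Word2Vec model.
toy_w2v = {"students": np.array([1.0, 0.0]), "books": np.array([0.0, 1.0])}
# IDF value for every word in the TF-IDF vocabulary.
idf = {word: tfidf.idf_[i] for word, i in tfidf.vocabulary_.items()}

def avg_w2v(sentence):
    """Plain average of the vectors of all known words."""
    vectors = [toy_w2v[w] for w in sentence.split() if w in toy_w2v]
    return np.mean(vectors, axis=0) if vectors else np.zeros(2)

def tfidf_w2v(sentence):
    """Average of word vectors weighted by term frequency times IDF."""
    words = sentence.split()
    total, weight_sum = np.zeros(2), 0.0
    for w in words:
        if w in toy_w2v and w in idf:
            weight = idf[w] * (words.count(w) / len(words))
            total += weight * toy_w2v[w]
            weight_sum += weight
    return total / weight_sum if weight_sum else total

cv_avg = avg_w2v(cv_essays[0])
cv_weighted = tfidf_w2v(cv_essays[0])
```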
Vectorizing numerical data (Normalizer)
-
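A small sketch of scaling one numerical column with scikit-learn's `Normalizer`, using made-up prices. `Normalizer` rescales each row to unit norm, so the column is reshaped into a single row first and reshaped back afterwards. It is stateless, which means each split is scaled by its own norm; whether that is the intended behavior here is an assumption.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical price column from the train split.
train_price = np.array([3.0, 4.0])

normalizer = Normalizer()  # default norm is "l2"

# reshape(1, -1) turns the column into one row so the whole column
# is scaled to unit length; reshape(-1, 1) restores the column shape.
train_price_norm = normalizer.fit_transform(
    train_price.reshape(1, -1)).reshape(-1, 1)

print(train_price_norm)
```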
Applying the Decision Tree model on top of the features
-
Concatenating all the feature types (categorical + numerical + selected text features)
-
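Concatenation can be sketched with `scipy.sparse.hstack`, which joins feature blocks column-wise as long as they have the same number of rows. The three small matrices below are placeholders for the encoded categorical, numerical, and text features.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Placeholder feature blocks for two data points (assumed values).
X_cat = csr_matrix(np.array([[1, 0], [0, 1]]))                   # one-hot
X_num = csr_matrix(np.array([[0.6], [0.8]]))                     # normalized price
X_text = csr_matrix(np.array([[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]]))  # TF-IDF

# Stack side by side; CSR format supports fast row slicing later on.
X_all = hstack((X_cat, X_num, X_text)).tocsr()

print(X_all.shape)
```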
Hyperparameter tuning to find the best estimator (GridSearchCV) and plotting heatmaps
-
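A hedged sketch of the tuning step on synthetic data. The two hyperparameters (`max_depth`, `min_samples_split`), their candidate values, and the `roc_auc` scoring are assumptions chosen for illustration. The mean CV scores are arranged into a 2-D grid, which is the shape a heatmap function such as `seaborn.heatmap` expects.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the concatenated feature matrix.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

depths = [1, 5, 10]
splits = [5, 10, 100]
param_grid = {"max_depth": depths, "min_samples_split": splits}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid, scoring="roc_auc", cv=3, return_train_score=True)
search.fit(X, y)

# Rows are max_depth values, columns are min_samples_split values.
cv_auc = np.zeros((len(depths), len(splits)))
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    row = depths.index(params["max_depth"])
    col = splits.index(params["min_samples_split"])
    cv_auc[row, col] = score

print(search.best_params_)
print(cv_auc.round(3))
```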
Training the Decision Tree model with the best hyperparameters and plotting the ROC curve (AUC)
-
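The sketch below trains a tree with fixed hyperparameters (assumed values standing in for the tuned ones) and computes the ROC points and AUC for train and test. The resulting `fpr` and `tpr` arrays can be passed to any plotting library to draw the curves; plotting is left out so the example runs without extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Assumed "best" hyperparameters, for illustration only.
clf = DecisionTreeClassifier(max_depth=5, min_samples_split=10, random_state=0)
clf.fit(X_train, y_train)

# ROC curves need scores for the positive class, not hard labels.
train_fpr, train_tpr, _ = roc_curve(y_train, clf.predict_proba(X_train)[:, 1])
test_fpr, test_tpr, _ = roc_curve(y_test, clf.predict_proba(X_test)[:, 1])

train_auc = auc(train_fpr, train_tpr)
test_auc = auc(test_fpr, test_tpr)
print("train AUC:", round(train_auc, 3), "test AUC:", round(test_auc, 3))
```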
Plotting the confusion matrix (heatmap)
-
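A minimal example of computing the confusion matrix with scikit-learn, using hand-written labels. In the returned array, rows are true classes and columns are predicted classes, so the matrix can be handed directly to a heatmap function.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hand-written toy labels (assumption).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_true, y_pred)

# For binary labels the flattened order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(cm)
```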
Graphviz visualization of Decision Tree
-
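As a sketch, scikit-learn's `export_graphviz` can return the tree as DOT source when `out_file=None`. The feature names and class names below are hypothetical. Rendering the DOT text to an image requires the separate Graphviz tooling, which this example does not call.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data with four features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Hypothetical names used only to label the nodes.
feature_names = ["f0", "f1", "f2", "f3"]
class_names = ["rejected", "approved"]

# out_file=None returns the DOT source as a string;
# max_depth=2 limits how many levels are drawn.
dot_source = export_graphviz(
    clf, out_file=None, feature_names=feature_names,
    class_names=class_names, filled=True, max_depth=2)

print(dot_source[:60])
```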
Finding the false positive points
-
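False positives are the points whose true label is 0 but whose predicted label is 1. A boolean mask selects them, and the same mask can then slice any aligned column (essay text, price, and so on). All values below are made up for illustration.

```python
import numpy as np

# Toy labels and aligned columns (assumed values).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 1])
essays = np.array(["essay a", "essay b", "essay c",
                   "essay d", "essay e", "essay f"])
price = np.array([10.0, 250.0, 30.0, 45.0, 60.0, 120.0])

# True label 0, predicted label 1.
fp_mask = (y_true == 0) & (y_pred == 1)

fp_essays = essays[fp_mask]
fp_price = price[fp_mask]
print(fp_essays, fp_price)
```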
Plotting a word cloud of the essay text of these false positive data points
-
Plotting a box plot of the price of the false positive points
-
PDF & CDF of teacher_number_of_previously_posted_projects for the false positive points
-
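One common way to approximate the PDF and CDF is a normalized histogram and its cumulative sum, sketched here with NumPy. The values stand in for the false positive rows of `teacher_number_of_previously_posted_projects`, and the bin count is an arbitrary choice. Plotting `pdf` and `cdf` against `bin_edges[1:]` gives the two curves.

```python
import numpy as np

# Made-up values for the false positive points.
fp_prev_projects = np.array([0, 0, 1, 1, 2, 3, 5, 8, 13, 21])

counts, bin_edges = np.histogram(fp_prev_projects, bins=5)

# Fraction of points in each bin, then the running total.
pdf = counts / counts.sum()
cdf = np.cumsum(pdf)

print(pdf, cdf)
```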
Getting the top 5k features using feature_importances_ with TF-IDF
-
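Selecting the most important features can be sketched by fitting a tree, sorting `feature_importances_` in descending order, and keeping the first `k` column indices. The example uses synthetic dense data and `k = 5` instead of 5000. Note that a single decision tree often assigns zero importance to many features, so a large `k` may include columns the tree never used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the TF-IDF feature matrix.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

k = 5  # stands in for 5000 in the real pipeline
# Indices of the k largest importances, most important first.
top_idx = np.argsort(clf.feature_importances_)[::-1][:k]

# Keep only those columns; the same indices apply to CV and test data.
X_top = X[:, top_idx]
print(top_idx, X_top.shape)
```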
Hyperparameter tuning to find the best estimator (GridSearchCV) and plotting heatmaps
-
Training the Decision Tree model with the best hyperparameters and plotting the ROC curve (AUC)
-
Plotting the confusion matrix (heatmap)
-
Graphviz visualization of Decision Tree
-
Finding the false positive points
-
Plotting a word cloud of the essay text of these false positive data points
-
Plotting a box plot of the price of the false positive points
-
PDF & CDF of teacher_number_of_previously_posted_projects for the false positive points
-
Observations on overall model performance (conclusion)
-
Presenting the performance in table format