Predicting Hospital Readmission of Diabetic Patients Details
- Dataset Description
- Dataset Preparation
Transform data types, handle missing values, recode and collapse features, categorize, remove outliers
- Feature Selection
Boruta algorithm
- Analytical Techniques
Split the dataset into training and test data, data balancing, k-fold cross-validation
- Models and evaluation
- Model Comparison
- Conclusions
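The k-fold cross-validation step listed above can be sketched as follows. This is a minimal, standard-library Python illustration of how the folds are formed, not the project's own code (which appears to be written in R); the function name `k_fold_indices` and the toy sample count are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration: every sample lands in the
    test fold exactly once across the k iterations.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Toy usage: 10 samples split into 3 folds.
for train_idx, test_idx in k_fold_indices(n_samples=10, k=3):
    print(len(train_idx), len(test_idx))
```

In practice a model would be fitted on each `train_idx` subset and scored on the matching `test_idx` subset, with the k scores averaged.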
Medical insurance cost with EDA + OLS Regression Details
- Ordinary Least Squares (OLS) Regression: Simple Linear Regression, Polynomial Regression, and Multiple Linear Regression
- Exploring and Preparing the Data, Model Building, Variable Importance, Regression Assumptions, Improving Model performance
- Log Transformation, Outlier Removal Function, Multicollinearity Check, Statistical Interpretation
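To make the simple linear regression and log transformation topics above concrete, here is a small sketch of closed-form OLS fitted to a log-transformed response. It is written in standard-library Python for illustration only; the helper names `ols_simple` and `r_squared` are hypothetical, and the ages and charges are synthetic values invented for the example, not data from the project.

```python
import math


def ols_simple(x, y):
    """Closed-form OLS for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination for the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic data (assumption): charges grow exponentially with age.
ages = [20, 30, 40, 50, 60]
charges = [math.exp(7 + 0.02 * a) for a in ages]

# The log transformation turns the exponential trend into a straight line.
log_charges = [math.log(c) for c in charges]
intercept, slope = ols_simple(ages, log_charges)
print(intercept, slope, r_squared(ages, log_charges, intercept, slope))
```

On the log scale, the slope reads as an approximate proportional change in charges per additional year of age, which is one reason the transformation helps with right-skewed cost data.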
Predicting Customer Risk by Home Equity Loan Details
- Data Pre-processing
- Imbalanced Data (Decision Tree)
Undersampling, Oversampling, Both, SMOTE
Performance Metrics: Accuracy, Error Rate, Specificity, Precision, Recall (Sensitivity), F-measure, ROC, AUC
- Build Model (Logistic Regression)
- Model Diagnostics
VIF, Cutoff, Misclassification Error, Confusion Matrix, Concordance
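Most of the performance metrics listed above follow directly from the four cells of a binary confusion matrix. The sketch below shows those definitions in standard-library Python; the function names and the toy labels are assumptions for illustration and do not come from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive the usual metrics; ratios with a zero denominator become 0.0."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,  # misclassification error
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Toy imbalanced example (made up): 3 positives among 10 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
```

With imbalanced classes, accuracy alone can look good while recall stays poor, which is why precision, recall, and specificity are reported alongside it.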
Health Insurance Data -- EDA/Managing (ggplot2) Details
- Visually checking distributions for a single variable
- Visually checking relationships between two variables
- Plotting data with a rug
- Cleaning data
- Data Transformations
- Check the Correlation Coefficient (visualization): pairs.panels(), chart.Correlation(), ggpairs()
- Logarithmic Transformation, Variable Importance, Regression Assumptions (Residuals, Cook's D)
- Logistic Regression, Decision Trees, Conditional Inference Trees, Random Forest, Support Vector Machines, Choosing the best predictive solution
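The correlation plots listed above are produced with R helpers, and with their default settings the number they display is Pearson's r. As a rough illustration of that quantity, here is a standard-library Python sketch; the function name `pearson_r` and the sample values are assumptions for the example, not part of the project.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up values: a perfectly linear increasing relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear association; the scatterplot matrices make nonlinear patterns visible that the coefficient alone would miss.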