For this case-study assignment, students assume the role of a loan officer at a bank and decide whether to approve or deny a loan by assessing its risk of default with machine learning (ML).
Stage | Description | Output | Working Docs / Notebooks |
---|---|---|---|
Project Setup | Project proposal and high-level planning | - Project Proposal | - High-level planning docs |
ETL I | Data acquisition, structure inspection, and EDA | - Primary Data Report - Tableau: EDA Insights | - Data Understanding - Conceptualization as Database - Data Preprocessing - EDA: Univariate - EDA: Bivariate - EDA: Multivariate |
ETL II | Data imputation, data enrichment, and feature engineering | - XXXX Report - Tableau: Primary Insights | - Data Imputation: Geographic Data - Data Enrichment: Recession - Feature Engineering |
Models | Exploration of diverse models | - Conceptual Web App - Final Report | - Factor Analysis - Support Vector Machines - Logistic Regression - Clustering - Random Forest - Necessary Condition Analysis - Interpretable Rules: Bayesian Networks |
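
As one illustration of the Logistic entry in the Models stage, the following is a minimal sketch, not the project's actual notebook code. It fits a logistic regression by plain gradient descent and turns the predicted default probability into an approve/deny decision. The two binary features (`new_business`, `backed_by_real_estate`), the toy data, the 0.5 cutoff, and all function names are assumptions made for illustration; in practice a library implementation such as scikit-learn's `LogisticRegression` would typically be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def default_probability(x, w, b):
    """Predicted probability that the loan defaults."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def decide(x, w, b, cutoff=0.5):
    """Deny when the predicted default probability reaches the cutoff (assumed 0.5)."""
    return "deny" if default_probability(x, w, b) >= cutoff else "approve"


# Hypothetical toy data: [new_business, backed_by_real_estate], both 0/1.
X = [[1, 0], [1, 0], [1, 1], [0, 0], [0, 1], [0, 1], [1, 0], [0, 1]]
y = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = loan defaulted, 0 = paid in full

w, b = fit_logistic(X, y)
print(decide([0, 0], w, b), round(default_probability([0, 0], w, b), 3))
print(decide([1, 1], w, b), round(default_probability([1, 1], w, b), 3))
```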
Additional Data and Further Reading:
- Find a Location by Address
- Rural-Urban Commuting Area Codes USDA
- MySQL Function to remove accents and special characters
- La Sierra Tarahumara, epicentro del suicidio en México (The Sierra Tarahumara, epicenter of suicide in Mexico)
- Local Spatial Autocorrelation (the technique used in the Tarahumara analysis)
- https://www.slideshare.net/POOJAPATIL211/should-this-loan-be-approved-or-denied
- https://www.sciencedirect.com/science/article/abs/pii/S0020025515002960
- https://analyticsindiamag.com/a-guide-to-inferencing-with-bayesian-network-in-python/
- https://github.com/hayesall/bn-rule-extraction
- Probabilistic Approach to Extract Qualitative Knowledge
- https://saturncloud.io/blog/evaluating-logistic-regression-with-crossvalidation/
- https://github.com/stelladeecoder/sba_dataset/blob/main/sba.ipynb
- Feature engineering
- Trigonometric feature engineering
- https://feature-engine.trainindata.com/en/1.3.x/user_guide/creation/CyclicalFeatures.html
- https://scikit-learn.org/stable/auto_examples/applications/plot_cyclical_feature_engineering.html
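
The trigonometric (cyclical) feature engineering covered by the last links in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's own code: the feature (month of loan approval, 1 to 12) and the helper name `encode_cyclical` are hypothetical. The idea is to map a periodic value onto the unit circle so that December and January end up close together, which a raw month number does not capture.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Hypothetical feature: month of loan approval (1..12).
december = encode_cyclical(12, 12)
january = encode_cyclical(1, 12)
june = encode_cyclical(6, 12)

# Euclidean distance in the encoded space reflects closeness in the calendar:
# December is nearer to January than to June.
print(math.dist(december, january))
print(math.dist(december, june))
```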
Exercise proposed and published by:
Min Li, Amy Mickel, and Stanley Taylor
College of Business Administration, California State University, Sacramento, CA
JOURNAL OF STATISTICS EDUCATION
2018, VOL. 26, NO. 1, 55–66
https://doi.org/10.1080/10691898.2018.1434342