Predicting-Insurance-Charges

This project combines predictive analytics with feature engineering in the healthcare/insurance domain.

The task is to predict customers' healthcare costs for a health insurance company.

The variable 'charges' is the target variable.

insurance.csv

| Column | Data Type | Description |
| --- | --- | --- |
| `age` | int | Age of the primary beneficiary. |
| `sex` | object | Gender of the insurance contractor (male or female). |
| `bmi` | float | Body mass index, a key indicator of body fat based on height and weight. |
| `children` | int | Number of dependents covered by the insurance plan. |
| `smoker` | object | Indicates whether the beneficiary smokes (yes or no). |
| `region` | object | The beneficiary's residential area in the US, divided into four regions. |
| `charges` | float | Individual medical costs billed by health insurance. |

The dataset requires some data cleaning and feature engineering before it can support a strong predictive model.
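As a starting point for cleaning, it can help to list where a loaded file departs from the column table. The sketch below is an assumption about how such a check might look, not the notebook's code: `schema_problems` is a hypothetical helper, and the allowed values for `sex` and `smoker` are taken only from the descriptions in the table.

```python
import pandas as pd

# Column names come from the table; the checks themselves are assumptions.
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]
NUMERIC_COLUMNS = ["age", "bmi", "children", "charges"]
ALLOWED_VALUES = {"sex": {"male", "female"}, "smoker": {"yes", "no"}}


def schema_problems(df):
    """Return a list of human-readable schema problems (empty if none found)."""
    problems = []

    for col in EXPECTED_COLUMNS:
        if col not in df.columns:
            problems.append(f"{col}: column is missing")

    for col in NUMERIC_COLUMNS:
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col}: expected a numeric column")

    for col, allowed in ALLOWED_VALUES.items():
        if col in df.columns:
            # Missing values are ignored here; they are a separate cleaning step.
            unexpected = set(df[col].dropna().unique()) - allowed
            if unexpected:
                problems.append(f"{col}: unexpected values {sorted(unexpected)}")

    if "region" in df.columns and df["region"].nunique() > 4:
        problems.append("region: more than four distinct values")

    return problems
```

Calling `schema_problems(pd.read_csv("insurance.csv"))` returns a list of messages that can guide which columns need cleaning first.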

I develop a regression model using XGBoost to predict insurance charges for customers across different demographic groups.

The process involves exploratory data analysis, data cleaning, and feature engineering.

Feature Engineering Steps Include:

- Created a category bin for `age` called `age_group`: ['Child', 'Young Adult', 'Middle-Aged', 'Senior']
- Created a category bin for `bmi` called `bmi_category`: ['Underweight', 'Normal', 'Overweight', 'Obese']
- Created a variable to capture the risk associated with smoking
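The steps above can be sketched with `pandas.cut`. This is a minimal illustration under stated assumptions: the bin edges are not given in this README, so the age cut points are placeholders and the BMI cut points follow the commonly used adult thresholds (18.5, 25, 30). The `smoker_risk` column is a hypothetical stand-in for the smoking-risk variable, whose actual definition lives in the notebook.

```python
import numpy as np
import pandas as pd

# Assumed bin edges; the notebook may use different cut points.
AGE_BINS = [0, 18, 35, 55, np.inf]
AGE_LABELS = ["Child", "Young Adult", "Middle-Aged", "Senior"]
BMI_BINS = [0, 18.5, 25, 30, np.inf]
BMI_LABELS = ["Underweight", "Normal", "Overweight", "Obese"]


def add_engineered_features(df):
    """Return a copy of df with age_group, bmi_category and smoker_risk added."""
    out = df.copy()

    # right=False makes each bin closed on the left, e.g. [18, 35).
    out["age_group"] = pd.cut(out["age"], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    out["bmi_category"] = pd.cut(out["bmi"], bins=BMI_BINS, labels=BMI_LABELS, right=False)

    # Hypothetical smoking-risk feature: BMI for smokers, 0 for non-smokers.
    out["smoker_risk"] = (out["smoker"] == "yes").astype(int) * out["bmi"]
    return out
```

Working on a copy keeps the cleaned frame unchanged, so the raw and engineered versions can be compared during exploratory analysis.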

The XGBoost model was trained using 5-fold cross-validation.

The evaluation metric is R-squared (R²).
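One way to run 5-fold cross-validation scored by R-squared is scikit-learn's `cross_val_score` with `scoring="r2"`. The sketch below is not the notebook's code: the helper names and the hyperparameters in `make_xgb_regressor` are placeholders, and `X` is assumed to be a fully numeric feature matrix (categorical columns already encoded).

```python
from sklearn.model_selection import KFold, cross_val_score


def make_xgb_regressor(seed=0):
    """Build an XGBoost regressor; the hyperparameters are placeholders."""
    # Imported lazily so the rest of the module works without xgboost installed.
    from xgboost import XGBRegressor

    return XGBRegressor(
        n_estimators=200, max_depth=4, learning_rate=0.1, random_state=seed
    )


def cross_validated_r2(model, X, y, n_splits=5, seed=0):
    """Return one R-squared score per fold for any scikit-learn style regressor."""
    # Shuffling guards against any ordering in the CSV leaking into the folds.
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="r2")
```

With features `X` and target `y` prepared, `cross_validated_r2(make_xgb_regressor(), X, y).mean()` gives a single summary score, and the spread across the five folds indicates how stable that score is.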

I calculated the model's feature importance and found that the engineered features contributed significantly to the model's performance.
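A fitted tree-based regressor with a scikit-learn style interface, including `XGBRegressor`, exposes a `feature_importances_` attribute. A small sketch of ranking those values by feature name follows; `rank_feature_importance` is a hypothetical helper, not a function from the notebook.

```python
import pandas as pd


def rank_feature_importance(model, feature_names):
    """Return feature importances as a Series sorted from highest to lowest."""
    # feature_importances_ is in the same order as the training columns.
    importances = pd.Series(model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False)
```

Passing the training frame's column names as `feature_names` makes it easy to see where engineered columns such as `age_group` or `bmi_category` fall in the ranking.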
