Healthcare analytics involves utilizing data and analytical methods to enhance healthcare service delivery and patient outcomes. This project focuses on analyzing Patient Length of Stay (LOS), a critical metric affecting patient outcomes, healthcare costs, and hospital capacity.
By leveraging Snowflake and AWS, this project aims to improve patient outcomes, reduce costs, and enhance overall healthcare delivery through comprehensive analytics.
- Data Analysis in Snowflake
- Build a Machine Learning model to predict the length of stay for patients
- Schedule the AWS SageMaker notebook
- Perform live data scoring and insert predictions into Snowflake
- Send status mail
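The scoring-and-insert objective above could be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: the table `LOS_PREDICTIONS`, its columns `CASE_ID` and `PREDICTED_LOS`, and the helpers `to_rows` and `insert_predictions` are hypothetical names. It assumes a DB-API cursor that accepts `%s` placeholders, which is the default parameter style in snowflake-connector-python.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical target table and columns; adjust to the real Snowflake schema.
INSERT_SQL = "INSERT INTO LOS_PREDICTIONS (CASE_ID, PREDICTED_LOS) VALUES (%s, %s)"


def to_rows(case_ids: Sequence[str], predictions: Iterable[float]) -> List[Tuple[str, float]]:
    """Pair each case id with its predicted length of stay."""
    preds = [float(p) for p in predictions]
    if len(preds) != len(case_ids):
        raise ValueError("case_ids and predictions must have the same length")
    return list(zip(case_ids, preds))


def insert_predictions(cursor, rows: List[Tuple[str, float]]) -> int:
    """Insert the rows through a DB-API cursor and return how many were sent."""
    if rows:
        # executemany sends all rows with one parameterized statement.
        cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

With a real connection, `cursor` would come from `snowflake.connector.connect(...).cursor()`; credentials and account details are deliberately left out here.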
Training data for approximately 230k patients is stored in Snowflake across various regions and hospitals, with 19 features. Additionally, simulation data for 71k patients is available for prediction purposes.
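A quick way to preview the training data described above might look like the sketch below. The table name `HEALTH_DATA` and the helpers `build_sample_query` and `fetch_sample` are assumptions made for illustration; the source does not name the Snowflake tables. The code works with any DB-API cursor, including one from snowflake-connector-python.

```python
import re

# Accepts NAME, SCHEMA.NAME, or DATABASE.SCHEMA.NAME (unquoted identifiers only).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def build_sample_query(table: str, limit: int = 5) -> str:
    """Return a SELECT that previews the first `limit` rows of `table`."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


def fetch_sample(cursor, table: str, limit: int = 5):
    """Run the preview query on a DB-API cursor; return (column_names, rows)."""
    cursor.execute(build_sample_query(table, limit))
    # The first field of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()
```

For example, `fetch_sample(cursor, "HEALTH_DATA")` would return the column names and the first five rows. When the connector is installed with its pandas extra, `cursor.fetch_pandas_all()` can be used instead of `fetchall()` to get a DataFrame directly.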
- Introduction to Snowflake and Snowflake Worksheet
- Exploratory Data Analysis (EDA) in Snowflake
- Feature Engineering in Snowflake
- AWS SageMaker Setup
- Data Retrieval from Snowflake using snowflake-connector-python and snowflake-sqlalchemy
- Data Preprocessing
- Feature Selection
- Model Building
  - Linear Regression
  - Random Forest Regression
  - XGBoost Regression
- Model Predictions
- Inserting Model Predictions in Snowflake
- Scoring Function Deployment and Scheduling
- Sending Status Mail
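The final step, sending a status mail, could be sketched with the Python standard library as shown below. The source does not say which mail service the project uses, so plain SMTP with STARTTLS is an assumption here, and the helper names, subject line, and parameters are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_status_message(succeeded: bool, rows_scored: int,
                         sender: str, recipient: str, detail: str = "") -> EmailMessage:
    """Compose a plain-text status mail for one scoring run."""
    status = "SUCCESS" if succeeded else "FAILURE"
    msg = EmailMessage()
    msg["Subject"] = f"LOS scoring job: {status}"
    msg["From"] = sender
    msg["To"] = recipient
    body = f"Status: {status}\nRows scored: {rows_scored}\n"
    if detail:
        body += f"Detail: {detail}\n"
    msg.set_content(body)
    return msg


def send_status_message(msg: EmailMessage, host: str, port: int,
                        username: str, password: str) -> None:
    """Send the message over SMTP with STARTTLS (assumed setup, not from the source)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

Keeping message construction separate from delivery means the scheduled notebook can build the same mail for both success and failure paths, and the transport can be swapped without touching the message content.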
- Snowflake Account
- AWS Account
- Understanding of basic SQL
- Tools: AWS SageMaker, Snowflake
- Language: Python
- Libraries: snowflake-connector-python, snowflake-sqlalchemy, xgboost, pandas, numpy, scikit-learn
The code files are available in the code.zip file, organized into the following folders:
- Data: Contains the health_data.csv and simulation_data.csv files. Note: The project assumes this data is already present in the Snowflake table.
- Python files: Contains all files from the Jupyter environment created in AWS SageMaker.
- SQLQueries: Contains all SQL queries used in the Snowflake worksheet.