Skip to content

Drug overdose deaths are a major public health crisis, causing high mortality and economic burden. Accurate data analysis is essential for effective policies, prevention strategies, and targeted interventions.

Notifications You must be signed in to change notification settings

NavyaBoga1109/Statistical-Approach-to-Classify-Drug-Overdose-Deaths

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Statistical-Approach-to-Classify-Drug-Overdose-Deaths

Drug overdose deaths are a major public health crisis, causing high mortality and economic burden. Accurate data analysis is essential for effective policies, prevention strategies, and targeted interventions.

Abstract

This project addresses the escalating crisis of drug overdose deaths by developing a classification model to predict the type of drug most likely responsible for overdose deaths in various demographic groups in the U.S. Using a comprehensive dataset from Data.gov, the model aims to identify high-risk demographics for targeted prevention and intervention efforts. Several machine learning algorithms, including Logistic Regression, Naive Bayes, Random Forest, SVM, and GBM, are compared to select the most effective model.

Methodology

image

Data Collection

  • Data sourced from national public health databases on Data.gov.

image

  • Key attributes include drug type, demographic information (age, sex, race), year of the incident, and estimated number of deaths per 100,000 residents.

image

Data Processing

  • Imputation of missing values with the mean.
  • Standardization of data.
  • Factorization of categorical variables for machine learning algorithms.

Feature Selection

  • Selection of numerically encoded variables for analysis.

Data Splitting

  • Dataset split into 80% training and 20% testing subsets.

Models

  • Logistic Regression: Effective for handling multiple class problems.
  • Naive Bayes: Proficient in probability-based classification.
  • SVM: Finds optimal hyperplane for class separation.
  • Random Forest: Ensemble of decision trees for improved accuracy.
  • GBM: Sequentially builds an ensemble of weak models for high accuracy.

Results and Discussions

  • Evaluation metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC.

image

  • Random Forest emerged as the best model due to its high performance across precision, recall, and F1-scores.
  • Variable importance was analyzed using Random Forest.

image image

Conclusions

  • Random Forest demonstrated the best performance for this classification task, making it the most reliable model for predicting drug overdose deaths based on the dataset. image image image

R Code

  • The project includes detailed R code for data processing, model training, and evaluation.

About

Drug overdose deaths are a major public health crisis, causing high mortality and economic burden. Accurate data analysis is essential for effective policies, prevention strategies, and targeted interventions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published