This GitHub repository contains a comprehensive study on predicting student alcoholism and academic performance using machine learning techniques. By analyzing a diverse dataset of student attributes, including demographic information, study habits, and alcohol consumption patterns, we develop predictive models to forecast both academic outcomes and alcohol usage among students.

- 📊 Comprehensive Dataset: The study utilizes a rich dataset encompassing various student attributes such as age, gender, family background, study habits, and alcohol consumption levels.
- 🔍 Exploratory Data Analysis: We perform in-depth exploratory data analysis to uncover patterns, correlations, and insights within the dataset, providing a solid foundation for predictive modeling.
- 🔢 Data Preprocessing: The dataset undergoes meticulous preprocessing, including handling missing values, encoding categorical variables, and applying dimensionality reduction techniques like Principal Component Analysis (PCA).
- 🌳 Random Forest Regression: We employ the powerful Random Forest Regression algorithm to build predictive models for student alcohol consumption levels and academic performance. The models are optimized using grid search cross-validation to ensure robust and accurate predictions.
- 🎯 Feature Importance Analysis: We conduct feature importance analysis to identify the most influential predictors of student alcoholism and academic success. This analysis provides valuable insights into the key factors driving these outcomes.
- 📈 Model Evaluation: The predictive models are rigorously evaluated using appropriate performance metrics such as Mean Squared Error (MSE) and R-squared (R²) to assess their effectiveness and reliability.
- 💡 Actionable Insights: The study offers actionable insights and recommendations based on the findings, empowering educational institutions, policymakers, and healthcare professionals to develop targeted interventions and support strategies for students.

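
As a rough illustration of how the preprocessing, modeling, feature importance, and evaluation steps listed above could fit together, here is a minimal sketch. It is not the code from this repository. It assumes scikit-learn and pandas, runs on synthetic data, and every column name (`age`, `studytime`, `absences`, `sex`, `famsize`, `grade`) and grid value is hypothetical. It also uses permutation importance in place of the forest's built-in importances, because after PCA the forest only sees principal components and its built-in scores would no longer map to the original attributes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the student dataset; all columns are hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(15, 23, n).astype(float),
    "studytime": rng.integers(1, 5, n).astype(float),
    "absences": rng.integers(0, 30, n).astype(float),
    "sex": rng.choice(["F", "M"], n),
    "famsize": rng.choice(["LE3", "GT3"], n),
})
# Hypothetical target: a grade driven by study time and absences plus noise.
df["grade"] = 8 + 2 * df["studytime"] - 0.2 * df["absences"] + rng.normal(0, 1, n)
# Inject a few missing values so the imputation step has something to do.
df.loc[rng.choice(n, 15, replace=False), "absences"] = np.nan

numeric = ["age", "studytime", "absences"]
categorical = ["sex", "famsize"]
X, y = df[numeric + categorical], df["grade"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
# sparse_threshold=0.0 forces a dense matrix, which PCA requires.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0.0,
)

model = Pipeline([
    ("preprocess", preprocess),
    ("pca", PCA()),
    ("rf", RandomForestRegressor(random_state=0)),
])

# Grid search with cross-validation over a small, illustrative grid.
param_grid = {
    "pca__n_components": [3, 5],
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 5],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# Evaluation on the held-out split.
y_pred = search.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Best parameters: {search.best_params_}")
print(f"Test MSE: {mse:.3f}, R2: {r2:.3f}")

# Permutation importance, computed on the original (pre-PCA) columns.
perm = permutation_importance(
    search.best_estimator_, X_test, y_test, n_repeats=5, random_state=0
)
ranking = pd.Series(perm.importances_mean, index=X.columns)
print(ranking.sort_values(ascending=False))
```

Wrapping every step in a single `Pipeline` means the imputer, scaler, encoder, and PCA are refit inside each cross-validation fold, so no information from the validation fold leaks into preprocessing. The same structure could be reused for an alcohol consumption target by swapping the `y` column.
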
The repository is organized as follows:

- `data/`: Contains the dataset used in the study
- `notebooks/`: Jupyter notebooks showcasing the data preprocessing, exploratory analysis, and predictive modeling steps
- `models/`: Trained Random Forest Regression models for alcohol consumption and academic performance prediction
- `results/`: Visualizations, evaluation metrics, and summary of the model performance
- `docs/`: Detailed documentation of the study, including the project report and presentation slides
- `src/`: Python scripts for data preprocessing, model training, and evaluation
- `README.md`: Overview of the project and instructions for running the code

To reproduce the results and explore the study further, follow these steps:

- Clone the repository: `git clone https://github.com/your-username/student-alcoholism-prediction.git`
- Install the required dependencies: `pip install -r requirements.txt`
- Navigate to the `notebooks/` directory and run the Jupyter notebooks in the specified order
- Explore the trained models, results, and documentation in the respective directories

We welcome contributions to enhance the study and expand its scope. If you have any ideas, suggestions, or improvements, please feel free to open an issue or submit a pull request. Let's collaborate to make a positive impact on student well-being and academic success!
We would like to express our gratitude to the researchers and institutions whose prior work has laid the foundation for this study. We also extend our appreciation to the open-source community for their valuable contributions to the tools and libraries used in this project.