Author: Kaíque Freire dos Santos
Date: 2024/06/22
This Python project performs health impact classification using machine learning techniques. The project utilizes libraries such as pandas, numpy, matplotlib, seaborn, scikit-learn, and Keras.
The objective of this project is to build machine learning models that can predict health impacts based on various variables. This type of analysis can be useful for better understanding how different factors influence health and aiding in making informed decisions.
The project structure is organized as follows:
py-health-impact-classification/
├── dataset-variables/ # Directory for dataset
│ └── air-quality-and-health-impact-dataset.zip
│ └── air-quality-health-impact-data.csv
│ └── healthimpact.pkl
├── logs/ # Directory for log files
│ └── info-preprocessing.log
│ └── training-specific.log
├── plots/ # Directory for generated plots
├── creating-model.py # Script for model creation
├── loading-preprocessing-dataset.py # Script for loading and preprocessing data
├── .gitignore # Gitignore configuration file
└── README.md # This README file
The dataset used in this project is located in the dataset-variables/
directory and contains necessary information for training and evaluating machine learning models. The file air-quality-and-health-impact-dataset.zip
is extracted to obtain the CSV file air_quality_health_impact_data.csv
.
The data undergoes preprocessing, which includes:
- Loading the data.
- Handling missing values.
- Encoding categorical variables.
- Normalizing numerical data.
The preprocessing script is located in loading-preprocessing-dataset.py
.
Machine learning models are trained using a dense neural network with the Keras library. The training script is located in creating-model.py
and includes the use of callbacks such as EarlyStopping and ReduceLROnPlateau.
Distribution plots, correlation maps, and confusion matrices are generated and saved in the plots/
directory.
Contributions are welcome! Feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for more details.