Project for Data Mining 1, Academic Year 2020/2021
Dataset link: https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset
The dataset contains fictional data created by IBM data scientists in order to uncover the factors that lead to employee attrition and explore important questions such as 'show me a breakdown of distance from home by job role and attrition' or 'compare average monthly income by education and attrition'.
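As a rough illustration of the second question, the sketch below averages monthly income within each (education, attrition) group using only the Python standard library. The column names `Education`, `Attrition` and `MonthlyIncome` are assumed to match the dataset headers, the helper `mean_by_groups` is hypothetical, and the rows are invented toy values rather than real records.

```python
from collections import defaultdict
from statistics import mean


def mean_by_groups(rows, value_key, group_keys):
    """Average rows[value_key] within each combination of group_keys."""
    buckets = defaultdict(list)
    for row in rows:
        group = tuple(row[key] for key in group_keys)
        buckets[group].append(float(row[value_key]))
    return {group: mean(values) for group, values in buckets.items()}


# Toy rows, invented for illustration; they are NOT taken from the dataset.
# With the real CSV file, the rows could come from csv.DictReader instead.
toy_rows = [
    {"Education": "1", "Attrition": "Yes", "MonthlyIncome": "2000"},
    {"Education": "1", "Attrition": "Yes", "MonthlyIncome": "3000"},
    {"Education": "1", "Attrition": "No", "MonthlyIncome": "5000"},
    {"Education": "3", "Attrition": "No", "MonthlyIncome": "6000"},
]

income_by_group = mean_by_groups(
    toy_rows, "MonthlyIncome", ["Education", "Attrition"]
)
for group, average in sorted(income_by_group.items()):
    print(group, average)
```

The same helper would cover the first question by passing a distance column as `value_key` and job role plus attrition as `group_keys`, assuming those columns are present under the names you supply.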
Files:
Below is the list of files, each with its purpose.
- Data Understanding.ipynb: Explore the dataset with the analytical tools studied and write a concise “data understanding” report that describes the data semantics, assesses data quality, and examines the distribution of the variables and their pairwise correlations.
- Clustering analysis: Explore the dataset using various clustering techniques. Carefully describe the decisions made for each algorithm and the advantages offered by the different approaches.
- Classification: Explore the dataset using classification trees. Use them to predict the target variable.
- Association Rules.ipynb: Explore the dataset using frequent pattern mining and association rule extraction. Then use the extracted rules to predict a variable, either to replace missing values or to predict the target variable.
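
The association rules step can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not the approach used in the notebook: the helpers `frequent_itemsets`, `rules_for_target` and `predict_target` are hypothetical, the item labels such as `OverTime=Yes` are assumed encodings of dataset columns, and the five transactions are invented toy data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so no larger one exists
    return result


def rules_for_target(itemsets, target, min_confidence):
    """Rules A -> target, with confidence = support(A plus target) / support(A)."""
    rules = []
    for itemset, support in itemsets.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            confidence = support / itemsets[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, confidence))
    return sorted(rules, key=lambda rule: -rule[1])


def predict_target(record, rules):
    """Predict the target when any rule antecedent is contained in the record."""
    return any(antecedent <= record for antecedent, _ in rules)


# Toy transactions, invented for illustration; NOT taken from the dataset.
toy_transactions = [
    frozenset({"OverTime=Yes", "MaritalStatus=Single", "Attrition=Yes"}),
    frozenset({"OverTime=Yes", "MaritalStatus=Married", "Attrition=Yes"}),
    frozenset({"OverTime=No", "MaritalStatus=Single", "Attrition=No"}),
    frozenset({"OverTime=No", "MaritalStatus=Married", "Attrition=No"}),
    frozenset({"OverTime=Yes", "MaritalStatus=Single", "Attrition=No"}),
]

itemsets = frequent_itemsets(toy_transactions, min_support=0.4)
rules = rules_for_target(itemsets, "Attrition=Yes", min_confidence=0.6)
for antecedent, confidence in rules:
    print(sorted(antecedent), "->", "Attrition=Yes", round(confidence, 3))
```

Enumerating every candidate itemset is only practical for a handful of items; on the full dataset an Apriori or FP-Growth implementation from a library would be the more realistic choice.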