micheleandreucci/Data-Mining-1

DM1 Project

Project for the Data Mining 1 course, academic year 2020/2021

Dataset

link: https://www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset

The dataset contains fictional data created by IBM data scientists to uncover the factors that lead to employee attrition and to explore questions such as 'show me a breakdown of distance from home by job role and attrition' or 'compare average monthly income by education and attrition'.
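The second example question is a group-by aggregation. Below is a minimal standard-library sketch. It assumes each employee record is a dict keyed by column names `Education`, `Attrition` and `MonthlyIncome`, which are taken here to match the dataset's CSV header. The rows are made-up illustrations, not values from the dataset, and the helper `mean_by_group` is hypothetical.

```python
from collections import defaultdict


def mean_by_group(rows, keys, value):
    """Average the `value` column over rows grouped by the tuple of `keys` columns."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)  # e.g. (Education, Attrition)
        sums[group] += row[value]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical toy records, NOT taken from the real dataset.
rows = [
    {"Education": 3, "Attrition": "Yes", "MonthlyIncome": 2000},
    {"Education": 3, "Attrition": "Yes", "MonthlyIncome": 3000},
    {"Education": 3, "Attrition": "No", "MonthlyIncome": 5000},
    {"Education": 4, "Attrition": "No", "MonthlyIncome": 7000},
]

averages = mean_by_group(rows, ("Education", "Attrition"), "MonthlyIncome")
for group, avg in sorted(averages.items()):
    print(group, avg)
```

With pandas, the same aggregation on a loaded DataFrame would be `df.groupby(["Education", "Attrition"])["MonthlyIncome"].mean()`. The first question is handled the same way, using `("JobRole", "Attrition")` as keys and `DistanceFromHome` as the value, or with a count instead of a mean for a breakdown.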

Files:

Below is the list of files along with their purpose.

  • Data Understanding.ipynb: Explore the dataset with the analytical tools studied and write a concise “data understanding” report describing data semantics, assessing data quality, the distribution of the variables and the pairwise correlations.
  • Clustering analysis: Explore the dataset using various clustering techniques. Carefully describe your decisions for each algorithm and the advantages offered by the different approaches.
  • Classification: Explore the dataset using classification trees. Use them to predict the target variable.
  • Association Rules.ipynb: Explore the dataset using frequent pattern mining and association rule extraction. Then use the rules to predict a variable, either to replace missing values or to predict the target variable.
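The last task scores rules such as {OverTime=Yes} → {Attrition=Yes} by support and confidence, then uses them to predict the target. Below is a minimal sketch under several assumptions:

- Each employee has already been turned into a set of `attribute=value` items. Continuous attributes would need discretizing first.
- The transactions are hypothetical, not drawn from the dataset.
- The candidate antecedents are picked by hand rather than mined with a full frequent-itemset algorithm such as Apriori.
- The 0.6 minimum confidence is an arbitrary choice.
- Predicting with the single most confident matching rule is one simple strategy, not necessarily the project's.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """supp(A ∪ C) / supp(A): how often C holds among transactions containing A."""
    supp_a = support(antecedent, transactions)
    if supp_a == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / supp_a


def predict(record, rules, default):
    """Return the consequent of the most confident rule whose antecedent matches `record`."""
    matching = [rule for rule in rules if rule[0] <= record]
    if not matching:
        return default  # e.g. the majority class
    return max(matching, key=lambda rule: rule[2])[1]


# Hypothetical transactions: one item set per employee.
transactions = [
    frozenset({"OverTime=Yes", "JobLevel=1", "Attrition=Yes"}),
    frozenset({"OverTime=Yes", "JobLevel=1", "Attrition=Yes"}),
    frozenset({"OverTime=Yes", "JobLevel=2", "Attrition=No"}),
    frozenset({"OverTime=No", "JobLevel=1", "Attrition=No"}),
    frozenset({"OverTime=No", "JobLevel=2", "Attrition=No"}),
]

# Hand-picked candidate antecedents (assumption; a real run would mine them).
antecedents = [
    frozenset({"OverTime=Yes"}),
    frozenset({"OverTime=No"}),
    frozenset({"OverTime=Yes", "JobLevel=1"}),
]
MIN_CONFIDENCE = 0.6  # arbitrary threshold for this sketch

rules = []  # each rule: (antecedent, predicted target item, confidence)
for a in antecedents:
    for target in ("Attrition=Yes", "Attrition=No"):
        conf = confidence(a, frozenset({target}), transactions)
        if conf >= MIN_CONFIDENCE:
            rules.append((a, target, conf))

print(predict(frozenset({"OverTime=Yes", "JobLevel=1"}), rules, "Attrition=No"))  # → Attrition=Yes
print(predict(frozenset({"JobLevel=3"}), rules, "Attrition=No"))  # no rule matches → Attrition=No
```

The same `predict` step could fill a missing attribute instead of `Attrition` by restricting the consequents to items of that attribute.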