Skip to content

Latest commit

 

History

History
53 lines (45 loc) · 3.06 KB

README.md

File metadata and controls

53 lines (45 loc) · 3.06 KB

Predicting Employee Turnover: Scoping and Benchmarking the State-of-the-Art
Simon De Vos, Chris Rickermann, Jente Van Belle, Wouter Verbeke [2024]

This paper addresses the need for predictive analytics in workforce management by scoping and benchmarking the state-of-the-art research on employee turnover prediction. Through an extensive benchmarking experiment involving 14 classification methods and 9 datasets, we highlight the challenges posed by inconsistent methodologies and experimental setups in existing studies. Our findings provide a unified perspective to advance both academic research and practical applications in human resource management. The code and public datasets are made available on GitHub to encourage further research and collaboration.

Repository Structure

This repository is organized as follows:

|- data/
    |- ds.csv              # Dataset for experiments
    |- ibm.csv             # IBM HR dataset
    |- kaggle1.csv         # Kaggle dataset 1
    |- kaggle3.csv         # Kaggle dataset 3
    |- kaggle4.csv         # Kaggle dataset 4
    |- kaggle5.csv         # Kaggle dataset 5
|- experiments/
    |- experiment.py       # Script for conducting experiments
    |- main.py             # Main entry point for running experiments
|- performance_metrics/
    |- performance_metrics.py  # Module for evaluating model performance

Installing

We have provided a requirements.txt file:

pip install -r requirements.txt

Please use the above in a newly created virtual environment to avoid clashing dependencies.

Instructions:

  • In 'main.py':
    • Set the project directory to your custom folder. E.g., DIR = r'C:\Users\...\...\...'
    • Specify experiment configuration in settings = {'folds': 2, 'repeats': 5, ...}
    • Specify dataset used in datasets = {'real1': False, 'ibm': True, ...}. The public datasets can be found in the data folder. The datasets Real1, Real2, and Real3 are not publicly available.
    • Specify the classifications methods in methodologies = {'ab': True,'ann': True,'bnb': True, ... }
    • Hyperparameter grids can be adapted in hyperparameters = {'ab': {'n_estimators': [50, 100, 200], ...}, 'ann': {...} ...}. It is recommended to put some hyperparameter specifications in comment, as running the current specified grid takes a long time.
  • Run 'main.py' to reproduce our results. Results will be written to a text file in DIR = r'C:\Users\...\...\...'

Citing

Please cite our paper and/or code as follows:

@article{de2024predicting,
  title={Predicting Employee Turnover: Scoping and Benchmarking the State-of-the-Art},
  author={De Vos, Simon and Bockel-Rickermann, Christopher and Van Belle, Jente and Verbeke, Wouter},
  journal={Business \& Information Systems Engineering},
  pages={1--20},
  year={2024},
  publisher={Springer}
}