Exploring gender bias in misclassification with clustering and local explanations

This repository contains the replication package for the paper "Exploring gender bias in misclassification with clustering and local explanations", accepted for presentation at the XKDD workshop co-located with ECML 2023.

Repository organisation

The repository is organised as follows:

  • code: Python scripts to run the experiments.
  • data: Datasets in CSV format.
  • results: CSV files with the statistics and results for the two research questions.

Dependencies

The code has been developed with Python 3.10.2 using Visual Studio Code. Machine learning models are built with scikit-learn (sklearn) and fairlearn, and explanations are generated with dalex.

To run the experiments, follow these steps:

  1. Clone or download this repository.
  2. Create a virtual environment with venv or conda. For venv: python -m venv <your-venv-path>
  3. Activate the virtual environment (the activate script is located in bin or Scripts depending on your OS).
  4. Install the dependencies from the requirements file. For pip: pip install -r requirements.txt
  5. Go to the code folder and run the desired script, e.g., python experiment_adult_all_data.py

Datasets

The experiments use three datasets:

  • Dataset 1: adult income; the original file is available on Kaggle.
  • Dataset 2: dutch census; a preprocessed file is available on GitHub.
  • Dataset 3: employee promotion; the original file is available on Kaggle.

Experiments

For each dataset, five different models are built as described in the paper (a minimal sketch follows the list):

  • "Full" model: Baseline classifier with all attributes used for training.
  • "No gender" model: A classifier trained without the gender attribute.
  • "One gender" model: A classifier trained with data separated by gender.
  • "Mit_in" model: A classifier with in-processing bias mitigation method (ExponentiatedGradient)
  • "Mit_post" model: A classifier with post-processing bias mitigation method (ThresholdOptimizer).

The five strategies are applied to two different classification algorithms: Random Forest and Gradient Boosting Tree. After training, the instances misclassified by the model are analysed as follows (a minimal sketch follows the list):

  • Clustering with the Affinity Propagation algorithm is executed to find groups of similar misclassified instances, separated by false positives/false negatives and gender (RQ1).
  • For each cluster, a prototype instance is analysed with the Break-down method to discover the most relevant features causing the wrong prediction (RQ2).
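
The snippet below sketches this analysis with sklearn and dalex; it reuses the fitted classifier and test split from the previous sketch and keeps the same illustrative "gender" column, so the selection of false positives and the per-cluster prototypes are assumptions rather than the exact experimental pipeline.

```python
# Sketch of the misclassification analysis (illustrative; reuses objects from the previous sketch).
import dalex as dx
from sklearn.cluster import AffinityPropagation

model = full  # any of the fitted classifiers from the previous sketch

y_pred = model.predict(X_test)

# Keep only misclassified instances, e.g. the false positives of one gender group (RQ1)
fp = (y_pred == 1) & (y_test == 0) & (X_test["gender"] == 1)
X_fp = X_test[fp]

# Cluster the misclassified instances with Affinity Propagation (numeric features assumed)
clustering = AffinityPropagation(random_state=0).fit(X_fp)
prototypes = X_fp.iloc[clustering.cluster_centers_indices_]  # one exemplar per cluster

# Explain each prototype with the Break-down method from dalex (RQ2)
explainer = dx.Explainer(model, X_test, y_test)
for _, prototype in prototypes.iterrows():
    bd = explainer.predict_parts(prototype.to_frame().T, type="break_down")
    print(bd.result[["variable", "contribution"]])
```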

Funding

This work has been developed as part of the GENIA project, funded by the Annual Research Plan (2022) of the University of Córdoba (Spain).
