Exploring gender bias in misclassification with clustering and local explanations

This repository contains the replication package for the paper "Exploring gender bias in misclassification with clustering and local explanations", accepted for presentation at the XKDD workshop co-located with ECML 2023.

Repository organisation

The repository is organised as follows:

  • code: Python scripts to run the experiments.
  • data: Datasets in CSV format.
  • results: CSV files with the statistics and results for the two research questions.

Dependencies

The code has been developed with Python 3.10.2 using Visual Studio Code. Machine learning models are built with scikit-learn (sklearn) and fairlearn, and explanations are generated with dalex.

To run the experiments, follow these steps:

  1. Clone or download this repository.
  2. Create a virtual environment with venv or conda. For venv: python -m venv <your-venv-path>
  3. Activate the virtual environment (the activate script is located in bin or Scripts depending on your OS).
  4. Install the dependencies from the requirements file. For pip: pip install -r requirements.txt
  5. Go to the code folder and run the desired script, e.g., python experiment_adult_all_data.py

Datasets

The experiments use three datasets:

  • Dataset 1: adult income; the original file is available on Kaggle.
  • Dataset 2: dutch census; a preprocessed file is available on GitHub.
  • Dataset 3: employee promotion; the original file is available on Kaggle.

Experiments

For each dataset, five different models are built as described in the paper (a minimal sketch follows the list):

  • "Full" model: Baseline classifier with all attributes used for training.
  • "No gender" model: A classifier trained without the gender attribute.
  • "One gender" model: A classifier trained with data separated by gender.
  • "Mit_in" model: A classifier with in-processing bias mitigation method (ExponentiatedGradient)
  • "Mit_post" model: A classifier with post-processing bias mitigation method (ThresholdOptimizer).

The five strategies are applied to two different classification algorithms: Random Forest and Gradient Boosting Tree. After training, the instances misclassified by the model are analysed as follows (a minimal sketch follows the list):

  • Clustering with the Affinity Propagation algorithm is executed to find groups of similar misclassified instances, separated by false positives/false negatives and gender (RQ1).
  • For each cluster, a prototype instance is analysed with the Break-down method to discover the most relevant features causing the wrong prediction (RQ2).
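
The snippet below sketches this analysis with sklearn and dalex; it reuses the fitted classifier and test split from the previous sketch and keeps the same illustrative "gender" column, so the selection of false positives and the per-cluster prototypes are assumptions rather than the exact experimental pipeline.

```python
# Sketch of the misclassification analysis (illustrative; reuses objects from the previous sketch).
import dalex as dx
from sklearn.cluster import AffinityPropagation

model = full  # any of the fitted classifiers from the previous sketch

y_pred = model.predict(X_test)

# Keep only misclassified instances, e.g. the false positives of one gender group (RQ1)
fp = (y_pred == 1) & (y_test == 0) & (X_test["gender"] == 1)
X_fp = X_test[fp]

# Cluster the misclassified instances with Affinity Propagation (numeric features assumed)
clustering = AffinityPropagation(random_state=0).fit(X_fp)
prototypes = X_fp.iloc[clustering.cluster_centers_indices_]  # one exemplar per cluster

# Explain each prototype with the Break-down method from dalex (RQ2)
explainer = dx.Explainer(model, X_test, y_test)
for _, prototype in prototypes.iterrows():
    bd = explainer.predict_parts(prototype.to_frame().T, type="break_down")
    print(bd.result[["variable", "contribution"]])
```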

Funding

This work has been developed as part of the GENIA project, funded by the Annual Research Plan (2022) of the University of Córdoba (Spain).
