This repository provides tools for conducting a comprehensive fairness analysis of face verification models. It is part of the study presented in the paper *Fairer Analysis and Demographically Balanced Face Generation for Fairer Face Verification*, published at WACV 2025 (see credits).
If you are instead interested in the fair dataset of synthetic faces and the code to generate one, see this repository.
It is easy to install, with few dependencies, and easy to use on three academic face verification benchmarks.
- ✨ Overview
- 🗂️ Supported Datasets
- 📏 Computed Metrics
- ⚙️ Example Usage
- 🛠️ Setup and Installation
- 🙌 Acknowledgments and Credits
This code implements a method to estimate the extent to which a face verification method is fair, i.e., whether its performance is the same for, e.g., male and female persons, and does not depend on a person's age or ethnicity.
The task of face verification determines whether two face images represent the same person. Given a model's scores on some academic benchmarks, our method computes several fairness metrics and then quantifies to which extent a particular group (e.g., female) is better or less well recognized than another one (e.g., male).
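For readers unfamiliar with the task, here is a minimal sketch of a distance-based verification decision; the embeddings and threshold are illustrative placeholders, not part of this repository:

```python
import numpy as np

def verify(emb_1: np.ndarray, emb_2: np.ndarray, threshold: float = 1.0) -> bool:
    """Decide whether two face embeddings belong to the same person.

    Embeddings are assumed to come from a face verification model; the
    threshold is a hypothetical value tuned on a validation set.
    """
    dist = np.linalg.norm(emb_1 - emb_2)  # L2 distance between embeddings
    return dist < threshold  # same person if the faces are close enough
```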
- Computes basic fairness metrics (using Fairlearn)
- Performs a variance analysis of the model's latent space.
- Evaluates the marginal effects of demographic attributes (e.g., ethnicity, gender, age) on key metrics such as True Positive Rate (TPR) and False Positive Rate (FPR).
The analysis uses precomputed demographic attributes stored in `data/`. The following attributes are considered:
- Ethnicity: ethnicities in the image pair (e.g., `White x White`). Provided or inferred using the FairFace model.
- Gender: genders in the image pair (e.g., `Male x Male`). Provided or inferred using FairFace.
- Age: age difference within the pair (continuous value). Inferred using FairFace.
- Pose: relative position between the two faces. Computed using TODO. Encoded using either:
  - `angle`: angle between the position vectors.
  - `x_dist`, `y_dist`, `z_dist`: distance variables along each spatial dimension.
Note: Negative pairs with obvious demographic differences (e.g., different ethnicities or genders) are filtered out. The analysis focuses on "hard" negative pairs, as detailed in the paper.
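As an illustration of this filtering step, a minimal sketch assuming a pandas DataFrame with hypothetical column names (`label`, `ethnicity_1`, `ethnicity_2`, `gender_1`, `gender_2`); the actual columns in `data/` may differ:

```python
import pandas as pd

# Load the precomputed pair attributes (hypothetical column names).
pairs = pd.read_csv("data/rfw.csv")

# Keep all positive pairs, but only the "hard" negative pairs:
# negatives where both faces share the same ethnicity and gender.
is_positive = pairs["label"] == 1
is_hard_negative = (
    (pairs["label"] == 0)
    & (pairs["ethnicity_1"] == pairs["ethnicity_2"])
    & (pairs["gender_1"] == pairs["gender_2"])
)
pairs = pairs[is_positive | is_hard_negative]
```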
This script supports the following datasets for evaluation:
- RFW: Racial Faces in the Wild
- BFW: Balanced Faces in the Wild
- FAVCI2D: Face Verification with Challenging Imposters and Diversified Demographics
For the standard list of pairs, the corresponding attribute labels are pre-computed and saved in CSV files that can be found in `data/`.
The script computes the following metrics, in order:
- 1️⃣ Basic metrics:
  - Micro-avg Accuracy
  - Macro-avg Accuracy
  - TPR (True Positive Rate)
  - FPR (False Positive Rate)
- 2️⃣ Fairness metrics (using Fairlearn; see the sketch after this list):
  - Demographic Parity Difference
  - Demographic Parity Ratio
  - Equalized Odds Difference
  - Equalized Odds Ratio
- 3️⃣ Latent space analysis (ANOVA) (using statsmodels):
  - Computed separately for positive and negative pairs
  - % Explained Variance (partial $\eta^2$)
  - Significance tests (p-values)
- 4️⃣ Marginal effects (using statsmodels; see the sketch after this list): using a logistic regression model, this computes:
  - Marginal effect of demographic attributes on TPR and FPR.
  - Outputs include:
    - Marginal effect value.
    - 95% Confidence Interval (modifiable via `--alpha`).
    - Significance p-value.
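For reference, a minimal sketch of how the Fairlearn metrics (2️⃣) and the marginal effects (4️⃣) can be obtained with these libraries; the inputs (`y_true`, `y_pred`, `groups`) are random placeholders, not this repository's API:

```python
import numpy as np
import statsmodels.api as sm
from fairlearn.metrics import (
    demographic_parity_difference,
    demographic_parity_ratio,
    equalized_odds_difference,
    equalized_odds_ratio,
)

# Placeholder inputs: binary ground truth, binary predictions, and one
# sensitive attribute per pair (e.g., gender).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
groups = rng.choice(["Female", "Male"], size=1000)

# 2️⃣ Fairness metrics.
print(demographic_parity_difference(y_true, y_pred, sensitive_features=groups))
print(demographic_parity_ratio(y_true, y_pred, sensitive_features=groups))
print(equalized_odds_difference(y_true, y_pred, sensitive_features=groups))
print(equalized_odds_ratio(y_true, y_pred, sensitive_features=groups))

# 4️⃣ Marginal effects via logistic regression: regress the prediction on
# the demographic attributes (here a single dummy-coded attribute).
X = sm.add_constant((groups == "Male").astype(float))
logit = sm.Logit(y_pred, X).fit(disp=False)
margeff = logit.get_margeff()
print(margeff.summary())             # marginal effect values and p-values
print(margeff.conf_int(alpha=0.05))  # 95% confidence intervals
```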
Run the analysis using a single command:
```bash
python compute_metrics.py --dataset=rfw --model_dist=model_results/BUPT_RFW.csv
```
Your face verification method must be tested on one of the available benchmarks, specified by `--dataset`. Available benchmarks are `bfw`, `favcid`, and `rfw`.
Your face verification method should be run on the standard testing image pairs (the first two columns in `data/xxx.csv`). The resulting distance for each pair has to be saved in a CSV file with the following columns:
- `img_1`: filename of the first image in the pair
- `img_2`: filename of the second image in the pair
- `dist`: L2 distance between the embeddings of the two images (automatically converted to angles)
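As an illustration, a minimal sketch of producing this file with pandas; the random embeddings stand in for your own model's outputs, and `data/rfw.csv` is used as an example pair list:

```python
import numpy as np
import pandas as pd

# Standard testing pairs: the first two columns of the dataset's CSV.
pairs = pd.read_csv("data/rfw.csv").iloc[:, :2]
pairs.columns = ["img_1", "img_2"]

# Placeholder embeddings: replace with your own model's outputs,
# indexed by image filename (random vectors used here for illustration).
rng = np.random.default_rng(0)
embeddings = {name: rng.normal(size=512)
              for name in set(pairs["img_1"]) | set(pairs["img_2"])}

# L2 distance between the two embeddings of each pair.
pairs["dist"] = [np.linalg.norm(embeddings[a] - embeddings[b])
                 for a, b in zip(pairs["img_1"], pairs["img_2"])]

pairs.to_csv("model_results/my_model_rfw.csv", index=False)
```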
Specify the path to this file with the `--model_dist` flag. We provide such files in `model_results/`, corresponding to the approach proposed in our paper.
Use the `--alpha` flag to modify the confidence level (default is 0.05, i.e., 95% confidence intervals).
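For example, assuming `--alpha` follows the usual convention, passing 0.01 would yield 99% confidence intervals:

```bash
python compute_metrics.py --dataset=rfw --model_dist=model_results/BUPT_RFW.csv --alpha=0.01
```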
To install dependencies, run:
```bash
pip install -r requirements.txt
```
Ensure the `data/` directory is populated with the necessary demographic attributes before running the script.
Special thanks to the developers of Fairlearn, FairFace, and Statsmodels for their invaluable tools and resources.
If you find this work useful and use it in your own research, please cite our paper:
```bibtex
@inproceedings{afm2025fairer_analysis,
  author    = {Fournier-Montgieux, Alexandre and Soumm, Michael and Popescu, Adrian and Luvison, Bertrand and Le Borgne, Herv{\'e}},
  title     = {Fairer Analysis and Demographically Balanced Face Generation for Fairer Face Verification},
  booktitle = {Winter Conference on Applications of Computer Vision (WACV)},
  address   = {Tucson, Arizona, USA},
  year      = {2025},
}
```