This is the code repository for the paper titled Bias Propagation in Federated Learning which was accepted to the International Conference on Learning Representations (ICLR) 2023.
If you have any questions, feel free to email Hongyan (hongyan@comp.nus.edu.sg).
In our paper, we show that participating in federated learning can be detrimental to group fairness. In fact, the bias of a few biased parties against under-represented groups (identified by sensitive attributes such as gender or race) propagates through the network to all parties. On naturally partitioned real-world datasets, we analyze and explain bias propagation in federated learning. Our analysis reveals that biased parties unintentionally yet stealthily encode their bias in a small number of model parameters, and throughout the training, they steadily increase the dependence of the global model on sensitive attributes. What is important to highlight is that the experienced bias in federated learning is higher than what parties would otherwise encounter in centralized training with a model trained on the union of all their data. This indicates that the bias is due to the algorithm. Our work calls for auditing group fairness in federated learning, and designing learning algorithms that are robust to bias propagation.
Our implementation of federated learning is based on the FedML library, and we use the machine learning tasks provided by folktables table.
We tested our code on Python 3.8.13
and cuda 11.4
. The essential environments are listed in the environment.yml
file. Run the following command to create the conda environment:
conda env create -f environment.yml
To run federated learning on the Income dataset, use the command:
python main.py --cf config/config_fedavg_income.yaml
Similarly, to run centralized training, use the following command:
python main.py --cf config/config_centralized_income.yaml
For standalone training, use the command:
python main.py --cf config/config_standalone_income.yaml
We report the average results over five different runs. To reproduce the results, run the command five times with different random seeds, which are indicated by common_args.random_seed
in the YAML file.
To get the results on other datasets (e.g., Health, Employment), run the main.py
file with config/config_standalone_{dataset}.yaml
, where the dataset can be health
, employment
, or income
.
Run the following command to get the performance of the models for plotting figures:
python save_information.py --task income
By default, the script collects the prediction information from 5 runs with random seeds 0 to
To generate the figures in the paper, run the plotting.ipynb
Jupyter notebook.