Nima Shahbazi, Stavros Sintos, and Abolfazl Asudeh. "FairHash: A Fair and Memory/Time-efficient Hashmap." Proceedings of the ACM on Management of Data 2.3 (2024): 1-29.
- Clone the repo
- Create a virtual environment using e.g., venv or Conda
- Install any missing packages using e.g., pip or Conda
- main packages are fairly standard (e.g., Pandas, NumPy, Matplotlib)
The algorithms are implemented in [algorithm-name].py
. You can familiarize yourself with the code by looking at the [algorithm-name]_test.py
. Each algorithm accepts the following inputs:
- Dataset
- Sensitive attribute (categorical)
- Non-sensitive attributes (ordinal real-valued)
- Number of buckets
The algorithm returns a sorted list of boundaries that specify the boundaries for each hash bucket.
- The data used in our experiments can be downloaded from here: Link
- Unzip the data folder in the root directory of the project
- The following Python files will reproduce the experiments for each algorithm:
- Necklace Splitting algorithm results:
python necklace_split_binary_test.py
- Sweep & Cut algorithm results:
python sweep_and_cut_test.py
- Sampled Ranking algorithm results:
python ranking_sampled_vector_test.py
- Ranking (ray sweeping) algorithm results:
python ranking_2d_test.py
- Local Search heuristic results:
python local_search.py
- Necklace Splitting algorithm results:
- Testing FairHash on held-out data:
python train_test.py
This project is still under development, so please beware of potential bugs, issues, etc. Use at your own responsibility in practice.
Feel free to contact the authors or leave an issue in case of any complications. We will try to respond as soon as possible.
- Nima Shahbazi: https://neemashahbazi.github.io/
- A. Asudeh: https://www.cs.uic.edu/~asudeh/
- S. Sintos: https://sites.google.com/view/stavros-sintos
This project is licensed under the MIT License — see the LICENSE.md file for details.