This repository contains code to reproduce the paper "If You Like Shapley Then You’ll Love the Core" for the ML Reproducibility Challenge 2022.
We use Python version 3.10 for this repository.
We use Poetry for dependency management, specifically version 1.2.0.
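Before installing, it can help to confirm which versions of the tools are on your PATH. The following is a minimal sketch; check_tool is a hypothetical helper name introduced here for illustration and is not part of this repository.

```shell
# Hypothetical helper: print a tool's version if it is on PATH,
# or a "not found" note otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version
    else
        echo "$1: not found"
    fi
}

check_tool python3   # this repository targets Python 3.10
check_tool poetry    # this repository was set up with Poetry 1.2.0
```

If either tool is reported as not found, or the version differs from the one listed above, install the matching version first.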
After installing Poetry, run the following command to create a virtual environment and install all dependencies:
poetry install
You can then activate the virtual environment using:
poetry shell
We use DVC to run the experiments and track their results.
To reproduce all results use:
dvc repro
To reproduce the results of the feature-valuation-least-core experiment, use:
dvc repro feature-valuation-least-core
You can find the results under output/feature_valuation_least_core.
To reproduce the results of the data-valuation-synthetic experiment, use:
dvc repro data-valuation-synthetic
You can find the results under output/data_valuation_synthetic.
Note: The data-valuation-dog-vs-fish experiment requires downloading the imagenet-1k dataset from HuggingFace Datasets. To do so, first create a HuggingFace account and then log in using the huggingface-cli tool.
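The login step could look like the sketch below. It assumes huggingface-cli is available inside the activated Poetry environment; hf_login is a hypothetical wrapper name used only for illustration. The dataset page may also ask you to accept its terms of access before the download works.

```shell
# Hypothetical wrapper around the interactive login command.
hf_login() {
    if ! command -v huggingface-cli >/dev/null 2>&1; then
        echo "huggingface-cli not found; activate the Poetry environment first" >&2
        return 1
    fi
    # Interactive: prompts for an access token from your HuggingFace account.
    huggingface-cli login
}

# Usage (uncomment to run):
# hf_login
```

Running huggingface-cli login directly has the same effect; the wrapper only adds a clearer message when the tool is missing.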
To reproduce the results of the data-valuation-dog-vs-fish experiment, use:
dvc repro data-valuation-dog-vs-fish
You can find the results under output/data_valuation_dog_vs_fish.
To reproduce the results of the fixing-mislabeled-data experiment, use:
dvc repro fixing-mislabeled-data
You can find the results under output/fixing_mislabeled_data.
To reproduce the results of the noisy-data experiment, use:
dvc repro noisy-data
You can find the results under output/noisy_data.
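To run only some of the experiments in sequence, the stage names listed above can be looped over. This is a sketch, not a script shipped with the repository; edit the stages variable to the subset you need.

```shell
# Stage names as listed in this README; remove the ones you do not need.
stages="feature-valuation-least-core data-valuation-synthetic data-valuation-dog-vs-fish fixing-mislabeled-data noisy-data"

for stage in $stages; do
    echo "Reproducing stage: $stage"
    # Stop at the first failing stage so the error is easy to spot.
    dvc repro "$stage" || { echo "Stage failed: $stage" >&2; break; }
done
```

Each stage writes its results to the corresponding directory under output/, as described above.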
Make sure to install the pre-commit hooks:
pre-commit install