Papers about training data quality management for ML models.
Updated Oct 16, 2024
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
Time series data contribution via influence functions
OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)
Code for our paper 'Interpretable Triplet Importance for Personalized Ranking' accepted by CIKM 2024.
Code for the reproduction of Class-wise Shapley paper from Schoch, Xu, Ji [2022].
This is an official repository for "LAVA: Data Valuation without Pre-Specified Learning Algorithms" (ICLR 2023).
The pyDVL slides for PyData Berlin 2024
The Medium of Exchange of Ecosystem
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
💱 A curated list of data valuation (DV) resources for designing your next data marketplace
Supplementary programmes for DeRDaVa: Deletion-Robust Data Valuation for Machine Learning.
Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"
Algorithms for data valuation and benchmarks
Simulation environment for data collection dynamics.
This is an official repository for "2D-Shapley: A Framework for Fragmented Data Valuation" (ICML 2023).
Code for the submission to the ML Reproducibility Challenge 2022, reproducing "If you like Shapley then you'll love the core"
PyTorch reimplementation of computing Shapley values via Truncated Monte Carlo sampling from "What is your data worth? Equitable Valuation of Data" by Amirata Ghorbani and James Zou [ICML 2019]
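The Truncated Monte Carlo (TMC) idea behind several of the entries above can be sketched in plain Python: average each point's marginal contribution over random permutations of the training set, and stop retraining within a permutation once the running score is within a tolerance of the full-data score. This is a minimal illustration, not code from any listed repository. The function name `tmc_shapley`, its parameters, the fixed number of permutations (used instead of a convergence check), and the toy 1-nearest-neighbour task with its `knn_accuracy` utility are all hypothetical.

```python
import random
from typing import Callable, Sequence


def tmc_shapley(
    points: Sequence[int],
    utility: Callable[[list[int]], float],
    num_permutations: int = 200,
    tolerance: float = 1e-3,
    seed: int = 0,
) -> dict[int, float]:
    """Hypothetical sketch of a Truncated Monte Carlo data Shapley estimate.

    `utility(subset)` is assumed to score a model trained on `subset`
    (e.g. validation accuracy). Each point's value is the running mean of its
    marginal contribution over random permutations; once the running score is
    within `tolerance` of the full-data score, remaining marginals count as 0.
    """
    rng = random.Random(seed)
    full_score = utility(list(points))
    values = {p: 0.0 for p in points}
    for t in range(1, num_permutations + 1):
        perm = list(points)
        rng.shuffle(perm)
        prefix: list[int] = []
        prev_score = utility(prefix)
        for p in perm:
            prefix.append(p)
            if abs(full_score - prev_score) < tolerance:
                new_score = prev_score  # truncation: skip retraining
            else:
                new_score = utility(prefix)
            # Incremental mean of p's marginal contribution over t permutations.
            values[p] += (new_score - prev_score - values[p]) / t
            prev_score = new_score
    return values


# Hypothetical toy task: 1-D inputs, a 1-nearest-neighbour "model",
# utility = accuracy on a small validation set.
train_x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
train_y = [0, 0, 0, 1, 1, 0]  # point 5 is deliberately mislabeled
val = [(0.4, 0), (1.6, 0), (10.4, 1), (11.4, 1), (12.4, 1)]


def knn_accuracy(subset: list[int]) -> float:
    """Accuracy of 1-NN trained on the given training indices."""
    if not subset:
        return 0.0
    correct = 0
    for x, y in val:
        nearest = min(subset, key=lambda i: abs(train_x[i] - x))
        correct += int(train_y[nearest] == y)
    return correct / len(val)


values = tmc_shapley(list(range(len(train_x))), knn_accuracy)
for i, v in sorted(values.items(), key=lambda kv: kv[1]):
    print(f"point {i}: {v:+.3f}")
```

In this toy setup the mislabeled point 5 should receive a negative estimated value, since adding it usually flips a correct validation prediction. Truncation trades a small bias for fewer utility evaluations, which matters when each evaluation means retraining a real model.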