Data Processing

  • Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement (ATC 2024) [Paper] [Code]
    • ETH & Google
  • Disaggregating ML Input Data Processing at Scale (SoCC 2023)
    • Google & ETH
  • GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning (SIGMOD 2023) [Paper]
    • Alibaba & PKU
  • A case for disaggregation of ML data processing (arXiv 2210.14826) [Paper]
    • Google & ETH
    • Presents tf.data service, which disaggregates data preprocessing from ML computation.
  • Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (ISCA 2022) [Paper]
    • Meta
    • Industry track
    • Characterizes DSI, Meta's data storage and ingestion pipeline