
Deep Learning Recommendation Model (DLRM)

DLRM Training

  • Heterogeneous Acceleration Pipeline for Recommendation System Training (ISCA 2024) [arXiv]
    • UBC & GaTech
    • Hotline: a runtime framework for heterogeneous CPU–GPU DLRM training.
    • Utilizes CPU main memory for non-popular (cold) embeddings and GPU HBM for popular (hot) embeddings.
    • Fragments a mini-batch into popular and non-popular micro-batches (μ-batches).
  • Accelerating Neural Recommendation Training with Embedding Scheduling (NSDI 2024) [Paper] [Slides] [Code]
    • HKUST
    • Herald: an adaptive, location-aware input allocator that decides where each embedding should be trained, plus an optimal communication plan generator that decides which embeddings should be synchronized.
  • Bagpipe: Accelerating Deep Recommendation Model Training (SOSP 2023) [Paper]
    • UW-Madison & UChicago
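The popular/non-popular fragmentation idea above can be sketched roughly as follows. This is an illustrative toy, not Hotline's actual implementation; the threshold and function names are assumptions.

```python
from collections import Counter

def split_minibatch(batch, access_counts, hot_threshold=100):
    """Fragment a mini-batch into a popular and a non-popular micro-batch.

    A sample goes to the popular micro-batch only if every embedding ID
    it touches is "hot" (frequently accessed), so its lookups can be
    served entirely from GPU HBM; all other samples go to the non-popular
    micro-batch, whose cold embeddings live in CPU main memory.
    """
    popular, non_popular = [], []
    for sample in batch:
        if all(access_counts[i] >= hot_threshold for i in sample):
            popular.append(sample)
        else:
            non_popular.append(sample)
    return popular, non_popular

# Illustrative usage: embedding IDs 1 and 2 are hot, ID 3 is cold.
counts = Counter({1: 500, 2: 300, 3: 5})
hot, cold = split_minibatch([[1, 2], [1, 3]], counts)
```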

DLRM Inference

  • DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation (arXiv 2212.00939) [Personal Notes] [Paper]
    • Meta AI & WashU & UPenn & Cornell & Intel
    • A disaggregated system that decouples CPU and memory resources and partitions embedding tables across them.
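As a toy picture of partitioning embedding tables across disaggregated memory nodes: hash placement is an assumption made for illustration here; DisaggRec's actual partitioning strategy is more involved.

```python
def shard_for(table_id, row_id, num_memory_nodes):
    """Map an embedding row to a disaggregated memory node.

    Simple hash placement, for illustration only; a real system would
    also balance load across nodes and handle hot rows specially.
    """
    return hash((table_id, row_id)) % num_memory_nodes
```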

Pruning

  • AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models (OSDI 2023) [Paper]
    • UMich SymbioticLab & Meta
    • Adaptively prunes embeddings during training (in-training pruning).
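In-training pruning can be caricatured as periodically keeping only the highest-utility embedding rows. A minimal NumPy sketch; the utility metric, ratio, and schedule are assumptions, not AdaEmbed's actual design:

```python
import numpy as np

def prune_embeddings(emb, utility, keep_ratio=0.5):
    """Zero out the least-useful embedding rows in place.

    Keeps the top keep_ratio fraction of rows by utility score; pruned
    rows are reset so their capacity can be reclaimed during training.
    Returns the boolean keep-mask.
    """
    k = max(1, int(len(utility) * keep_ratio))
    keep = np.argsort(utility)[-k:]        # indices of the k highest-utility rows
    mask = np.zeros(len(utility), dtype=bool)
    mask[keep] = True
    emb[~mask] = 0.0                       # prune the remaining rows
    return mask
```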

GPU Cache

  • UGache: A Unified GPU Cache for Embedding-based Deep Learning (SOSP 2023) [Personal Notes] [Paper]
    • SJTU
    • A unified multi-GPU cache system.
    • Used for GNN training and DLRM inference.
  • EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation Systems (ASPLOS 2023) [Personal Notes] [Paper] [Code]
    • UChicago & Beijing University of Technology & Bandung Institute of Technology, Indonesia & Seagate Technology & Emory
    • A caching layer optimized for embedding access patterns.
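A caching layer in front of a backing embedding store can be sketched with a plain LRU policy. Illustrative only; UGache and EVStore both use far more specialized, access-pattern-aware designs than this.

```python
from collections import OrderedDict

class EmbeddingCache:
    """A minimal LRU cache in front of a backing embedding store."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store       # e.g. a dict or on-disk table
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, emb_id):
        if emb_id in self.cache:
            self.cache.move_to_end(emb_id)  # mark as most recently used
            self.hits += 1
            return self.cache[emb_id]
        self.misses += 1
        vec = self.backing[emb_id]          # fall back to the slow store
        self.cache[emb_id] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return vec
```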

Model Update

  • Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update (OSDI 2022) [Paper]
    • Tencent & Edinburgh
    • Peer-to-peer (P2P) dissemination of model updates for low-latency serving.
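One way to picture low-latency update dissemination is shipping small versioned parameter deltas between replicas and applying only the ones not yet seen. A hypothetical sketch; Ekko's actual P2P protocol and consistency machinery are considerably richer.

```python
def apply_peer_updates(model, updates, current_version):
    """Apply versioned parameter deltas received from a peer replica.

    Updates are (version, key, delta) tuples; versions at or below the
    locally applied one are skipped, so replaying a peer's log is safe.
    Returns the new local version.
    """
    for version, key, delta in sorted(updates):
        if version <= current_version:
            continue                        # already applied locally
        model[key] = model.get(key, 0.0) + delta
        current_version = version
    return current_version
```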

Acronyms

  • DLRM: Deep Learning Recommendation Model