
Deep Learning Recommendation Model (DLRM)

DLRM Training

  • Heterogeneous Acceleration Pipeline for Recommendation System Training (ISCA 2024) [arXiv]
    • UBC & GaTech
    • Hotline: a runtime framework for heterogeneous CPU–GPU DLRM training.
    • Utilizes CPU main memory for non-popular (cold) embeddings and GPU HBM for popular (hot) embeddings.
    • Fragments a mini-batch into popular and non-popular micro-batches (μ-batches).
  • Accelerating Neural Recommendation Training with Embedding Scheduling (NSDI 2024) [Paper] [Slides] [Code]
    • HKUST
    • Herald: an adaptive, location-aware input allocator that decides where each embedding should be trained, plus an optimal communication plan generator that decides which embeddings should be synchronized.
  • Bagpipe: Accelerating Deep Recommendation Model Training (SOSP 2023) [Paper]
    • UW-Madison & UChicago
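The popular/non-popular fragmentation idea above can be sketched roughly as follows. This is an illustrative toy, not Hotline's actual implementation; the threshold and function names are assumptions.

```python
from collections import Counter

def split_minibatch(batch, access_counts, hot_threshold=100):
    """Fragment a mini-batch into a popular and a non-popular micro-batch.

    A sample goes to the popular micro-batch only if every embedding ID
    it touches is "hot" (frequently accessed), so its lookups can be
    served entirely from GPU HBM; all other samples go to the non-popular
    micro-batch, whose cold embeddings live in CPU main memory.
    """
    popular, non_popular = [], []
    for sample in batch:
        if all(access_counts[i] >= hot_threshold for i in sample):
            popular.append(sample)
        else:
            non_popular.append(sample)
    return popular, non_popular

# Illustrative usage: embedding IDs 1 and 2 are hot, ID 3 is cold.
counts = Counter({1: 500, 2: 300, 3: 5})
hot, cold = split_minibatch([[1, 2], [1, 3]], counts)
```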

DLRM Inference

  • DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation (arXiv 2212.00939) [Personal Notes] [Paper]
    • Meta AI & WashU & UPenn & Cornell & Intel
    • A disaggregated system that decouples CPU and memory resources and partitions embedding tables across them.
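As a toy picture of partitioning embedding tables across disaggregated memory nodes: hash placement is an assumption made for illustration here; DisaggRec's actual partitioning strategy is more involved.

```python
def shard_for(table_id, row_id, num_memory_nodes):
    """Map an embedding row to a disaggregated memory node.

    Simple hash placement, for illustration only; a real system would
    also balance load across nodes and handle hot rows specially.
    """
    return hash((table_id, row_id)) % num_memory_nodes
```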

Pruning

  • AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models (OSDI 2023) [Paper]
    • UMich SymbioticLab & Meta
    • Adaptively prunes embeddings during training (in-training pruning).
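In-training pruning can be caricatured as periodically keeping only the highest-utility embedding rows. A minimal NumPy sketch; the utility metric, ratio, and schedule are assumptions, not AdaEmbed's actual design:

```python
import numpy as np

def prune_embeddings(emb, utility, keep_ratio=0.5):
    """Zero out the least-useful embedding rows in place.

    Keeps the top keep_ratio fraction of rows by utility score; pruned
    rows are reset so their capacity can be reclaimed during training.
    Returns the boolean keep-mask.
    """
    k = max(1, int(len(utility) * keep_ratio))
    keep = np.argsort(utility)[-k:]        # indices of the k highest-utility rows
    mask = np.zeros(len(utility), dtype=bool)
    mask[keep] = True
    emb[~mask] = 0.0                       # prune the remaining rows
    return mask
```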

GPU Cache

  • UGache: A Unified GPU Cache for Embedding-based Deep Learning (SOSP 2023) [Personal Notes] [Paper]
    • SJTU
    • A unified multi-GPU cache system.
    • Used for GNN training and DLRM inference.
  • EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation Systems (ASPLOS 2023) [Personal Notes] [Paper] [Code]
    • UChicago & Beijing University of Technology & Bandung Institute of Technology, Indonesia & Seagate Technology & Emory
    • A caching layer optimized for embedding access patterns.
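A caching layer in front of a backing embedding store can be sketched with a plain LRU policy. Illustrative only; UGache and EVStore both use far more specialized, access-pattern-aware designs than this.

```python
from collections import OrderedDict

class EmbeddingCache:
    """A minimal LRU cache in front of a backing embedding store."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store       # e.g. a dict or on-disk table
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, emb_id):
        if emb_id in self.cache:
            self.cache.move_to_end(emb_id)  # mark as most recently used
            self.hits += 1
            return self.cache[emb_id]
        self.misses += 1
        vec = self.backing[emb_id]          # fall back to the slow store
        self.cache[emb_id] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return vec
```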

Model Update

  • Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update (OSDI 2022) [Paper]
    • Tencent & Edinburgh
    • Peer-to-peer (P2P) dissemination of model updates for low-latency serving.
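One way to picture low-latency update dissemination is shipping small versioned parameter deltas between replicas and applying only the ones not yet seen. A hypothetical sketch; Ekko's actual P2P protocol and consistency machinery are considerably richer.

```python
def apply_peer_updates(model, updates, current_version):
    """Apply versioned parameter deltas received from a peer replica.

    Updates are (version, key, delta) tuples; versions at or below the
    locally applied one are skipped, so replaying a peer's log is safe.
    Returns the new local version.
    """
    for version, key, delta in sorted(updates):
        if version <= current_version:
            continue                        # already applied locally
        model[key] = model.get(key, 0.0) + delta
        current_version = version
    return current_version
```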

Acronyms

  • DLRM: Deep Learning Recommendation Model