Model Serving

{% hint style="warning" %} Large language models (LLMs) are a hot and diverse research area compared to conventional models, so I have classified the related works on LLMs in a separate paper list. {% endhint %}

{% hint style="info" %} I am actively maintaining this list. {% endhint %}

Model Serving Systems

  • Usher: Holistic Interference Avoidance for Resource Optimized ML Inference (OSDI 2024) [Paper] [Code]
    • UVA & GaTech
  • Paella: Low-latency Model Serving with Software-defined GPU Scheduling (SOSP 2023) [Paper]
    • UPenn & DBOS, Inc.
  • Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences (OSDI 2022) [Personal Notes] [Paper] [Code] [Benchmark] [Artifact]
    • SJTU
    • REEF: GPU kernel preemption; dynamic kernel padding.
  • INFaaS: Automated Model-less Inference Serving (ATC 2021) [Paper] [Code]
    • Stanford
    • Best Paper
    • Considers model variants.
  • Clipper: A Low-Latency Online Prediction Serving System (NSDI 2017) [Personal Notes] [Paper] [Code]
    • UC Berkeley
    • Caching, batching, adaptive model selection.
  • TensorFlow-Serving: Flexible, High-Performance ML Serving (NIPS 2017 Workshop on ML Systems) [Paper]
    • Google
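Clipper's note above mentions adaptive batching. The Clipper paper tunes the batch size against a latency objective with an additive-increase/multiplicative-decrease (AIMD) scheme. The following is a minimal sketch of that idea, not Clipper's code. The class name `AIMDBatchSizer`, the step size, the 10% backoff factor, and the linear latency model used for the demo are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AIMDBatchSizer:
    """Hypothetical AIMD controller for the maximum batch size under a latency SLO."""
    slo_ms: float             # latency objective for processing one batch (assumed)
    additive_step: int = 1    # grow by this much when a batch meets the SLO (assumed)
    backoff: float = 0.9      # shrink to 90% of the size on an SLO violation (assumed)
    max_batch: int = 1        # current batch-size limit

    def update(self, observed_latency_ms: float) -> int:
        """Adjust the batch-size limit from the latency of the last batch."""
        if observed_latency_ms <= self.slo_ms:
            self.max_batch += self.additive_step          # additive increase
        else:
            self.max_batch = max(1, int(self.max_batch * self.backoff))  # multiplicative decrease
        return self.max_batch


def simulated_latency_ms(batch_size: int) -> float:
    # Toy cost model (assumption): fixed overhead plus a per-item cost.
    return 2.0 + 0.5 * batch_size


sizer = AIMDBatchSizer(slo_ms=20.0)
history = []
for _ in range(100):
    b = sizer.max_batch
    history.append(b)
    sizer.update(simulated_latency_ms(b))

print(history[-5:])  # settles into a small band just below the SLO boundary
```

With this toy model, a batch of 36 takes exactly 20 ms, so the controller climbs to 37, backs off to 33 after the violation, and keeps probing the boundary. Real systems measure latency from actual batches instead of a formula.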

Auto-Configuration for Model Serving

  • Serving Unseen Deep Learning Models with Near-Optimal Configurations: a Fast Adaptive Search Approach (SoCC 2022) [Personal Notes] [Paper] [Code]
    • ISCAS
    • Characterizes a DL model by its key operators.
  • Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving (SoCC 2021) [Paper] [Code]
    • HKUST & Alibaba
    • Meta learning; Bayesian optimization; Kubernetes.

Survey

  • A Survey of Multi-Tenant Deep Learning Inference on GPU (MLSys 2022 Workshop on Cloud Intelligence / AIOps) [Paper]
    • George Mason & Microsoft & Maryland
  • A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities (arXiv 2111.14247) [Paper]
    • George Mason & Microsoft & Pittsburgh & Maryland