# Interference-aware multiplexing for deep learning in GPU clusters: A middleware approach

## Meta Info

Presented at SC 2023.

## Understanding the paper

### Opportunities in co-locating DL training tasks

- Tune training configurations (e.g., batch size) across all co-located tasks (see the toy sketch after this list)
- Choose appropriate tasks to multiplex on the same GPU device
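
A minimal sketch of the kind of decision these opportunities describe: pick a pair of tasks to co-locate on one GPU and tune their batch sizes jointly. This is not the paper's algorithm; the task data, the 40 GB memory budget, and the slowdown formula are made-up illustrative assumptions.

```python
"""Toy co-location sketch under assumed task profiles and a made-up interference model."""
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Task:
    name: str
    batch_sizes: tuple           # candidate batch sizes to search over (assumed)
    mem_per_sample_gb: float     # assumed GPU memory footprint per sample
    samples_per_sec_solo: float  # assumed throughput when running alone

GPU_MEM_GB = 40.0  # assumed device memory budget

def slowdown(util_a: float, util_b: float) -> float:
    """Made-up interference model: co-located tasks slow each other down
    in proportion to their combined memory utilization."""
    return 1.0 + 0.5 * min(util_a + util_b, 1.0)

def co_locate(tasks):
    """Enumerate task pairs and batch-size choices; return the combination
    with the highest aggregate throughput that still fits in GPU memory."""
    best = None
    for a, b in combinations(tasks, 2):
        for bs_a in a.batch_sizes:
            for bs_b in b.batch_sizes:
                mem = bs_a * a.mem_per_sample_gb + bs_b * b.mem_per_sample_gb
                if mem > GPU_MEM_GB:
                    continue  # this configuration does not fit on the device
                s = slowdown(bs_a * a.mem_per_sample_gb / GPU_MEM_GB,
                             bs_b * b.mem_per_sample_gb / GPU_MEM_GB)
                throughput = (a.samples_per_sec_solo + b.samples_per_sec_solo) / s
                if best is None or throughput > best[0]:
                    best = (throughput, a.name, bs_a, b.name, bs_b)
    return best

if __name__ == "__main__":
    tasks = [
        Task("resnet50", (32, 64, 128), 0.08, 900.0),
        Task("bert-base", (8, 16, 32), 0.30, 120.0),
        Task("vgg16", (32, 64), 0.12, 500.0),
    ]
    print(co_locate(tasks))
```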

### Challenges

- Trade-off between mitigating interference and accelerating training progress to achieve the optimal overall training time
- Vast search space of task configurations (a back-of-the-envelope illustration follows this list)
- Coupling between adjusting task configurations and designing the task placement policy
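
To make the "vast search space" point concrete, here is a back-of-the-envelope count of joint configurations under simplifying assumptions (each task independently picks one GPU and one batch size); the numbers are illustrative, not from the paper.

```python
"""Illustrative count of joint (placement, batch size) configurations."""

def joint_config_count(num_tasks: int, num_gpus: int, batch_choices: int) -> int:
    """Each task independently picks one of num_gpus placements and one of
    batch_choices batch sizes, so the joint space is
    (num_gpus * batch_choices) ** num_tasks. Real placement is more
    constrained, but the exponential growth is the point."""
    return (num_gpus * batch_choices) ** num_tasks

for n in (4, 8, 16):
    print(f"{n} tasks: {joint_config_count(n, num_gpus=8, batch_choices=4):,} configurations")
```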