Presented at SC 2023.
- Tune training configurations (e.g., batch size) across all co-located tasks
- Choose appropriate tasks to multiplex on a GPU device
- Trade off mitigating interference against accelerating per-task training progress to achieve the best overall training time
- Vast search space of joint task configurations and placements (see the first sketch after this list)
- Tight coupling between tuning task configurations and deciding task placement: changing one task's batch size shifts which co-location plan is best (see the greedy-packing sketch after this list)
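The bullets above describe a combinatorial decision: every task needs both a placement and a training configuration, and the two choices multiply. The back-of-the-envelope Python sketch below is not taken from the talk; the task, GPU, and batch-size counts are illustrative assumptions, used only to show how quickly the joint space grows.

```python
def joint_space_size(num_tasks: int, num_gpus: int, num_batch_sizes: int) -> int:
    """Count (placement, configuration) combinations under a naive encoding:
    each task independently picks one GPU to share and one batch size."""
    placements = num_gpus ** num_tasks             # which GPU each task lands on
    configurations = num_batch_sizes ** num_tasks  # batch size chosen per task
    return placements * configurations

# Example (assumed numbers): 16 tasks, 4 GPUs, 8 candidate batch sizes per task.
print(joint_space_size(16, 4, 8))  # 2**80, roughly 1.2e24 combinations
```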
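The trade-off and the coupling can also be made concrete with a toy heuristic. The sketch below is a hypothetical greedy packer, not the system's actual algorithm: it assumes a simple linear interference model (each extra co-located task costs a fixed throughput fraction) and picks, for each task, the (GPU, batch size) pair that raises total interference-discounted progress the most. The `Task`, `GPU`, and `interference_factor` names and all throughput numbers are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Task:
    name: str
    # Estimated samples/sec for each candidate batch size when running alone.
    solo_throughput: Dict[int, float]

@dataclass
class GPU:
    # (task, chosen batch size) pairs currently multiplexed on this device.
    tasks: List[Tuple[Task, int]] = field(default_factory=list)

def interference_factor(n_colocated: int) -> float:
    # Assumed interference model: each extra co-located task costs 15% throughput.
    return max(0.0, 1.0 - 0.15 * (n_colocated - 1))

def gpu_progress(gpu: GPU) -> float:
    """Aggregate interference-discounted training progress on one GPU."""
    factor = interference_factor(len(gpu.tasks))
    return sum(task.solo_throughput[bs] * factor for task, bs in gpu.tasks)

def place_greedily(tasks: List[Task], gpus: List[GPU]) -> List[GPU]:
    """For each task, try every (GPU, batch size) pair and keep the one that
    raises total progress the most. Because adding a task also discounts its
    neighbours, configuration tuning and placement are decided together."""
    for task in tasks:
        best = None  # (progress gain, gpu, batch size)
        for gpu in gpus:
            before = gpu_progress(gpu)
            for bs in task.solo_throughput:
                gpu.tasks.append((task, bs))
                gain = gpu_progress(gpu) - before
                gpu.tasks.pop()
                if best is None or gain > best[0]:
                    best = (gain, gpu, bs)
        best[1].tasks.append((task, best[2]))
    return gpus

# Example with made-up profiles: two GPUs, three tasks.
gpus = place_greedily(
    [Task("resnet", {64: 900.0, 128: 1500.0}),
     Task("bert", {16: 300.0, 32: 450.0}),
     Task("gan", {32: 600.0})],
    [GPU(), GPU()],
)
for i, gpu in enumerate(gpus):
    print(i, [(t.name, bs) for t, bs in gpu.tasks])
```

The point of the sketch is the coupling: a larger batch size raises a task's solo throughput but also the cost it imposes on neighbours, so the best placement can change whenever a configuration changes.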