NSDI 2024

Meta Info

Homepage: https://www.usenix.org/conference/nsdi24

Paper list: https://www.usenix.org/conference/nsdi24/technical-sessions

Papers

Resource Management

  • Resiliency at Scale: Managing Google’s TPUv4 Machine Learning Supercomputer [Paper]
    • Google
    • Experience in designing and operating the software infrastructure that allows TPUv4 supercomputers to operate at scale.
  • Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices [Paper] [Slides] [Code]
    • USTC & ETH & MSR
    • Minimize the CPU allocation of microservice applications while meeting their SLOs.
    • Bi-level design: service-level control (low overhead, fast reaction) combined with application-level control (global visibility).
      • Captains (service-level): control each service's CPU allocation based on a throttle-ratio target; collect data every 100 ms and adjust the allocation every 1 s.
      • Tower (application-level): determines the best throttle targets for the Captains through online learning (a contextual bandit algorithm); takes one step per minute, and each step runs in ~100 ms.
  • CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters [Paper]
    • MIT & UT-Austin
    • Consider the communication patterns of different jobs when placing them on network links.
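
The Captain's feedback loop can be illustrated with a minimal sketch. It assumes the throttle ratio is the fraction of CFS periods in which a service's cgroup was throttled (the nr_throttled / nr_periods counters that cgroup's cpu.stat exposes). It also uses a hypothetical proportional, multiplicative update with made-up gain and bounds. Autothrottle's actual control law is described in the paper and may differ.

```python
def throttle_ratio(nr_throttled: int, nr_periods: int) -> float:
    """Fraction of CFS periods in which the cgroup ran out of CPU quota."""
    return nr_throttled / nr_periods if nr_periods else 0.0


def captain_step(quota: float, observed_ratio: float, target_ratio: float,
                 gain: float = 0.5, min_quota: float = 0.1,
                 max_quota: float = 16.0) -> float:
    """Return the next CPU quota (in cores) for one service.

    Hypothetical proportional controller: grow the quota when the service is
    throttled more often than the target, shrink it when throttled less often,
    and clamp the result to [min_quota, max_quota].
    """
    error = observed_ratio - target_ratio
    new_quota = quota * (1.0 + gain * error)
    return max(min_quota, min(max_quota, new_quota))


# One 1 s adjustment: 3 of the last 10 CFS periods (100 ms each) were
# throttled, and the Tower has set this service's target ratio to 0.1.
ratio = throttle_ratio(3, 10)
print(round(captain_step(2.0, ratio, 0.1), 2))  # 2.2
```

The multiplicative update keeps each step proportional to the current allocation, so small and large services adjust at similar relative rates. That is a choice made for this sketch, not necessarily Autothrottle's.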

Large Language Models (LLMs)

  • LLM characterization
    • Characterization of Large Language Model Development in the Datacenter [Paper] [Slides] [Trace]
      • NTU & PKU & CUHK & Shanghai AI Lab
  • LLM training
    • MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs [Paper] [Slides] [Code]
      • ByteDance & PKU

Utilize Spot Instances

  • Can't Be Late: Optimizing Spot Instance Savings under Deadlines [Paper] [Trace]
    • UC Berkeley
    • Outstanding Paper
    • Characterization of three months of spot availability traces on AWS (e.g., availability, pricing, duration).
    • Uniform Progress: a policy that makes uniform progress toward the deadline by distributing the job's computation evenly over the available time.
  • Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances [Paper] [Slides] [Code]
    • CUHK & ByteDance & CMU & UCLA & Microsoft
    • Proactively adjust the parallelization strategy of a DNN training job in anticipation of future preemptions to maximize preemption-aware throughput (i.e., liveput).
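
Uniform Progress can be sketched as a per-step decision rule. This is a simplified version under stated assumptions: time moves in unit steps, each running step completes one unit of work, and switching between spot and on-demand is free. A real deployment cannot assume free switching. The function names and the Choice enum are hypothetical.

```python
from enum import Enum


class Choice(Enum):
    SPOT = "spot"
    ON_DEMAND = "on_demand"
    IDLE = "idle"


def uniform_progress_choice(t: int, deadline: int, total_work: int,
                            done: int, spot_available: bool) -> Choice:
    """Choose what to run during time step t (steps are 0..deadline-1)."""
    if spot_available:
        return Choice.SPOT  # cheap capacity: always take it
    # Where the uniform-progress line says the job should be after step t.
    target = total_work * (t + 1) / deadline
    return Choice.ON_DEMAND if done < target else Choice.IDLE


def simulate(spot_trace: list[bool], total_work: int) -> tuple[int, int, int]:
    """Run the policy over a spot-availability trace whose length is the deadline.

    Returns (work_done, spot_steps, on_demand_steps).
    """
    deadline = len(spot_trace)
    done = spot_steps = od_steps = 0
    for t, spot in enumerate(spot_trace):
        if done >= total_work:
            break
        choice = uniform_progress_choice(t, deadline, total_work, done, spot)
        if choice is Choice.SPOT:
            spot_steps += 1
        elif choice is Choice.ON_DEMAND:
            od_steps += 1
        else:
            continue  # idle: wait for spot capacity
        done += 1
    return done, spot_steps, od_steps


print(simulate([True, False, False, True, False, False, False, False], 4))  # (4, 2, 2)
```

Spot capacity is used whenever it is available. On-demand is used only when the job would otherwise end the step below the uniform line. As long as total_work ≤ deadline, progress never falls below that line at the end of a step, so in this simplified model the job always finishes by the deadline.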

Multimodal Models

  • DISTMM: Accelerating Distributed Multimodal Model Training [Paper]
    • Ohio State University & AWS
    • Partition and parallelize the submodules of a multimodal model based on their modalities and redistribute the training data.

Diffusion Models

  • Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models [Paper] [Slides]
    • Adobe Research & UIUC
    • Approximate caching: skip some of the denoising steps by reusing intermediate noise states saved during a prior image generation.
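
Approximate caching can be sketched as a lookup that maps prompt similarity to how many denoising steps to skip. The cosine-similarity retrieval, the SKIP_TABLE thresholds, and the ApproxCache class are hypothetical stand-ins chosen for illustration. They are not the paper's retrieval or step-selection policy, and the diffusion model itself is left out.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical (min similarity, steps to skip) table, highest threshold first.
SKIP_TABLE = [(0.95, 25), (0.90, 20), (0.85, 10)]


@dataclass
class ApproxCache:
    # Each entry: (prompt embedding, {step index: intermediate noise state}).
    entries: list = field(default_factory=list)

    def put(self, embedding: list[float], states: dict) -> None:
        self.entries.append((embedding, states))

    def lookup(self, embedding: list[float]):
        """Return (k, state) to resume from after k skipped steps, or (0, None)."""
        best = max(self.entries, key=lambda e: cosine(embedding, e[0]), default=None)
        if best is None:
            return 0, None
        sim = cosine(embedding, best[0])
        for threshold, k in SKIP_TABLE:
            if sim >= threshold and k in best[1]:
                return k, best[1][k]
        return 0, None


cache = ApproxCache()
cache.put([1.0, 0.0], {10: "state@10", 20: "state@20", 25: "state@25"})
print(cache.lookup([1.0, 0.5]))  # (10, 'state@10')
```

On a hit with k skipped steps, generation resumes from the cached state and runs only the remaining denoising steps. On a miss, it starts from pure noise as usual. Closer prompts allow skipping more steps because their cached states are more likely to steer toward a suitable image.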

Deep Learning Recommendation Models (DLRMs)

  • Accelerating Neural Recommendation Training with Embedding Scheduling [Paper] [Slides] [Code]
    • HKUST
    • Herald: combines an adaptive location-aware input allocator, which determines where embeddings should be trained, with an optimal communication plan generator, which determines which embeddings should be synchronized.

Fair Resource Allocation

  • Solving Max-Min Fair Resource Allocations Quickly on Large Graphs [Paper] [Slides] [Code]
    • Microsoft & USC & Rice
    • Soroush: Single-Shot Max-Min Fair Allocator.
    • Deployed on Microsoft WAN.
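
Soroush's contribution is computing max-min fair allocations quickly. For reference, the textbook progressive-filling (water-filling) algorithm below computes the exact max-min fair allocation in the simple case where each flow uses a single fixed path over links with known capacities. This is the classic baseline, not Soroush's own allocator; the function name and input format are hypothetical.

```python
def max_min_fair(capacity: dict[str, float],
                 paths: dict[str, list[str]],
                 demand: dict[str, float]) -> dict[str, float]:
    """Progressive filling: raise all unfrozen flows' rates equally until some
    link saturates or some flow reaches its demand, then freeze those flows."""
    eps = 1e-9
    rate = {f: 0.0 for f in paths}
    residual = dict(capacity)
    active = {f for f in paths if demand[f] > 0}
    while active:
        # Largest equal increment before a demand or a link becomes binding.
        inc = min(demand[f] - rate[f] for f in active)
        for link, cap in residual.items():
            users = sum(1 for f in active if link in paths[f])
            if users:
                inc = min(inc, cap / users)
        for f in active:
            rate[f] += inc
            for link in paths[f]:
                residual[link] -= inc
        # Freeze flows that met their demand or cross a saturated link.
        saturated = {l for l, cap in residual.items() if cap <= eps}
        active = {f for f in active
                  if rate[f] < demand[f] - eps
                  and not saturated.intersection(paths[f])}
    return rate


inf = float("inf")
print(max_min_fair({"A": 10.0, "B": 4.0},
                   {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]},
                   {"f1": inf, "f2": inf, "f3": inf}))
# {'f1': 8.0, 'f2': 2.0, 'f3': 2.0}
```

Each iteration freezes at least one flow, so the loop runs at most once per flow. The cost of repeating this over large graphs is what faster allocators aim to avoid.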

Network Emulation

  • Crescent: Emulating Heterogeneous Production Network at Scale [Paper] [Slides]
    • ByteDance & Cornell
    • Crescent: ByteDance’s network emulation platform for preventing change-induced network incidents.

RDMA

  • Harmonic: Hardware-assisted RDMA Performance Isolation for Public Clouds [Paper]
    • UIUC & Duke & Microsoft
    • Harmonic: microarchitecture-resource-aware RDMA performance isolation, comprising a programmable intelligent PCIe switch (prototyped on an FPGA) and an RDMA-friendly rate limiter.

PCIe

  • Understanding Routable PCIe Performance for Composable Infrastructures [Paper]
    • UW-Madison & ZJU
    • rPCIeBench: a software-hardware co-designed benchmarking framework to systematically characterize the routable PCIe fabric.