Homepage: https://icml.cc/Conferences/2023
Paper List: https://icml.cc/virtual/2023/papers.html?filter=titles
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time [Paper]
- FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU [Personal Notes] [Paper]