https://arxiv.org/abs/2305.09781
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification (Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Rae Ying Yee Wong, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia)