LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
llama cuda-kernels deepspeed llm fastertransformer llm-inference turbomind internlm llama2 codellama llama3
-
Updated
Dec 20, 2024 - Python