diff --git a/docs/source/quantization/supported_hardware.rst b/docs/source/quantization/supported_hardware.rst
index bb41bfed342c6..ecc330d866dbd 100644
--- a/docs/source/quantization/supported_hardware.rst
+++ b/docs/source/quantization/supported_hardware.rst
@@ -5,20 +5,18 @@ Supported Hardware for Quantization Kernels
 
 The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:
 
-===================== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
-Implementation        Volta  Turing  Ampere  Ada   Hopper AMD GPU Intel GPU x86 CPU AWS Inferentia Google TPU
-===================== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
-AWQ                   ❌     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-GPTQ                  ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-Marlin (GPTQ/AWQ/FP8) ❌     ❌      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-INT8 (W8A8)           ❌     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-FP8 (W8A8)            ❌     ❌      ❌      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-AQLM                  ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-bitsandbytes          ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-DeepSpeedFP           ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-GGUF                  ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-SqueezeLLM            ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
-===================== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
+============== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
+Implementation Volta  Turing  Ampere  Ada   Hopper AMD GPU Intel GPU x86 CPU AWS Inferentia Google TPU
+============== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
+AQLM           ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+AWQ            ❌     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+DeepSpeedFP    ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+FP8            ❌     ❌      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+Marlin         ❌     ❌      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+GPTQ           ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+SqueezeLLM     ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+bitsandbytes   ✅     ✅      ✅      ✅    ✅     ❌      ❌        ❌      ❌             ❌
+============== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
 
 Notes:
 ^^^^^^
@@ -29,4 +27,4 @@ Notes:
 Please note that this compatibility chart may be subject to change as vLLM continues to evolve and expand its support for different
 hardware platforms and quantization methods.
 
-For the most up-to-date information on hardware support and quantization methods, please check the `quantization directory `_ or consult with the vLLM development team.
+For the most up-to-date information on hardware support and quantization methods, please check the `quantization directory `_ or consult with the vLLM development team.
\ No newline at end of file