From a912d239c2d0e2e5d10bb8d02e6f5685edb43e68 Mon Sep 17 00:00:00 2001
From: "Xingjun.Wang"
Date: Wed, 6 Nov 2024 19:00:24 +0800
Subject: [PATCH] Rebuild docs for speed benchmark (#1045)

* add qwen2.5 perf report
* update readme
* rebuild docs and fix format issue
* remove fuzzy in speed_benchmark.po
* fix issue
* recover function_call.po
* update
* remove unused code in speed_benchmark.po
---
 README.md                                     |    6 +-
 .../LC_MESSAGES/benchmark/speed_benchmark.po  | 3525 +++++++++++------
 docs/source/benchmark/speed_benchmark.rst     |  526 +--
 3 files changed, 2534 insertions(+), 1523 deletions(-)

diff --git a/README.md b/README.md
index 4181bb0..7f6e35c 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ To learn more about Qwen2.5, feel free to read our documentation \[[EN](https://
 - Quantization: the practice of quantizing LLMs with GPTQ, AWQ, as well as the guidance for how to make high-quality quantized GGUF files;
 - Training: the instructions for post-training, including SFT and RLHF (TODO) with frameworks like Axolotl, LLaMA-Factory, etc.
 - Framework: the usage of Qwen with frameworks for application, e.g., RAG, Agent, etc.
-- Benchmark: the statistics about inference speed and memory footprint (to be updated for Qwen2.5).
+- Benchmark: the statistics about inference speed and memory footprint (Available for Qwen2.5).

 ## Introduction

@@ -37,7 +37,7 @@ In the past three months since Qwen2's release, numerous developers have built n

 ## News

-- 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
+- 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
 - 2024.06.06: We released the Qwen2 series. Check our [blog](https://qwenlm.github.io/blog/qwen2/)!
 - 2024.03.28: We released the first MoE model of Qwen: Qwen1.5-MoE-A2.7B! Temporarily, only HF transformers and vLLM support the model. We will soon add the support of llama.cpp, mlx-lm, etc. Check our [blog](https://qwenlm.github.io/blog/qwen-moe/) for more information!
 - 2024.02.05: We released the Qwen1.5 series.
@@ -46,7 +46,7 @@ In the past three months since Qwen2's release, numerous developers have built n

 Detailed evaluation results are reported in this 📑 blog.

-For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html) (to be updated for Qwen2.5).
+For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html) .

 ## Quickstart

diff --git a/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po b/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po
index c894492..e1424ea 100644
--- a/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po
+++ b/docs/locales/zh_CN/LC_MESSAGES/benchmark/speed_benchmark.po
@@ -7,7 +7,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: Qwen \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2024-09-18 21:18+0800\n"
+"POT-Creation-Date: 2024-10-31 15:54+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME \n"
 "Language: zh_CN\n"
@@ -16,219 +16,244 @@ msgstr ""
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=utf-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"Generated-By: Babel 2.15.0\n"
+"Generated-By: Babel 2.16.0\n"

 #: ../../source/benchmark/speed_benchmark.rst:2
-#: 96f9c969f82049efbaf7b70525976649
-msgid "Speed Benchmark"
+#: c37062a883c842a2b89fc3971b2209cb
+msgid "Qwen2.5 Speed Benchmark"
 msgstr "效率评估"

 #: ../../source/benchmark/speed_benchmark.rst:5
-#: 3e97857c19314350b1d6686ad9776d35
-msgid "To be updated for Qwen2.5."
-msgstr "Qwen2.5结果待更新,由于模型结构差异有限,Qwen2结果可供参考。"
+#: 5577386104e04ce0820d75b8d4a4b9bb
+msgid "This section reports the speed performance of bf16 models, quantized models (including GPTQ-Int4, GPTQ-Int8 and AWQ) of the Qwen2.5 series. Specifically, we report the inference speed (tokens/s) as well as memory footprint (GB) under the conditions of different context lengths."
+msgstr "本部分介绍Qwen2.5系列模型(原始模型和量化模型)的效率测试结果,包括推理速度(tokens/s)与不同上下文长度时的显存占用(GB)。"

-#: ../../source/benchmark/speed_benchmark.rst:7
-#: 4f0e196db456466997765e4b93b873be
-msgid "This section reports the speed performance of bf16 models, quantized models (including GPTQ-Int4, GPTQ-Int8 and AWQ) of the Qwen2 series. Specifically, we report the inference speed (tokens/s) as well as memory footprint (GB) under the conditions of different context lengths."
-msgstr "本部分介绍Qwen2模型(原始模型和量化模型)的效率测试结果,包括推理速度(tokens/s)与不同上下文长度时的显存占用(GB)。"
-
-#: ../../source/benchmark/speed_benchmark.rst:12
-#: d3a3a79f4010466f882bd52955780253
+#: ../../source/benchmark/speed_benchmark.rst:10
+#: 9edf3184b2694e6d9dee05c519bea1ae
 msgid "The environment of the evaluation with huggingface transformers is:"
 msgstr "测试HuggingFace ``transformers`` 时的环境配置:"

-#: ../../source/benchmark/speed_benchmark.rst:14
-#: ../../source/benchmark/speed_benchmark.rst:24
-#: 8e1e5f8b79c54381b4bf00c8637954c8
+#: ../../source/benchmark/speed_benchmark.rst:12
+#: ../../source/benchmark/speed_benchmark.rst:23
+#: 5929629b0bf143ab983efd4e2aa964c8 b619da3afa86420ba7e2583d9a5e7c39
 msgid "NVIDIA A100 80GB"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:15
-#: ../../source/benchmark/speed_benchmark.rst:25
-#: 79bb2a6aea064df79c0819b9c966b867
-msgid "CUDA 11.8"
+#: ../../source/benchmark/speed_benchmark.rst:13
+#: ../../source/benchmark/speed_benchmark.rst:24
+#: 6986d9f22df54554a9e830b3828a5ed2 a4e87ae3bd2042429b8df23c779f6373
+msgid "CUDA 12.1"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:16
-#: 6ed8b5fb474842c18b4e319eebbbb73f
-msgid "Pytorch 2.1.2+cu118"
+#: ../../source/benchmark/speed_benchmark.rst:14
+#: 190e255dcd1e469294188508b49bf98c
+msgid "Pytorch 2.3.1"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:17
-#: e6429404fdc543e1a80c811b9ef32e2a
-msgid "Flash Attention 2.3.3"
+#: ../../source/benchmark/speed_benchmark.rst:15
+#: c693c6e715074b2daa95c62064b4e79e
+msgid "Flash Attention 2.5.8"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:18
-#: a02f0bd8337949288a07caa4704aa55a
-msgid "Transformers 4.38.2"
+#: ../../source/benchmark/speed_benchmark.rst:16
+#: ../../source/benchmark/speed_benchmark.rst:28
+#: 3796f99ed359444da30190e7a3b86428 bfc7d82414fa46a58a09553f4c703af6
+msgid "Transformers 4.46.0"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:19
-#: 7072898a11164f7ca15acca7edaca4f9
-msgid "AutoGPTQ 0.7.1"
+#: ../../source/benchmark/speed_benchmark.rst:17
+#: 427aa447657849cba460032041380f2e
+msgid "AutoGPTQ 0.7.1+cu121 (Compiled from source code)"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:20
-#: 76bdca0175824567908b4cbc83c02731
-msgid "AutoAWQ 0.2.4"
+#: ../../source/benchmark/speed_benchmark.rst:18
+#: aabddb4e2b0244ea9c27788ce453f30e
+msgid "AutoAWQ 0.2.6"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:22
-#: 568b1b0c821d4af199a3d6122f38d1ea
+#: ../../source/benchmark/speed_benchmark.rst:21
+#: 8f4e975fbc9d48f18cb30d75a9f335db
 msgid "The environment of the evaluation with vLLM is:"
 msgstr "测试vLLM时的环境配置:"

-#: ../../source/benchmark/speed_benchmark.rst:26
-#: 73515f5745e148cc8ddf1e1ae1c9da3b
-msgid "Pytorch 2.3.0+cu118"
-msgstr ""
-
-#: ../../source/benchmark/speed_benchmark.rst:27
-#: 3a291f04fa1f4c86b646c28625f36868
-msgid "Flash Attention 2.5.6"
+#: ../../source/benchmark/speed_benchmark.rst:25
+#: 4fd3b3a5e61747f6b1577593d144efe0
+msgid "vLLM 0.6.3"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:28
-#: 0c0ef00c714a43d3a34f3084b4198415
-msgid "Transformers 4.40.1"
+#: ../../source/benchmark/speed_benchmark.rst:26
+#: a5ed9f5e4c164ddaa94443aaf9fad845
+msgid "Pytorch 2.4.0"
 msgstr ""

-#: ../../source/benchmark/speed_benchmark.rst:29
-#: 9f06eca79d96433e8aa3bc7ddc2bd2f0
-msgid "vLLM 0.4.2"
+#: ../../source/benchmark/speed_benchmark.rst:27
+#: c10ed8e23a3f4876908373715e50d88b
+msgid "Flash Attention 2.6.3"
 msgstr ""

 #: ../../source/benchmark/speed_benchmark.rst:31
-#: 7347c156175b4d91b0257d11781cbef3
-msgid "Note:"
+#: 408f8720e08641238a86c8976c54b69f
+msgid "Notes:"
 msgstr "注意:"

 #: ../../source/benchmark/speed_benchmark.rst:33
-#: aa77ff662e564d6dbf92c329438dc9c8
-msgid "We use the batch size of 1 and the least number of GPUs as possible for the evalution."
+#: 721fca542cbe44dca41c5209a83b2df7
+msgid "We use the batch size of 1 and the least number of GPUs as possible for the evaluation."
 msgstr "batch size 设置为1,使用 GPU 数量尽可能少"

 #: ../../source/benchmark/speed_benchmark.rst:35
-#: 012d76e949c04800b07ec12b30179c15
-msgid "We test the speed and memory of generating 2048 tokens with the input lengths of 1, 6144, 14336, 30720, 63488, and 129024 tokens (\\>32k is only avaliable for Qwen2-72B-Instuct and Qwen2-7B-Instuct)."
+#: 974fc88f26354dce8f66862962a6a420
+msgid "We test the speed and memory of generating 2048 tokens with the input lengths of 1, 6144, 14336, 30720, 63488, and 129024 tokens."
 msgstr "我们测试生成2048 tokens时的速度与显存占用,输入长度分别为1、6144、14336、30720、63488、129024 tokens。(超过32K长度仅有 Qwen2-72B-Instuct 与 Qwen2-7B-Instuct 支持)"

 #: ../../source/benchmark/speed_benchmark.rst:38
-#: 9c656be43bf3416988786e8e97236550
+#: cfe1df792a90474983a34723522f5550
 msgid "For vLLM, the memory usage is not reported because it pre-allocates all GPU memory. We use ``gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False`` by default."
msgstr "对于vLLM,由于GPU显存预分配,实际显存使用难以评估。默认情况下,统一设定为``gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False``。" -#: ../../source/benchmark/speed_benchmark.rst:43 -#: 17ae27b5cd9a48b99665eb630c771d80 +#: ../../source/benchmark/speed_benchmark.rst:44 +#: e81c536e4ad641709eb6d3af109a5464 msgid "0.5B (Transformer)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:46 -#: ../../source/benchmark/speed_benchmark.rst:84 -#: ../../source/benchmark/speed_benchmark.rst:123 -#: ../../source/benchmark/speed_benchmark.rst:161 -#: ../../source/benchmark/speed_benchmark.rst:200 -#: ../../source/benchmark/speed_benchmark.rst:239 -#: ../../source/benchmark/speed_benchmark.rst:294 -#: ../../source/benchmark/speed_benchmark.rst:316 -#: ../../source/benchmark/speed_benchmark.rst:330 -#: ../../source/benchmark/speed_benchmark.rst:348 -#: ../../source/benchmark/speed_benchmark.rst:387 -#: b729c12bd207442c984700a48bfdb3ff +#: ../../source/benchmark/speed_benchmark.rst:47 +#: ../../source/benchmark/speed_benchmark.rst:86 +#: ../../source/benchmark/speed_benchmark.rst:126 +#: ../../source/benchmark/speed_benchmark.rst:165 +#: ../../source/benchmark/speed_benchmark.rst:205 +#: ../../source/benchmark/speed_benchmark.rst:244 +#: ../../source/benchmark/speed_benchmark.rst:284 +#: ../../source/benchmark/speed_benchmark.rst:324 +#: ../../source/benchmark/speed_benchmark.rst:381 +#: ../../source/benchmark/speed_benchmark.rst:420 +#: ../../source/benchmark/speed_benchmark.rst:479 +#: ../../source/benchmark/speed_benchmark.rst:521 +#: ../../source/benchmark/speed_benchmark.rst:583 +#: ../../source/benchmark/speed_benchmark.rst:624 +#: 29a713e52e01489885933e2a60b8900a b535d72a52684e25b72c546ec96397a1 msgid "Model" msgstr "模型" -#: ../../source/benchmark/speed_benchmark.rst:46 -#: ../../source/benchmark/speed_benchmark.rst:84 -#: ../../source/benchmark/speed_benchmark.rst:123 -#: ../../source/benchmark/speed_benchmark.rst:161 -#: ../../source/benchmark/speed_benchmark.rst:200 -#: 
../../source/benchmark/speed_benchmark.rst:239 -#: ../../source/benchmark/speed_benchmark.rst:294 -#: ../../source/benchmark/speed_benchmark.rst:316 -#: ../../source/benchmark/speed_benchmark.rst:348 -#: ../../source/benchmark/speed_benchmark.rst:387 -#: a8779c4e04c74dd0a24836465b57794e +#: ../../source/benchmark/speed_benchmark.rst:47 +#: ../../source/benchmark/speed_benchmark.rst:86 +#: ../../source/benchmark/speed_benchmark.rst:126 +#: ../../source/benchmark/speed_benchmark.rst:165 +#: ../../source/benchmark/speed_benchmark.rst:205 +#: ../../source/benchmark/speed_benchmark.rst:244 +#: ../../source/benchmark/speed_benchmark.rst:284 +#: ../../source/benchmark/speed_benchmark.rst:324 +#: ../../source/benchmark/speed_benchmark.rst:381 +#: ../../source/benchmark/speed_benchmark.rst:420 +#: ../../source/benchmark/speed_benchmark.rst:479 +#: ../../source/benchmark/speed_benchmark.rst:521 +#: ../../source/benchmark/speed_benchmark.rst:583 +#: ../../source/benchmark/speed_benchmark.rst:624 +#: 4b84c313f4b2499eafa9ea8bd982851a c8ec43dd253e4cb9a87c537e369a6133 msgid "Input Length" msgstr "输入长度" -#: ../../source/benchmark/speed_benchmark.rst:46 -#: ../../source/benchmark/speed_benchmark.rst:84 -#: ../../source/benchmark/speed_benchmark.rst:123 -#: ../../source/benchmark/speed_benchmark.rst:161 -#: ../../source/benchmark/speed_benchmark.rst:200 -#: ../../source/benchmark/speed_benchmark.rst:239 -#: ../../source/benchmark/speed_benchmark.rst:294 -#: ../../source/benchmark/speed_benchmark.rst:316 -#: ../../source/benchmark/speed_benchmark.rst:330 -#: ../../source/benchmark/speed_benchmark.rst:348 -#: ../../source/benchmark/speed_benchmark.rst:387 -#: 2543cd648b094e00990b63d343b882e8 +#: ../../source/benchmark/speed_benchmark.rst:47 +#: ../../source/benchmark/speed_benchmark.rst:86 +#: ../../source/benchmark/speed_benchmark.rst:126 +#: ../../source/benchmark/speed_benchmark.rst:165 +#: ../../source/benchmark/speed_benchmark.rst:205 +#: 
../../source/benchmark/speed_benchmark.rst:244 +#: ../../source/benchmark/speed_benchmark.rst:284 +#: ../../source/benchmark/speed_benchmark.rst:324 +#: ../../source/benchmark/speed_benchmark.rst:381 +#: ../../source/benchmark/speed_benchmark.rst:420 +#: ../../source/benchmark/speed_benchmark.rst:479 +#: ../../source/benchmark/speed_benchmark.rst:521 +#: ../../source/benchmark/speed_benchmark.rst:583 +#: ../../source/benchmark/speed_benchmark.rst:624 +#: 80d85daadc3b41a091ccc3d16622d3dc 8383415081a248cfbc6468a46ec446a7 msgid "Quantization" msgstr "量化" -#: ../../source/benchmark/speed_benchmark.rst:46 -#: ../../source/benchmark/speed_benchmark.rst:84 -#: ../../source/benchmark/speed_benchmark.rst:123 -#: ../../source/benchmark/speed_benchmark.rst:161 -#: ../../source/benchmark/speed_benchmark.rst:200 -#: ../../source/benchmark/speed_benchmark.rst:239 -#: ../../source/benchmark/speed_benchmark.rst:294 -#: ../../source/benchmark/speed_benchmark.rst:316 -#: ../../source/benchmark/speed_benchmark.rst:348 -#: ../../source/benchmark/speed_benchmark.rst:387 -#: fe3ada130b30408e8e4736a55b0f8b9c +#: ../../source/benchmark/speed_benchmark.rst:47 +#: ../../source/benchmark/speed_benchmark.rst:86 +#: ../../source/benchmark/speed_benchmark.rst:126 +#: ../../source/benchmark/speed_benchmark.rst:165 +#: ../../source/benchmark/speed_benchmark.rst:205 +#: ../../source/benchmark/speed_benchmark.rst:244 +#: ../../source/benchmark/speed_benchmark.rst:284 +#: ../../source/benchmark/speed_benchmark.rst:324 +#: ../../source/benchmark/speed_benchmark.rst:381 +#: ../../source/benchmark/speed_benchmark.rst:420 +#: ../../source/benchmark/speed_benchmark.rst:479 +#: ../../source/benchmark/speed_benchmark.rst:521 +#: ../../source/benchmark/speed_benchmark.rst:583 +#: ../../source/benchmark/speed_benchmark.rst:624 +#: 72044b25841e4096a9f616a9dad358b5 83e8c98032fd445b8eec1980b8dc0967 msgid "GPU Num" msgstr "GPU数量" -#: ../../source/benchmark/speed_benchmark.rst:46 -#: 
../../source/benchmark/speed_benchmark.rst:84 -#: ../../source/benchmark/speed_benchmark.rst:123 -#: ../../source/benchmark/speed_benchmark.rst:161 -#: ../../source/benchmark/speed_benchmark.rst:200 -#: ../../source/benchmark/speed_benchmark.rst:239 -#: ../../source/benchmark/speed_benchmark.rst:294 -#: ../../source/benchmark/speed_benchmark.rst:316 -#: ../../source/benchmark/speed_benchmark.rst:348 -#: ../../source/benchmark/speed_benchmark.rst:387 -#: 6ab3aea03cb44388b6bd7ee3a1d7684c +#: ../../source/benchmark/speed_benchmark.rst:47 +#: ../../source/benchmark/speed_benchmark.rst:86 +#: ../../source/benchmark/speed_benchmark.rst:126 +#: ../../source/benchmark/speed_benchmark.rst:165 +#: ../../source/benchmark/speed_benchmark.rst:205 +#: ../../source/benchmark/speed_benchmark.rst:244 +#: ../../source/benchmark/speed_benchmark.rst:284 +#: ../../source/benchmark/speed_benchmark.rst:324 +#: ../../source/benchmark/speed_benchmark.rst:381 +#: ../../source/benchmark/speed_benchmark.rst:420 +#: ../../source/benchmark/speed_benchmark.rst:479 +#: ../../source/benchmark/speed_benchmark.rst:521 +#: ../../source/benchmark/speed_benchmark.rst:583 +#: ../../source/benchmark/speed_benchmark.rst:624 +#: 009e83260f8f47c0a059732347b8fd99 b088a9cd762f4f66ac803b136592d804 msgid "Speed(tokens/s)" msgstr "速度 (tokens/s)" -#: ../../source/benchmark/speed_benchmark.rst:46 -#: ../../source/benchmark/speed_benchmark.rst:123 -#: ../../source/benchmark/speed_benchmark.rst:200 -#: ../../source/benchmark/speed_benchmark.rst:294 -#: ../../source/benchmark/speed_benchmark.rst:348 -#: 163d63199ed74a65b096283cf0a6b3df +#: ../../source/benchmark/speed_benchmark.rst:47 +#: ../../source/benchmark/speed_benchmark.rst:126 +#: ../../source/benchmark/speed_benchmark.rst:205 +#: ../../source/benchmark/speed_benchmark.rst:284 +#: ../../source/benchmark/speed_benchmark.rst:381 +#: ../../source/benchmark/speed_benchmark.rst:479 +#: ../../source/benchmark/speed_benchmark.rst:583 +#: 
d303cdcb58e2427e8d4302c7ff31e554 e51ecc8c63724c2f8cd13dd5e4f9c145 msgid "GPU Memory(GB)" msgstr "显存占用 (GB)" -#: ../../source/benchmark/speed_benchmark.rst:48 -#: ../../source/benchmark/speed_benchmark.rst:86 -#: d8e54251b281451891a5332d2d1919c2 -msgid "Qwen2-0.5B-Instruct" -msgstr "" - -#: ../../source/benchmark/speed_benchmark.rst:48 -#: ../../source/benchmark/speed_benchmark.rst:50 -#: ../../source/benchmark/speed_benchmark.rst:52 -#: ../../source/benchmark/speed_benchmark.rst:54 -#: ../../source/benchmark/speed_benchmark.rst:56 -#: ../../source/benchmark/speed_benchmark.rst:58 -#: ../../source/benchmark/speed_benchmark.rst:60 -#: ../../source/benchmark/speed_benchmark.rst:62 -#: ../../source/benchmark/speed_benchmark.rst:64 -#: ../../source/benchmark/speed_benchmark.rst:66 -#: ../../source/benchmark/speed_benchmark.rst:68 -#: ../../source/benchmark/speed_benchmark.rst:70 -#: ../../source/benchmark/speed_benchmark.rst:72 -#: ../../source/benchmark/speed_benchmark.rst:74 -#: ../../source/benchmark/speed_benchmark.rst:76 -#: ../../source/benchmark/speed_benchmark.rst:78 -#: ../../source/benchmark/speed_benchmark.rst:86 +#: ../../source/benchmark/speed_benchmark.rst:47 +#: ../../source/benchmark/speed_benchmark.rst:126 +#: ../../source/benchmark/speed_benchmark.rst:205 +#: ../../source/benchmark/speed_benchmark.rst:284 +#: ../../source/benchmark/speed_benchmark.rst:324 +#: ../../source/benchmark/speed_benchmark.rst:381 +#: ../../source/benchmark/speed_benchmark.rst:420 +#: ../../source/benchmark/speed_benchmark.rst:479 +#: ../../source/benchmark/speed_benchmark.rst:521 +#: ../../source/benchmark/speed_benchmark.rst:583 +#: ../../source/benchmark/speed_benchmark.rst:624 +#: 974200ffda5f454cad058306d29c01f5 +msgid "Note" +msgstr "注意:" + +#: ../../source/benchmark/speed_benchmark.rst:49 +#: ../../source/benchmark/speed_benchmark.rst:88 +#: 963269fc133c4b58a0b272708a3cd91e +msgid "Qwen2.5-0.5B-Instruct" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:49 +#: 
../../source/benchmark/speed_benchmark.rst:51 +#: ../../source/benchmark/speed_benchmark.rst:53 +#: ../../source/benchmark/speed_benchmark.rst:55 +#: ../../source/benchmark/speed_benchmark.rst:57 +#: ../../source/benchmark/speed_benchmark.rst:59 +#: ../../source/benchmark/speed_benchmark.rst:61 +#: ../../source/benchmark/speed_benchmark.rst:63 +#: ../../source/benchmark/speed_benchmark.rst:65 +#: ../../source/benchmark/speed_benchmark.rst:67 +#: ../../source/benchmark/speed_benchmark.rst:69 +#: ../../source/benchmark/speed_benchmark.rst:71 +#: ../../source/benchmark/speed_benchmark.rst:73 +#: ../../source/benchmark/speed_benchmark.rst:75 +#: ../../source/benchmark/speed_benchmark.rst:77 +#: ../../source/benchmark/speed_benchmark.rst:79 #: ../../source/benchmark/speed_benchmark.rst:88 #: ../../source/benchmark/speed_benchmark.rst:90 #: ../../source/benchmark/speed_benchmark.rst:92 @@ -244,24 +269,23 @@ msgstr "" #: ../../source/benchmark/speed_benchmark.rst:112 #: ../../source/benchmark/speed_benchmark.rst:114 #: ../../source/benchmark/speed_benchmark.rst:116 -#: ../../source/benchmark/speed_benchmark.rst:125 -#: ../../source/benchmark/speed_benchmark.rst:127 -#: ../../source/benchmark/speed_benchmark.rst:129 -#: ../../source/benchmark/speed_benchmark.rst:131 -#: ../../source/benchmark/speed_benchmark.rst:133 -#: ../../source/benchmark/speed_benchmark.rst:135 -#: ../../source/benchmark/speed_benchmark.rst:137 -#: ../../source/benchmark/speed_benchmark.rst:139 -#: ../../source/benchmark/speed_benchmark.rst:141 -#: ../../source/benchmark/speed_benchmark.rst:143 -#: ../../source/benchmark/speed_benchmark.rst:145 -#: ../../source/benchmark/speed_benchmark.rst:147 -#: ../../source/benchmark/speed_benchmark.rst:149 -#: ../../source/benchmark/speed_benchmark.rst:151 -#: ../../source/benchmark/speed_benchmark.rst:153 -#: ../../source/benchmark/speed_benchmark.rst:155 -#: ../../source/benchmark/speed_benchmark.rst:163 -#: ../../source/benchmark/speed_benchmark.rst:165 +#: 
../../source/benchmark/speed_benchmark.rst:118 +#: ../../source/benchmark/speed_benchmark.rst:128 +#: ../../source/benchmark/speed_benchmark.rst:130 +#: ../../source/benchmark/speed_benchmark.rst:132 +#: ../../source/benchmark/speed_benchmark.rst:134 +#: ../../source/benchmark/speed_benchmark.rst:136 +#: ../../source/benchmark/speed_benchmark.rst:138 +#: ../../source/benchmark/speed_benchmark.rst:140 +#: ../../source/benchmark/speed_benchmark.rst:142 +#: ../../source/benchmark/speed_benchmark.rst:144 +#: ../../source/benchmark/speed_benchmark.rst:146 +#: ../../source/benchmark/speed_benchmark.rst:148 +#: ../../source/benchmark/speed_benchmark.rst:150 +#: ../../source/benchmark/speed_benchmark.rst:152 +#: ../../source/benchmark/speed_benchmark.rst:154 +#: ../../source/benchmark/speed_benchmark.rst:156 +#: ../../source/benchmark/speed_benchmark.rst:158 #: ../../source/benchmark/speed_benchmark.rst:167 #: ../../source/benchmark/speed_benchmark.rst:169 #: ../../source/benchmark/speed_benchmark.rst:171 @@ -276,1690 +300,2673 @@ msgstr "" #: ../../source/benchmark/speed_benchmark.rst:189 #: ../../source/benchmark/speed_benchmark.rst:191 #: ../../source/benchmark/speed_benchmark.rst:193 -#: ../../source/benchmark/speed_benchmark.rst:202 -#: ../../source/benchmark/speed_benchmark.rst:204 -#: ../../source/benchmark/speed_benchmark.rst:206 -#: ../../source/benchmark/speed_benchmark.rst:208 -#: ../../source/benchmark/speed_benchmark.rst:210 -#: ../../source/benchmark/speed_benchmark.rst:212 -#: ../../source/benchmark/speed_benchmark.rst:214 -#: ../../source/benchmark/speed_benchmark.rst:216 -#: ../../source/benchmark/speed_benchmark.rst:218 -#: ../../source/benchmark/speed_benchmark.rst:220 -#: ../../source/benchmark/speed_benchmark.rst:222 -#: ../../source/benchmark/speed_benchmark.rst:224 -#: ../../source/benchmark/speed_benchmark.rst:226 -#: ../../source/benchmark/speed_benchmark.rst:228 -#: ../../source/benchmark/speed_benchmark.rst:230 -#: 
../../source/benchmark/speed_benchmark.rst:232 -#: ../../source/benchmark/speed_benchmark.rst:241 -#: ../../source/benchmark/speed_benchmark.rst:243 -#: ../../source/benchmark/speed_benchmark.rst:245 -#: ../../source/benchmark/speed_benchmark.rst:247 -#: ../../source/benchmark/speed_benchmark.rst:249 -#: ../../source/benchmark/speed_benchmark.rst:251 -#: ../../source/benchmark/speed_benchmark.rst:253 -#: ../../source/benchmark/speed_benchmark.rst:255 -#: ../../source/benchmark/speed_benchmark.rst:257 -#: ../../source/benchmark/speed_benchmark.rst:259 -#: ../../source/benchmark/speed_benchmark.rst:261 -#: ../../source/benchmark/speed_benchmark.rst:263 -#: ../../source/benchmark/speed_benchmark.rst:265 -#: ../../source/benchmark/speed_benchmark.rst:267 -#: ../../source/benchmark/speed_benchmark.rst:269 -#: ../../source/benchmark/speed_benchmark.rst:271 -#: ../../source/benchmark/speed_benchmark.rst:273 -#: ../../source/benchmark/speed_benchmark.rst:275 -#: ../../source/benchmark/speed_benchmark.rst:277 -#: ../../source/benchmark/speed_benchmark.rst:279 -#: ../../source/benchmark/speed_benchmark.rst:281 -#: ../../source/benchmark/speed_benchmark.rst:283 -#: ../../source/benchmark/speed_benchmark.rst:285 -#: ../../source/benchmark/speed_benchmark.rst:287 +#: ../../source/benchmark/speed_benchmark.rst:195 +#: ../../source/benchmark/speed_benchmark.rst:197 +#: ../../source/benchmark/speed_benchmark.rst:207 +#: ../../source/benchmark/speed_benchmark.rst:209 +#: ../../source/benchmark/speed_benchmark.rst:211 +#: ../../source/benchmark/speed_benchmark.rst:213 +#: ../../source/benchmark/speed_benchmark.rst:215 +#: ../../source/benchmark/speed_benchmark.rst:217 +#: ../../source/benchmark/speed_benchmark.rst:219 +#: ../../source/benchmark/speed_benchmark.rst:221 +#: ../../source/benchmark/speed_benchmark.rst:223 +#: ../../source/benchmark/speed_benchmark.rst:225 +#: ../../source/benchmark/speed_benchmark.rst:227 +#: ../../source/benchmark/speed_benchmark.rst:229 +#: 
../../source/benchmark/speed_benchmark.rst:231 +#: ../../source/benchmark/speed_benchmark.rst:233 +#: ../../source/benchmark/speed_benchmark.rst:235 +#: ../../source/benchmark/speed_benchmark.rst:237 +#: ../../source/benchmark/speed_benchmark.rst:246 +#: ../../source/benchmark/speed_benchmark.rst:248 +#: ../../source/benchmark/speed_benchmark.rst:250 +#: ../../source/benchmark/speed_benchmark.rst:252 +#: ../../source/benchmark/speed_benchmark.rst:254 +#: ../../source/benchmark/speed_benchmark.rst:256 +#: ../../source/benchmark/speed_benchmark.rst:258 +#: ../../source/benchmark/speed_benchmark.rst:260 +#: ../../source/benchmark/speed_benchmark.rst:262 +#: ../../source/benchmark/speed_benchmark.rst:264 +#: ../../source/benchmark/speed_benchmark.rst:266 +#: ../../source/benchmark/speed_benchmark.rst:268 +#: ../../source/benchmark/speed_benchmark.rst:270 +#: ../../source/benchmark/speed_benchmark.rst:272 +#: ../../source/benchmark/speed_benchmark.rst:274 +#: ../../source/benchmark/speed_benchmark.rst:276 +#: ../../source/benchmark/speed_benchmark.rst:286 +#: ../../source/benchmark/speed_benchmark.rst:288 +#: ../../source/benchmark/speed_benchmark.rst:290 +#: ../../source/benchmark/speed_benchmark.rst:292 +#: ../../source/benchmark/speed_benchmark.rst:294 #: ../../source/benchmark/speed_benchmark.rst:296 #: ../../source/benchmark/speed_benchmark.rst:298 +#: ../../source/benchmark/speed_benchmark.rst:300 #: ../../source/benchmark/speed_benchmark.rst:302 +#: ../../source/benchmark/speed_benchmark.rst:304 #: ../../source/benchmark/speed_benchmark.rst:306 +#: ../../source/benchmark/speed_benchmark.rst:308 #: ../../source/benchmark/speed_benchmark.rst:310 -#: ../../source/benchmark/speed_benchmark.rst:318 +#: ../../source/benchmark/speed_benchmark.rst:312 +#: ../../source/benchmark/speed_benchmark.rst:314 +#: ../../source/benchmark/speed_benchmark.rst:316 +#: ../../source/benchmark/speed_benchmark.rst:326 +#: ../../source/benchmark/speed_benchmark.rst:328 +#: 
../../source/benchmark/speed_benchmark.rst:330 +#: ../../source/benchmark/speed_benchmark.rst:332 +#: ../../source/benchmark/speed_benchmark.rst:334 +#: ../../source/benchmark/speed_benchmark.rst:336 +#: ../../source/benchmark/speed_benchmark.rst:338 +#: ../../source/benchmark/speed_benchmark.rst:340 +#: ../../source/benchmark/speed_benchmark.rst:342 +#: ../../source/benchmark/speed_benchmark.rst:344 +#: ../../source/benchmark/speed_benchmark.rst:346 +#: ../../source/benchmark/speed_benchmark.rst:348 #: ../../source/benchmark/speed_benchmark.rst:350 +#: ../../source/benchmark/speed_benchmark.rst:352 #: ../../source/benchmark/speed_benchmark.rst:354 #: ../../source/benchmark/speed_benchmark.rst:356 +#: ../../source/benchmark/speed_benchmark.rst:358 +#: ../../source/benchmark/speed_benchmark.rst:360 #: ../../source/benchmark/speed_benchmark.rst:362 #: ../../source/benchmark/speed_benchmark.rst:364 +#: ../../source/benchmark/speed_benchmark.rst:366 +#: ../../source/benchmark/speed_benchmark.rst:368 #: ../../source/benchmark/speed_benchmark.rst:370 #: ../../source/benchmark/speed_benchmark.rst:372 +#: ../../source/benchmark/speed_benchmark.rst:383 +#: ../../source/benchmark/speed_benchmark.rst:385 +#: ../../source/benchmark/speed_benchmark.rst:387 #: ../../source/benchmark/speed_benchmark.rst:389 +#: ../../source/benchmark/speed_benchmark.rst:391 +#: ../../source/benchmark/speed_benchmark.rst:393 #: ../../source/benchmark/speed_benchmark.rst:395 +#: ../../source/benchmark/speed_benchmark.rst:397 +#: ../../source/benchmark/speed_benchmark.rst:399 +#: ../../source/benchmark/speed_benchmark.rst:401 +#: ../../source/benchmark/speed_benchmark.rst:403 #: ../../source/benchmark/speed_benchmark.rst:405 -#: 088012662cc1481aa4119d7f6e097f51 +#: ../../source/benchmark/speed_benchmark.rst:407 +#: ../../source/benchmark/speed_benchmark.rst:409 +#: ../../source/benchmark/speed_benchmark.rst:411 +#: ../../source/benchmark/speed_benchmark.rst:413 +#: 
../../source/benchmark/speed_benchmark.rst:422 +#: ../../source/benchmark/speed_benchmark.rst:424 +#: ../../source/benchmark/speed_benchmark.rst:426 +#: ../../source/benchmark/speed_benchmark.rst:428 +#: ../../source/benchmark/speed_benchmark.rst:430 +#: ../../source/benchmark/speed_benchmark.rst:432 +#: ../../source/benchmark/speed_benchmark.rst:434 +#: ../../source/benchmark/speed_benchmark.rst:436 +#: ../../source/benchmark/speed_benchmark.rst:438 +#: ../../source/benchmark/speed_benchmark.rst:440 +#: ../../source/benchmark/speed_benchmark.rst:442 +#: ../../source/benchmark/speed_benchmark.rst:444 +#: ../../source/benchmark/speed_benchmark.rst:446 +#: ../../source/benchmark/speed_benchmark.rst:448 +#: ../../source/benchmark/speed_benchmark.rst:450 +#: ../../source/benchmark/speed_benchmark.rst:452 +#: ../../source/benchmark/speed_benchmark.rst:454 +#: ../../source/benchmark/speed_benchmark.rst:456 +#: ../../source/benchmark/speed_benchmark.rst:458 +#: ../../source/benchmark/speed_benchmark.rst:460 +#: ../../source/benchmark/speed_benchmark.rst:462 +#: ../../source/benchmark/speed_benchmark.rst:464 +#: ../../source/benchmark/speed_benchmark.rst:466 +#: ../../source/benchmark/speed_benchmark.rst:468 +#: ../../source/benchmark/speed_benchmark.rst:481 +#: ../../source/benchmark/speed_benchmark.rst:483 +#: ../../source/benchmark/speed_benchmark.rst:485 +#: ../../source/benchmark/speed_benchmark.rst:487 +#: ../../source/benchmark/speed_benchmark.rst:489 +#: ../../source/benchmark/speed_benchmark.rst:491 +#: ../../source/benchmark/speed_benchmark.rst:493 +#: ../../source/benchmark/speed_benchmark.rst:495 +#: ../../source/benchmark/speed_benchmark.rst:497 +#: ../../source/benchmark/speed_benchmark.rst:499 +#: ../../source/benchmark/speed_benchmark.rst:501 +#: ../../source/benchmark/speed_benchmark.rst:503 +#: ../../source/benchmark/speed_benchmark.rst:505 +#: ../../source/benchmark/speed_benchmark.rst:507 +#: ../../source/benchmark/speed_benchmark.rst:509 +#: 
../../source/benchmark/speed_benchmark.rst:511 +#: ../../source/benchmark/speed_benchmark.rst:523 +#: ../../source/benchmark/speed_benchmark.rst:525 +#: ../../source/benchmark/speed_benchmark.rst:527 +#: ../../source/benchmark/speed_benchmark.rst:529 +#: ../../source/benchmark/speed_benchmark.rst:531 +#: ../../source/benchmark/speed_benchmark.rst:533 +#: ../../source/benchmark/speed_benchmark.rst:535 +#: ../../source/benchmark/speed_benchmark.rst:537 +#: ../../source/benchmark/speed_benchmark.rst:539 +#: ../../source/benchmark/speed_benchmark.rst:541 +#: ../../source/benchmark/speed_benchmark.rst:543 +#: ../../source/benchmark/speed_benchmark.rst:545 +#: ../../source/benchmark/speed_benchmark.rst:549 +#: ../../source/benchmark/speed_benchmark.rst:551 +#: ../../source/benchmark/speed_benchmark.rst:553 +#: ../../source/benchmark/speed_benchmark.rst:557 +#: ../../source/benchmark/speed_benchmark.rst:559 +#: ../../source/benchmark/speed_benchmark.rst:561 +#: ../../source/benchmark/speed_benchmark.rst:565 +#: ../../source/benchmark/speed_benchmark.rst:567 +#: ../../source/benchmark/speed_benchmark.rst:569 +#: ../../source/benchmark/speed_benchmark.rst:585 +#: ../../source/benchmark/speed_benchmark.rst:589 +#: ../../source/benchmark/speed_benchmark.rst:591 +#: ../../source/benchmark/speed_benchmark.rst:597 +#: ../../source/benchmark/speed_benchmark.rst:599 +#: ../../source/benchmark/speed_benchmark.rst:605 +#: ../../source/benchmark/speed_benchmark.rst:607 +#: ../../source/benchmark/speed_benchmark.rst:626 +#: ../../source/benchmark/speed_benchmark.rst:632 +#: ../../source/benchmark/speed_benchmark.rst:642 +#: 11b2dcd21f6a43b1bfbbd5a7aaea3ec2 19c2f2475f7646b585545f47fb8cef25 +#: 2906110349e84105944420c86c6f8e14 2b0c156abdb6483b85ce4b3fec8e55ac +#: 2b285733cf4d4370a8e72f26624a3aaa 2c10691a557b4a39af14e0aa8e14eb37 +#: 3b25b912f2aa44bc9082fcb99e3dcc1b 3d36f48f498644f992fb5fb2d1c176ec +#: 4e0d4370b96a4ce5be99136b48b56b6a 4e68f9a0780c4d7f85f3ef1d4c23b263 +#: 
6deca5be0ae6442ebad196e2b191ef54 7e20570ccc954dc483edf6522e157565 +#: 94a0dd46312f424c8d0333576d092543 99ddc7db2e254f0390d7611f90f0739b +#: d454f94031e040e484f9727a2881dcf7 e55ec1ef57274e9b915240ccea232523 +#: f2f53c0c8931410890c76db651671db5 f6e6bb504f0942279014ecdc0b28fbfc +#: fd0a186660b9468db1d816af63a72437 msgid "1" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:48 -#: ../../source/benchmark/speed_benchmark.rst:56 -#: ../../source/benchmark/speed_benchmark.rst:64 -#: ../../source/benchmark/speed_benchmark.rst:72 -#: ../../source/benchmark/speed_benchmark.rst:86 -#: ../../source/benchmark/speed_benchmark.rst:94 -#: ../../source/benchmark/speed_benchmark.rst:102 -#: ../../source/benchmark/speed_benchmark.rst:110 -#: ../../source/benchmark/speed_benchmark.rst:125 -#: ../../source/benchmark/speed_benchmark.rst:133 -#: ../../source/benchmark/speed_benchmark.rst:141 -#: ../../source/benchmark/speed_benchmark.rst:149 -#: ../../source/benchmark/speed_benchmark.rst:163 -#: ../../source/benchmark/speed_benchmark.rst:171 -#: ../../source/benchmark/speed_benchmark.rst:179 -#: ../../source/benchmark/speed_benchmark.rst:187 -#: ../../source/benchmark/speed_benchmark.rst:202 -#: ../../source/benchmark/speed_benchmark.rst:210 -#: ../../source/benchmark/speed_benchmark.rst:218 -#: ../../source/benchmark/speed_benchmark.rst:226 -#: ../../source/benchmark/speed_benchmark.rst:241 -#: ../../source/benchmark/speed_benchmark.rst:249 -#: ../../source/benchmark/speed_benchmark.rst:257 -#: ../../source/benchmark/speed_benchmark.rst:265 -#: ../../source/benchmark/speed_benchmark.rst:273 -#: ../../source/benchmark/speed_benchmark.rst:281 -#: ../../source/benchmark/speed_benchmark.rst:296 -#: ../../source/benchmark/speed_benchmark.rst:300 -#: ../../source/benchmark/speed_benchmark.rst:304 -#: ../../source/benchmark/speed_benchmark.rst:308 -#: ../../source/benchmark/speed_benchmark.rst:318 -#: ../../source/benchmark/speed_benchmark.rst:320 -#: 
../../source/benchmark/speed_benchmark.rst:322 -#: ../../source/benchmark/speed_benchmark.rst:324 -#: ../../source/benchmark/speed_benchmark.rst:332 +#: ../../source/benchmark/speed_benchmark.rst:49 +#: ../../source/benchmark/speed_benchmark.rst:57 +#: ../../source/benchmark/speed_benchmark.rst:65 +#: ../../source/benchmark/speed_benchmark.rst:73 +#: ../../source/benchmark/speed_benchmark.rst:88 +#: ../../source/benchmark/speed_benchmark.rst:96 +#: ../../source/benchmark/speed_benchmark.rst:104 +#: ../../source/benchmark/speed_benchmark.rst:112 +#: ../../source/benchmark/speed_benchmark.rst:128 +#: ../../source/benchmark/speed_benchmark.rst:136 +#: ../../source/benchmark/speed_benchmark.rst:144 +#: ../../source/benchmark/speed_benchmark.rst:152 +#: ../../source/benchmark/speed_benchmark.rst:167 +#: ../../source/benchmark/speed_benchmark.rst:175 +#: ../../source/benchmark/speed_benchmark.rst:183 +#: ../../source/benchmark/speed_benchmark.rst:191 +#: ../../source/benchmark/speed_benchmark.rst:207 +#: ../../source/benchmark/speed_benchmark.rst:215 +#: ../../source/benchmark/speed_benchmark.rst:223 +#: ../../source/benchmark/speed_benchmark.rst:231 +#: ../../source/benchmark/speed_benchmark.rst:246 +#: ../../source/benchmark/speed_benchmark.rst:254 +#: ../../source/benchmark/speed_benchmark.rst:262 +#: ../../source/benchmark/speed_benchmark.rst:270 +#: ../../source/benchmark/speed_benchmark.rst:286 +#: ../../source/benchmark/speed_benchmark.rst:294 +#: ../../source/benchmark/speed_benchmark.rst:302 +#: ../../source/benchmark/speed_benchmark.rst:310 +#: ../../source/benchmark/speed_benchmark.rst:326 #: ../../source/benchmark/speed_benchmark.rst:334 -#: ../../source/benchmark/speed_benchmark.rst:336 -#: ../../source/benchmark/speed_benchmark.rst:338 +#: ../../source/benchmark/speed_benchmark.rst:342 #: ../../source/benchmark/speed_benchmark.rst:350 #: ../../source/benchmark/speed_benchmark.rst:358 #: ../../source/benchmark/speed_benchmark.rst:366 -#: 
../../source/benchmark/speed_benchmark.rst:374 -#: ../../source/benchmark/speed_benchmark.rst:389 +#: ../../source/benchmark/speed_benchmark.rst:383 #: ../../source/benchmark/speed_benchmark.rst:391 -#: ../../source/benchmark/speed_benchmark.rst:401 -#: ../../source/benchmark/speed_benchmark.rst:411 -#: ../../source/benchmark/speed_benchmark.rst:419 -#: ../../source/benchmark/speed_benchmark.rst:427 -#: ../../source/benchmark/speed_benchmark.rst:435 -#: ../../source/benchmark/speed_benchmark.rst:443 -#: 005cece541024c82832f5c8b5f1887a5 +#: ../../source/benchmark/speed_benchmark.rst:399 +#: ../../source/benchmark/speed_benchmark.rst:407 +#: ../../source/benchmark/speed_benchmark.rst:422 +#: ../../source/benchmark/speed_benchmark.rst:430 +#: ../../source/benchmark/speed_benchmark.rst:438 +#: ../../source/benchmark/speed_benchmark.rst:446 +#: ../../source/benchmark/speed_benchmark.rst:454 +#: ../../source/benchmark/speed_benchmark.rst:462 +#: ../../source/benchmark/speed_benchmark.rst:481 +#: ../../source/benchmark/speed_benchmark.rst:489 +#: ../../source/benchmark/speed_benchmark.rst:497 +#: ../../source/benchmark/speed_benchmark.rst:505 +#: ../../source/benchmark/speed_benchmark.rst:523 +#: ../../source/benchmark/speed_benchmark.rst:531 +#: ../../source/benchmark/speed_benchmark.rst:539 +#: ../../source/benchmark/speed_benchmark.rst:547 +#: ../../source/benchmark/speed_benchmark.rst:555 +#: ../../source/benchmark/speed_benchmark.rst:563 +#: ../../source/benchmark/speed_benchmark.rst:585 +#: ../../source/benchmark/speed_benchmark.rst:593 +#: ../../source/benchmark/speed_benchmark.rst:601 +#: ../../source/benchmark/speed_benchmark.rst:609 +#: ../../source/benchmark/speed_benchmark.rst:626 +#: ../../source/benchmark/speed_benchmark.rst:628 +#: ../../source/benchmark/speed_benchmark.rst:638 +#: ../../source/benchmark/speed_benchmark.rst:648 +#: ../../source/benchmark/speed_benchmark.rst:656 +#: ../../source/benchmark/speed_benchmark.rst:664 +#: 
../../source/benchmark/speed_benchmark.rst:672 +#: 80de1236199d47ee97b69d1059fc5ee1 msgid "BF16" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:48 -#: fa45d4fccc0d44c2800f972d2630a14f -msgid "49.94" +#: ../../source/benchmark/speed_benchmark.rst:49 +#: 0001a574c6a14ddc939dd37f515471c3 +msgid "47.40" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:48 -#: 4eb0e83451fc4f93a061d148039df0de -msgid "1.17" +#: ../../source/benchmark/speed_benchmark.rst:49 +#: 492b0964f7ca45a0a7a3eb5630fab76b +msgid "0.97" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:50 -#: ../../source/benchmark/speed_benchmark.rst:58 -#: ../../source/benchmark/speed_benchmark.rst:66 -#: ../../source/benchmark/speed_benchmark.rst:74 -#: ../../source/benchmark/speed_benchmark.rst:88 -#: ../../source/benchmark/speed_benchmark.rst:96 -#: ../../source/benchmark/speed_benchmark.rst:104 -#: ../../source/benchmark/speed_benchmark.rst:112 -#: ../../source/benchmark/speed_benchmark.rst:127 -#: ../../source/benchmark/speed_benchmark.rst:135 -#: ../../source/benchmark/speed_benchmark.rst:143 -#: ../../source/benchmark/speed_benchmark.rst:151 -#: ../../source/benchmark/speed_benchmark.rst:165 -#: ../../source/benchmark/speed_benchmark.rst:173 -#: ../../source/benchmark/speed_benchmark.rst:181 -#: ../../source/benchmark/speed_benchmark.rst:189 -#: ../../source/benchmark/speed_benchmark.rst:204 -#: ../../source/benchmark/speed_benchmark.rst:212 -#: ../../source/benchmark/speed_benchmark.rst:220 -#: ../../source/benchmark/speed_benchmark.rst:228 -#: ../../source/benchmark/speed_benchmark.rst:243 -#: ../../source/benchmark/speed_benchmark.rst:251 -#: ../../source/benchmark/speed_benchmark.rst:259 -#: ../../source/benchmark/speed_benchmark.rst:267 -#: ../../source/benchmark/speed_benchmark.rst:275 -#: ../../source/benchmark/speed_benchmark.rst:283 +#: ../../source/benchmark/speed_benchmark.rst:51 +#: ../../source/benchmark/speed_benchmark.rst:59 +#: 
../../source/benchmark/speed_benchmark.rst:67 +#: ../../source/benchmark/speed_benchmark.rst:75 +#: ../../source/benchmark/speed_benchmark.rst:90 +#: ../../source/benchmark/speed_benchmark.rst:98 +#: ../../source/benchmark/speed_benchmark.rst:106 +#: ../../source/benchmark/speed_benchmark.rst:114 +#: ../../source/benchmark/speed_benchmark.rst:130 +#: ../../source/benchmark/speed_benchmark.rst:138 +#: ../../source/benchmark/speed_benchmark.rst:146 +#: ../../source/benchmark/speed_benchmark.rst:154 +#: ../../source/benchmark/speed_benchmark.rst:169 +#: ../../source/benchmark/speed_benchmark.rst:177 +#: ../../source/benchmark/speed_benchmark.rst:185 +#: ../../source/benchmark/speed_benchmark.rst:193 +#: ../../source/benchmark/speed_benchmark.rst:209 +#: ../../source/benchmark/speed_benchmark.rst:217 +#: ../../source/benchmark/speed_benchmark.rst:225 +#: ../../source/benchmark/speed_benchmark.rst:233 +#: ../../source/benchmark/speed_benchmark.rst:248 +#: ../../source/benchmark/speed_benchmark.rst:256 +#: ../../source/benchmark/speed_benchmark.rst:264 +#: ../../source/benchmark/speed_benchmark.rst:272 +#: ../../source/benchmark/speed_benchmark.rst:288 +#: ../../source/benchmark/speed_benchmark.rst:296 +#: ../../source/benchmark/speed_benchmark.rst:304 +#: ../../source/benchmark/speed_benchmark.rst:312 +#: ../../source/benchmark/speed_benchmark.rst:328 +#: ../../source/benchmark/speed_benchmark.rst:336 +#: ../../source/benchmark/speed_benchmark.rst:344 #: ../../source/benchmark/speed_benchmark.rst:352 #: ../../source/benchmark/speed_benchmark.rst:360 #: ../../source/benchmark/speed_benchmark.rst:368 -#: ../../source/benchmark/speed_benchmark.rst:376 +#: ../../source/benchmark/speed_benchmark.rst:385 #: ../../source/benchmark/speed_benchmark.rst:393 -#: ../../source/benchmark/speed_benchmark.rst:403 -#: ../../source/benchmark/speed_benchmark.rst:413 -#: ../../source/benchmark/speed_benchmark.rst:421 -#: ../../source/benchmark/speed_benchmark.rst:429 -#: 
../../source/benchmark/speed_benchmark.rst:437 -#: ../../source/benchmark/speed_benchmark.rst:445 -#: 360e69153b484c13a7b341af2104245e +#: ../../source/benchmark/speed_benchmark.rst:401 +#: ../../source/benchmark/speed_benchmark.rst:409 +#: ../../source/benchmark/speed_benchmark.rst:424 +#: ../../source/benchmark/speed_benchmark.rst:432 +#: ../../source/benchmark/speed_benchmark.rst:440 +#: ../../source/benchmark/speed_benchmark.rst:448 +#: ../../source/benchmark/speed_benchmark.rst:456 +#: ../../source/benchmark/speed_benchmark.rst:464 +#: ../../source/benchmark/speed_benchmark.rst:483 +#: ../../source/benchmark/speed_benchmark.rst:491 +#: ../../source/benchmark/speed_benchmark.rst:499 +#: ../../source/benchmark/speed_benchmark.rst:507 +#: ../../source/benchmark/speed_benchmark.rst:525 +#: ../../source/benchmark/speed_benchmark.rst:533 +#: ../../source/benchmark/speed_benchmark.rst:541 +#: ../../source/benchmark/speed_benchmark.rst:549 +#: ../../source/benchmark/speed_benchmark.rst:557 +#: ../../source/benchmark/speed_benchmark.rst:565 +#: ../../source/benchmark/speed_benchmark.rst:587 +#: ../../source/benchmark/speed_benchmark.rst:595 +#: ../../source/benchmark/speed_benchmark.rst:603 +#: ../../source/benchmark/speed_benchmark.rst:611 +#: ../../source/benchmark/speed_benchmark.rst:630 +#: ../../source/benchmark/speed_benchmark.rst:640 +#: ../../source/benchmark/speed_benchmark.rst:650 +#: ../../source/benchmark/speed_benchmark.rst:658 +#: ../../source/benchmark/speed_benchmark.rst:666 +#: ../../source/benchmark/speed_benchmark.rst:674 +#: 87ed274bcbda417cb8853fb427e297be msgid "GPTQ-Int8" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:50 -#: 937ae93383b649c4a9676caa076e2dfd -msgid "36.35" +#: ../../source/benchmark/speed_benchmark.rst:51 +#: 0489903c73e345218d5addaedc224046 +msgid "35.17" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:50 -#: 54bc220ea7dd4876b029adefc06e51b8 -msgid "0.85" +#: ../../source/benchmark/speed_benchmark.rst:51 +#: 
3f7d12b997c84246b6e1c2848982f9b6 +msgid "0.64" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:52 -#: ../../source/benchmark/speed_benchmark.rst:60 -#: ../../source/benchmark/speed_benchmark.rst:68 -#: ../../source/benchmark/speed_benchmark.rst:76 -#: ../../source/benchmark/speed_benchmark.rst:90 -#: ../../source/benchmark/speed_benchmark.rst:98 -#: ../../source/benchmark/speed_benchmark.rst:106 -#: ../../source/benchmark/speed_benchmark.rst:114 -#: ../../source/benchmark/speed_benchmark.rst:129 -#: ../../source/benchmark/speed_benchmark.rst:137 -#: ../../source/benchmark/speed_benchmark.rst:145 -#: ../../source/benchmark/speed_benchmark.rst:153 -#: ../../source/benchmark/speed_benchmark.rst:167 -#: ../../source/benchmark/speed_benchmark.rst:175 -#: ../../source/benchmark/speed_benchmark.rst:183 -#: ../../source/benchmark/speed_benchmark.rst:191 -#: ../../source/benchmark/speed_benchmark.rst:206 -#: ../../source/benchmark/speed_benchmark.rst:214 -#: ../../source/benchmark/speed_benchmark.rst:222 -#: ../../source/benchmark/speed_benchmark.rst:230 -#: ../../source/benchmark/speed_benchmark.rst:245 -#: ../../source/benchmark/speed_benchmark.rst:253 -#: ../../source/benchmark/speed_benchmark.rst:261 -#: ../../source/benchmark/speed_benchmark.rst:269 -#: ../../source/benchmark/speed_benchmark.rst:277 -#: ../../source/benchmark/speed_benchmark.rst:285 +#: ../../source/benchmark/speed_benchmark.rst:51 +#: ../../source/benchmark/speed_benchmark.rst:59 +#: ../../source/benchmark/speed_benchmark.rst:67 +#: ../../source/benchmark/speed_benchmark.rst:75 +#: ../../source/benchmark/speed_benchmark.rst:130 +#: ../../source/benchmark/speed_benchmark.rst:138 +#: ../../source/benchmark/speed_benchmark.rst:146 +#: ../../source/benchmark/speed_benchmark.rst:154 +#: ../../source/benchmark/speed_benchmark.rst:209 +#: ../../source/benchmark/speed_benchmark.rst:217 +#: ../../source/benchmark/speed_benchmark.rst:225 +#: ../../source/benchmark/speed_benchmark.rst:233 +#: 
../../source/benchmark/speed_benchmark.rst:288 +#: ../../source/benchmark/speed_benchmark.rst:296 +#: ../../source/benchmark/speed_benchmark.rst:304 +#: ../../source/benchmark/speed_benchmark.rst:312 +#: ../../source/benchmark/speed_benchmark.rst:385 +#: ../../source/benchmark/speed_benchmark.rst:393 +#: ../../source/benchmark/speed_benchmark.rst:401 +#: ../../source/benchmark/speed_benchmark.rst:409 +#: ../../source/benchmark/speed_benchmark.rst:483 +#: ../../source/benchmark/speed_benchmark.rst:491 +#: ../../source/benchmark/speed_benchmark.rst:499 +#: ../../source/benchmark/speed_benchmark.rst:507 +#: ../../source/benchmark/speed_benchmark.rst:587 +#: ../../source/benchmark/speed_benchmark.rst:595 +#: ../../source/benchmark/speed_benchmark.rst:603 +#: ../../source/benchmark/speed_benchmark.rst:611 +#: 84525916476247c8b05c566ab47af083 +msgid "auto_gptq==0.6.0+cu1210" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:53 +#: ../../source/benchmark/speed_benchmark.rst:61 +#: ../../source/benchmark/speed_benchmark.rst:69 +#: ../../source/benchmark/speed_benchmark.rst:77 +#: ../../source/benchmark/speed_benchmark.rst:92 +#: ../../source/benchmark/speed_benchmark.rst:100 +#: ../../source/benchmark/speed_benchmark.rst:108 +#: ../../source/benchmark/speed_benchmark.rst:116 +#: ../../source/benchmark/speed_benchmark.rst:132 +#: ../../source/benchmark/speed_benchmark.rst:140 +#: ../../source/benchmark/speed_benchmark.rst:148 +#: ../../source/benchmark/speed_benchmark.rst:156 +#: ../../source/benchmark/speed_benchmark.rst:171 +#: ../../source/benchmark/speed_benchmark.rst:179 +#: ../../source/benchmark/speed_benchmark.rst:187 +#: ../../source/benchmark/speed_benchmark.rst:195 +#: ../../source/benchmark/speed_benchmark.rst:211 +#: ../../source/benchmark/speed_benchmark.rst:219 +#: ../../source/benchmark/speed_benchmark.rst:227 +#: ../../source/benchmark/speed_benchmark.rst:235 +#: ../../source/benchmark/speed_benchmark.rst:250 +#: 
../../source/benchmark/speed_benchmark.rst:258 +#: ../../source/benchmark/speed_benchmark.rst:266 +#: ../../source/benchmark/speed_benchmark.rst:274 +#: ../../source/benchmark/speed_benchmark.rst:290 #: ../../source/benchmark/speed_benchmark.rst:298 -#: ../../source/benchmark/speed_benchmark.rst:302 #: ../../source/benchmark/speed_benchmark.rst:306 -#: ../../source/benchmark/speed_benchmark.rst:310 +#: ../../source/benchmark/speed_benchmark.rst:314 +#: ../../source/benchmark/speed_benchmark.rst:330 +#: ../../source/benchmark/speed_benchmark.rst:338 +#: ../../source/benchmark/speed_benchmark.rst:346 #: ../../source/benchmark/speed_benchmark.rst:354 #: ../../source/benchmark/speed_benchmark.rst:362 #: ../../source/benchmark/speed_benchmark.rst:370 -#: ../../source/benchmark/speed_benchmark.rst:378 +#: ../../source/benchmark/speed_benchmark.rst:387 #: ../../source/benchmark/speed_benchmark.rst:395 -#: ../../source/benchmark/speed_benchmark.rst:397 -#: ../../source/benchmark/speed_benchmark.rst:405 -#: ../../source/benchmark/speed_benchmark.rst:407 -#: ../../source/benchmark/speed_benchmark.rst:415 -#: ../../source/benchmark/speed_benchmark.rst:423 -#: ../../source/benchmark/speed_benchmark.rst:431 -#: ../../source/benchmark/speed_benchmark.rst:439 -#: ../../source/benchmark/speed_benchmark.rst:447 -#: 16599c94f7314c0c9605b2f3a4c69d8f +#: ../../source/benchmark/speed_benchmark.rst:403 +#: ../../source/benchmark/speed_benchmark.rst:411 +#: ../../source/benchmark/speed_benchmark.rst:426 +#: ../../source/benchmark/speed_benchmark.rst:434 +#: ../../source/benchmark/speed_benchmark.rst:442 +#: ../../source/benchmark/speed_benchmark.rst:450 +#: ../../source/benchmark/speed_benchmark.rst:458 +#: ../../source/benchmark/speed_benchmark.rst:466 +#: ../../source/benchmark/speed_benchmark.rst:485 +#: ../../source/benchmark/speed_benchmark.rst:493 +#: ../../source/benchmark/speed_benchmark.rst:501 +#: ../../source/benchmark/speed_benchmark.rst:509 +#: 
../../source/benchmark/speed_benchmark.rst:527 +#: ../../source/benchmark/speed_benchmark.rst:535 +#: ../../source/benchmark/speed_benchmark.rst:543 +#: ../../source/benchmark/speed_benchmark.rst:551 +#: ../../source/benchmark/speed_benchmark.rst:559 +#: ../../source/benchmark/speed_benchmark.rst:567 +#: ../../source/benchmark/speed_benchmark.rst:589 +#: ../../source/benchmark/speed_benchmark.rst:597 +#: ../../source/benchmark/speed_benchmark.rst:605 +#: ../../source/benchmark/speed_benchmark.rst:613 +#: ../../source/benchmark/speed_benchmark.rst:632 +#: ../../source/benchmark/speed_benchmark.rst:634 +#: ../../source/benchmark/speed_benchmark.rst:642 +#: ../../source/benchmark/speed_benchmark.rst:644 +#: ../../source/benchmark/speed_benchmark.rst:652 +#: ../../source/benchmark/speed_benchmark.rst:660 +#: ../../source/benchmark/speed_benchmark.rst:668 +#: ../../source/benchmark/speed_benchmark.rst:676 +#: 012d772972364e4796bc37b10e53ec16 msgid "GPTQ-Int4" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:52 -#: ../../source/benchmark/speed_benchmark.rst:64 -#: 3986a4bb478443d6a8b6d2316691873a -msgid "49.56" +#: ../../source/benchmark/speed_benchmark.rst:53 +#: 266958f9dda14a7dbec9c28d9d53b19e +msgid "50.60" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:52 -#: ../../source/benchmark/speed_benchmark.rst:54 -#: 27651c408d4346628ed094125b3b9aa5 -msgid "0.68" +#: ../../source/benchmark/speed_benchmark.rst:53 +#: 80bc084e1d8d44a39e70fa8f97f0d455 +msgid "0.48" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:54 -#: ../../source/benchmark/speed_benchmark.rst:62 -#: ../../source/benchmark/speed_benchmark.rst:70 -#: ../../source/benchmark/speed_benchmark.rst:78 -#: ../../source/benchmark/speed_benchmark.rst:92 -#: ../../source/benchmark/speed_benchmark.rst:100 -#: ../../source/benchmark/speed_benchmark.rst:108 -#: ../../source/benchmark/speed_benchmark.rst:116 -#: ../../source/benchmark/speed_benchmark.rst:131 -#: 
../../source/benchmark/speed_benchmark.rst:139 -#: ../../source/benchmark/speed_benchmark.rst:147 -#: ../../source/benchmark/speed_benchmark.rst:155 -#: ../../source/benchmark/speed_benchmark.rst:169 -#: ../../source/benchmark/speed_benchmark.rst:177 -#: ../../source/benchmark/speed_benchmark.rst:185 -#: ../../source/benchmark/speed_benchmark.rst:193 -#: ../../source/benchmark/speed_benchmark.rst:208 -#: ../../source/benchmark/speed_benchmark.rst:216 -#: ../../source/benchmark/speed_benchmark.rst:224 -#: ../../source/benchmark/speed_benchmark.rst:232 -#: ../../source/benchmark/speed_benchmark.rst:247 -#: ../../source/benchmark/speed_benchmark.rst:255 -#: ../../source/benchmark/speed_benchmark.rst:263 -#: ../../source/benchmark/speed_benchmark.rst:271 -#: ../../source/benchmark/speed_benchmark.rst:279 -#: ../../source/benchmark/speed_benchmark.rst:287 +#: ../../source/benchmark/speed_benchmark.rst:55 +#: ../../source/benchmark/speed_benchmark.rst:63 +#: ../../source/benchmark/speed_benchmark.rst:71 +#: ../../source/benchmark/speed_benchmark.rst:79 +#: ../../source/benchmark/speed_benchmark.rst:94 +#: ../../source/benchmark/speed_benchmark.rst:102 +#: ../../source/benchmark/speed_benchmark.rst:110 +#: ../../source/benchmark/speed_benchmark.rst:118 +#: ../../source/benchmark/speed_benchmark.rst:134 +#: ../../source/benchmark/speed_benchmark.rst:142 +#: ../../source/benchmark/speed_benchmark.rst:150 +#: ../../source/benchmark/speed_benchmark.rst:158 +#: ../../source/benchmark/speed_benchmark.rst:173 +#: ../../source/benchmark/speed_benchmark.rst:181 +#: ../../source/benchmark/speed_benchmark.rst:189 +#: ../../source/benchmark/speed_benchmark.rst:197 +#: ../../source/benchmark/speed_benchmark.rst:213 +#: ../../source/benchmark/speed_benchmark.rst:221 +#: ../../source/benchmark/speed_benchmark.rst:229 +#: ../../source/benchmark/speed_benchmark.rst:237 +#: ../../source/benchmark/speed_benchmark.rst:252 +#: ../../source/benchmark/speed_benchmark.rst:260 +#: 
../../source/benchmark/speed_benchmark.rst:268 +#: ../../source/benchmark/speed_benchmark.rst:276 +#: ../../source/benchmark/speed_benchmark.rst:292 +#: ../../source/benchmark/speed_benchmark.rst:300 +#: ../../source/benchmark/speed_benchmark.rst:308 +#: ../../source/benchmark/speed_benchmark.rst:316 +#: ../../source/benchmark/speed_benchmark.rst:332 +#: ../../source/benchmark/speed_benchmark.rst:340 +#: ../../source/benchmark/speed_benchmark.rst:348 #: ../../source/benchmark/speed_benchmark.rst:356 #: ../../source/benchmark/speed_benchmark.rst:364 #: ../../source/benchmark/speed_benchmark.rst:372 -#: ../../source/benchmark/speed_benchmark.rst:380 -#: ../../source/benchmark/speed_benchmark.rst:399 -#: ../../source/benchmark/speed_benchmark.rst:409 -#: ../../source/benchmark/speed_benchmark.rst:417 -#: ../../source/benchmark/speed_benchmark.rst:425 -#: ../../source/benchmark/speed_benchmark.rst:433 -#: ../../source/benchmark/speed_benchmark.rst:441 -#: ../../source/benchmark/speed_benchmark.rst:449 -#: c04b5415b5a44246bbadbf885865c319 +#: ../../source/benchmark/speed_benchmark.rst:389 +#: ../../source/benchmark/speed_benchmark.rst:397 +#: ../../source/benchmark/speed_benchmark.rst:405 +#: ../../source/benchmark/speed_benchmark.rst:413 +#: ../../source/benchmark/speed_benchmark.rst:428 +#: ../../source/benchmark/speed_benchmark.rst:436 +#: ../../source/benchmark/speed_benchmark.rst:444 +#: ../../source/benchmark/speed_benchmark.rst:452 +#: ../../source/benchmark/speed_benchmark.rst:460 +#: ../../source/benchmark/speed_benchmark.rst:468 +#: ../../source/benchmark/speed_benchmark.rst:487 +#: ../../source/benchmark/speed_benchmark.rst:495 +#: ../../source/benchmark/speed_benchmark.rst:503 +#: ../../source/benchmark/speed_benchmark.rst:511 +#: ../../source/benchmark/speed_benchmark.rst:529 +#: ../../source/benchmark/speed_benchmark.rst:537 +#: ../../source/benchmark/speed_benchmark.rst:545 +#: ../../source/benchmark/speed_benchmark.rst:553 +#: 
../../source/benchmark/speed_benchmark.rst:561 +#: ../../source/benchmark/speed_benchmark.rst:569 +#: ../../source/benchmark/speed_benchmark.rst:591 +#: ../../source/benchmark/speed_benchmark.rst:599 +#: ../../source/benchmark/speed_benchmark.rst:607 +#: ../../source/benchmark/speed_benchmark.rst:615 +#: ../../source/benchmark/speed_benchmark.rst:636 +#: ../../source/benchmark/speed_benchmark.rst:646 +#: ../../source/benchmark/speed_benchmark.rst:654 +#: ../../source/benchmark/speed_benchmark.rst:662 +#: ../../source/benchmark/speed_benchmark.rst:670 +#: ../../source/benchmark/speed_benchmark.rst:678 +#: b4f5d52a0a1f4d32ba489b80bc0cfcf5 msgid "AWQ" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:54 -#: c7854ac0df17462398f1f76a46b3ceb1 -msgid "38.78" +#: ../../source/benchmark/speed_benchmark.rst:55 +#: d7f6be27b48e4fa697ed1e1ad753b283 +msgid "37.09" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:56 -#: ../../source/benchmark/speed_benchmark.rst:94 -#: ../../source/benchmark/speed_benchmark.rst:133 -#: ../../source/benchmark/speed_benchmark.rst:171 -#: ../../source/benchmark/speed_benchmark.rst:210 -#: ../../source/benchmark/speed_benchmark.rst:249 -#: ../../source/benchmark/speed_benchmark.rst:300 -#: ../../source/benchmark/speed_benchmark.rst:320 -#: ../../source/benchmark/speed_benchmark.rst:358 -#: ../../source/benchmark/speed_benchmark.rst:401 -#: 5af45da3133a4d77889a92dd902810fc +#: ../../source/benchmark/speed_benchmark.rst:55 +#: b5a162217757412797b13717c75134ac +msgid "0.68" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:57 +#: ../../source/benchmark/speed_benchmark.rst:96 +#: ../../source/benchmark/speed_benchmark.rst:136 +#: ../../source/benchmark/speed_benchmark.rst:175 +#: ../../source/benchmark/speed_benchmark.rst:215 +#: ../../source/benchmark/speed_benchmark.rst:254 +#: ../../source/benchmark/speed_benchmark.rst:294 +#: ../../source/benchmark/speed_benchmark.rst:334 +#: ../../source/benchmark/speed_benchmark.rst:391 
+#: ../../source/benchmark/speed_benchmark.rst:430 +#: ../../source/benchmark/speed_benchmark.rst:489 +#: ../../source/benchmark/speed_benchmark.rst:531 +#: ../../source/benchmark/speed_benchmark.rst:593 +#: ../../source/benchmark/speed_benchmark.rst:638 +#: 3f1d70cfb2044b51ab98de4f267d5531 msgid "6144" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:56 -#: fc1783a72ac548478ee1012f058540cf -msgid "50.83" +#: ../../source/benchmark/speed_benchmark.rst:57 +#: f835247e9829471d96d12b03a72ef8d2 +msgid "47.45" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:56 -#: 6234760eccdf45a8b46bb9b7f9934cc7 -msgid "6.42" +#: ../../source/benchmark/speed_benchmark.rst:57 +#: cf9b11d6eec34ef9ab7b2c4f18a20f30 +msgid "1.23" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:58 -#: 81e100aa291f4a69bdb8fbbfbd1c756c -msgid "36.56" +#: ../../source/benchmark/speed_benchmark.rst:59 +#: 6afd86f6e2514af08835e20cc299c4ae +msgid "36.47" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:58 -#: bd4eaeffae8b4b95b65a1510c0ec72af -msgid "6.09" +#: ../../source/benchmark/speed_benchmark.rst:59 +#: 84c346f0591246dd8ad8fb5df4baab2e +msgid "0.90" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:60 -#: 4d49282ef20744d683c57940b1ba28ae -msgid "49.63" +#: ../../source/benchmark/speed_benchmark.rst:61 +#: bf89dbb482e140a6a41bf7840ee48bcc +msgid "48.89" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:60 -#: ../../source/benchmark/speed_benchmark.rst:208 -#: ../../source/benchmark/speed_benchmark.rst:360 -#: e0a48a78d7ea46b3b2cffdaba0e9d5f9 -msgid "5.93" +#: ../../source/benchmark/speed_benchmark.rst:61 +#: d761d9a27829450f8dbfb0302994a349 +msgid "0.73" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:62 -#: 3b516aa67f8a4719bd2ea63babf07c15 -msgid "38.73" +#: ../../source/benchmark/speed_benchmark.rst:63 +#: 376d12a215874e979289fb21a1a17eb6 +msgid "37.04" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:62 -#: 2dba1cef974e4eceb3cb96338c962a41 
-msgid "5.92" +#: ../../source/benchmark/speed_benchmark.rst:63 +#: 9988093e57b64c35a5caf93c456d28e2 +msgid "0.72" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:64 -#: ../../source/benchmark/speed_benchmark.rst:102 -#: ../../source/benchmark/speed_benchmark.rst:141 -#: ../../source/benchmark/speed_benchmark.rst:179 -#: ../../source/benchmark/speed_benchmark.rst:218 -#: ../../source/benchmark/speed_benchmark.rst:257 -#: ../../source/benchmark/speed_benchmark.rst:304 -#: ../../source/benchmark/speed_benchmark.rst:322 -#: ../../source/benchmark/speed_benchmark.rst:366 -#: ../../source/benchmark/speed_benchmark.rst:411 -#: 4d9a02ffd625473ea24f167b93b22e31 +#: ../../source/benchmark/speed_benchmark.rst:65 +#: ../../source/benchmark/speed_benchmark.rst:104 +#: ../../source/benchmark/speed_benchmark.rst:144 +#: ../../source/benchmark/speed_benchmark.rst:183 +#: ../../source/benchmark/speed_benchmark.rst:223 +#: ../../source/benchmark/speed_benchmark.rst:262 +#: ../../source/benchmark/speed_benchmark.rst:302 +#: ../../source/benchmark/speed_benchmark.rst:342 +#: ../../source/benchmark/speed_benchmark.rst:399 +#: ../../source/benchmark/speed_benchmark.rst:438 +#: ../../source/benchmark/speed_benchmark.rst:497 +#: ../../source/benchmark/speed_benchmark.rst:539 +#: ../../source/benchmark/speed_benchmark.rst:601 +#: ../../source/benchmark/speed_benchmark.rst:648 +#: f34f6a89b0c5484187c76a2c09633eaa msgid "14336" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:64 -#: 9469dd8432ef400ca83ec82378ed742c -msgid "13.48" +#: ../../source/benchmark/speed_benchmark.rst:65 +#: dcdcd2d8dd964b08858bd624e9f2689c +msgid "47.11" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:66 -#: 5e888acd738b403280fcc366d84bd968 -msgid "36.23" +#: ../../source/benchmark/speed_benchmark.rst:65 +#: 14fc365eeb3440588536d6a9d6c82388 +msgid "1.60" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:66 -#: 576263eb00ff4410838b596a671b32ca -msgid "13.15" +#: 
../../source/benchmark/speed_benchmark.rst:67 +#: d0c28e4ea5994f849a8d4e7e55ff9cbb +msgid "35.44" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:68 -#: b1b3af71d6014981a26d9b93dc746830 -msgid "48.68" +#: ../../source/benchmark/speed_benchmark.rst:67 +#: 53b0a657215c4ae2880d11d73770620d +msgid "1.26" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:68 -#: ccb41af5c955448b9e86b13cb8032915 -msgid "12.97" +#: ../../source/benchmark/speed_benchmark.rst:69 +#: 5a24b968f695490b87c8e17d3741c575 +msgid "48.26" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:70 -#: 70ad05614e664d58acd34db476124c98 -msgid "38.94" +#: ../../source/benchmark/speed_benchmark.rst:69 +#: ../../source/benchmark/speed_benchmark.rst:71 +#: 33c627b63c8646c380c661142aa254cf 849e830023f7430aa8017fe634e1aa42 +msgid "1.10" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:70 -#: 21f5daf199a54d5baa68038b1de87035 -msgid "12.99" +#: ../../source/benchmark/speed_benchmark.rst:71 +#: c06e0f5f3ce64f10bee423985cdd2dab +msgid "37.14" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:72 -#: ../../source/benchmark/speed_benchmark.rst:110 -#: ../../source/benchmark/speed_benchmark.rst:149 -#: ../../source/benchmark/speed_benchmark.rst:187 -#: ../../source/benchmark/speed_benchmark.rst:226 -#: ../../source/benchmark/speed_benchmark.rst:265 -#: ../../source/benchmark/speed_benchmark.rst:308 -#: ../../source/benchmark/speed_benchmark.rst:324 -#: ../../source/benchmark/speed_benchmark.rst:374 -#: ../../source/benchmark/speed_benchmark.rst:419 -#: ../../source/benchmark/speed_benchmark.rst:427 -#: 66434d21ca3c4500b21c610d842fea26 +#: ../../source/benchmark/speed_benchmark.rst:73 +#: ../../source/benchmark/speed_benchmark.rst:112 +#: ../../source/benchmark/speed_benchmark.rst:152 +#: ../../source/benchmark/speed_benchmark.rst:191 +#: ../../source/benchmark/speed_benchmark.rst:231 +#: ../../source/benchmark/speed_benchmark.rst:270 +#: ../../source/benchmark/speed_benchmark.rst:310 
+#: ../../source/benchmark/speed_benchmark.rst:350 +#: ../../source/benchmark/speed_benchmark.rst:407 +#: ../../source/benchmark/speed_benchmark.rst:446 +#: ../../source/benchmark/speed_benchmark.rst:505 +#: ../../source/benchmark/speed_benchmark.rst:547 +#: ../../source/benchmark/speed_benchmark.rst:609 +#: ../../source/benchmark/speed_benchmark.rst:656 +#: 0a176a7e0dcf4ff7ae701bc5e7d587bb msgid "30720" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:72 -#: 6caac0e92a0c4878ab182276f59b1a5a -msgid "49.25" +#: ../../source/benchmark/speed_benchmark.rst:73 +#: b7bfccab32e24ce28ea83aae735d1550 +msgid "47.16" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:72 -#: ../../source/benchmark/speed_benchmark.rst:224 -#: bd358e18689d4d7e958a7c40b95f54c9 -msgid "27.61" +#: ../../source/benchmark/speed_benchmark.rst:73 +#: 33100594b9124b019077a76333601aef +msgid "2.34" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:74 -#: b6534e4b6d2e49a9b52f49136ff4160b -msgid "34.61" +#: ../../source/benchmark/speed_benchmark.rst:75 +#: ef717bccfd9c4033ad6dcd0709881410 +msgid "36.25" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:74 -#: 95f412d519524eeeaed30c404cfbe853 -msgid "27.28" +#: ../../source/benchmark/speed_benchmark.rst:75 +#: 880ed5446cf64429b0fa0e6c1fd3a898 +msgid "2.01" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:76 -#: cbf81c32f81e4727aa522a1776d3e5d4 -msgid "48.18" +#: ../../source/benchmark/speed_benchmark.rst:77 +#: e4d1aec476bd48838c00f3dae53a2969 +msgid "49.22" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:76 -#: cb08e94a34884e0189889838ec3ae24d -msgid "27.12" +#: ../../source/benchmark/speed_benchmark.rst:77 +#: 5ad95205e60b4c8d82c2f475db16864f +msgid "1.85" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:78 -#: 3885fdf7986847ee9e2e2a515b35708e -msgid "38.19" +#: ../../source/benchmark/speed_benchmark.rst:79 +#: db0ad21e1eeb4b4da42aba2fa67bd3ab +msgid "36.90" msgstr "" -#: 
../../source/benchmark/speed_benchmark.rst:78 -#: 00160d4e4a8a4b7281d69a5561c93465 -msgid "27.11" +#: ../../source/benchmark/speed_benchmark.rst:79 +#: 1abffedc29e14737ba136d3afd32ecdb +msgid "1.84" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:81 -#: fe2476be1e1e4fe78b61ef1568f74a09 +#: ../../source/benchmark/speed_benchmark.rst:83 +#: 3e26d2d99f7c4ae1b78efea53a5b0b86 msgid "0.5B (vLLM)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:86 -#: dd484cd49ebf4be8b751b7fd8c4d6251 -msgid "270.49" -msgstr "" - #: ../../source/benchmark/speed_benchmark.rst:88 -#: 3ff18bb246f6426eae917293e1a3023f -msgid "235.95" +#: 9cffc232095b46f98fa934edfa9fc82a +msgid "311.55" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:90 -#: 420043ef7154432c9caeba778897860b -msgid "240.07" +#: a1621494695b40539d326f69051d3bd1 +msgid "257.07" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:92 -#: ba025c19312c461d974b3d1e3620a5b0 -msgid "233.31" +#: 902218e94a0b45a2b0814abf77e1733c +msgid "260.93" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:94 -#: 513ca55c943349dbbfc57cf8454a368f -msgid "256.16" +#: 83eeb0af79b840c98e3f8617d3cb9df6 +msgid "261.95" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:96 -#: 862f3bc6ff0b4eb0a8f75d47edbb7175 -msgid "224.30" +#: ba89d005351d47a08775ca517bdc2eb2 +msgid "304.79" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:98 -#: de1a2b035d6f4884b92a5ea391615fd4 -msgid "226.41" +#: e294d594f7e143fba53e758c444efc95 +msgid "254.10" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:100 -#: 01684d53f5b14502ba31d26c8af11670 -msgid "222.83" +#: 9fcfac1eb759416b978d72aa46549feb +msgid "257.33" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:102 -#: 6996912f3e8a41d1a7032f04193d96fa -msgid "108.89" +#: e23ee2c559ba42ef854234fa06bb96ab +msgid "259.80" msgstr "" #: ../../source/benchmark/speed_benchmark.rst:104 -#: 42fffbad74c74a498f2dc6707bec3e5a -msgid "108.10" +#: 00fe2e6cb3aa41819bb24dc8b448c890 +msgid 
"290.28" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:106 +#: a44f8bc6612f4ec2a2289f8772e41203 +msgid "243.69" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:108 +#: fd80649ef116497ba426edc5f45de453 +msgid "247.01" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:110 +#: 17522829c59e4a50a0a74eb2cde70310 +msgid "249.58" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:112 +#: ae18115b27d14aa98cd4607d76cdb707 +msgid "264.51" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:114 +#: 3eb46979e4b14a05adb3a51ec0396519 +msgid "223.86" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:116 +#: 92daab465ec24e04a71c26e03bddbf32 +msgid "226.50" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:118 +#: 04ae6d05605d4207874e855e1f63382d +msgid "229.84" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:123 +#: 7d580f9796794f38a18900d9288d26be +msgid "1.5B (Transformer)" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:128 +#: ../../source/benchmark/speed_benchmark.rst:167 +#: 8cafaa9450e3437abf042c21ba55a3d9 +msgid "Qwen2.5-1.5B-Instruct" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:128 +#: 594836ead3fa4979bf30fe8e593ab64a +msgid "39.68" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:128 +#: 9bbe1bff4a2d4c508143cd3d7844e3ac +msgid "2.95" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:130 +#: 286c67211dcd49a19496074e2517771b +msgid "32.62" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:130 +#: 2a576f22580c426181320c74dc87c8cc +msgid "1.82" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:132 +#: 8e2f4275615347c69930dbec4579b040 +msgid "43.33" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:132 +#: 49a7a00dc96d417b9185ac07be71bbcf +msgid "1.18" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:134 +#: a91340cc3ff944fe81c3240c360d9176 +msgid "31.70" +msgstr "" + +#: 
../../source/benchmark/speed_benchmark.rst:134 +#: 5630e80f72324811b0bf66d246d3e6ed +msgid "1.51" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:136 +#: e56a73c68f8d4d79ba4e6c1a59a968ab +msgid "40.88" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:136 +#: 28be0dafd3bd444daffd1857c35ec690 +msgid "3.43" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:138 +#: 315878f7c9954721ac17e8cf6cb491d7 +msgid "31.46" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:138 +#: 53963753bb0342c2af390f309a648c0c +msgid "2.30" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:140 +#: 9a5379cd9b98494abfdb28c3c2444f87 +msgid "43.96" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:140 +#: 3625b1fbf7734df987a671113eba55fa +msgid "1.66" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:142 +#: 1b96812e82e4488ca3884db5d4320bd1 +msgid "32.30" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:142 +#: 0e46565da0f64915aeb3481c4af21c83 +msgid "1.63" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:144 +#: 15b3d4f76a494739b1192616cf252f7b +msgid "40.43" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:144 +#: 0c91eb650bfe40d99fe2b08c4afa7ea5 +msgid "4.16" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:146 +#: 4453e171ef40449688394f7caa821091 +msgid "31.06" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:146 +#: b82947c25bb24e63aee31d23420709d8 +msgid "3.03" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:148 +#: 56fa95a3619d483ba9cb5633c45bfe8d +msgid "43.66" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:148 +#: 839a3b3ed2944d08a372913abc86152f +msgid "2.39" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:150 +#: 6569123f209348398457ccb60a143d64 +msgid "32.39" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:150 +#: c045b35c103d40628a2130d55c0e1c93 +msgid "2.36" +msgstr "" + +#: 
../../source/benchmark/speed_benchmark.rst:152 +#: 9e8b4dd9bbfb443ab1f61a17884fe032 +msgid "38.59" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:152 +#: f9c0a1da582040faba553fc47c22a3e4 +msgid "5.62" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:154 +#: 2a82ecd7acd04cd3844821a2a2c2c3cc +msgid "31.04" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:154 +#: b81fb53168a74923a67e78eb8d6f55d9 +msgid "4.49" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:156 +#: 98cc75e2467d43bebb467c9d11855dec +msgid "35.68" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:156 +#: 588d59302d0e4f09b2d2b57d7301bcba +msgid "3.85" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:158 +#: ../../source/benchmark/speed_benchmark.rst:399 +#: 844f9407c689443fa1ae2d3e695b24cb +msgid "31.95" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:158 +#: 023eb06cdee34b149643c00988002c3a +msgid "3.82" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:162 +#: 6b3466db2b2b47f180d5efbbfda24c6a +msgid "1.5B (vLLM)" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:167 +#: ad04f4ef634241d79c30196ef21b679e +msgid "183.33" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:169 +#: 7f37cd63d1264cf6963410ab43648149 +msgid "201.67" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:171 +#: bbf2a54df99e4f1db895f83afa38dcc0 +msgid "217.03" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:173 +#: e07dc1d2fd724561ac8e3d2e6ac0a921 +msgid "213.74" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:175 +#: 813967b69dd84d38905372f6345c77e8 +msgid "176.68" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:177 +#: 0ce7d395f3c8430d95b6f46cbaa8d24a +msgid "192.83" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:179 +#: b68465b2b131495eafa2c88c9108e53b +msgid "206.63" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:181 +#: 6b2ed1180b2d48bb946e35745e163293 
+msgid "203.64" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:183 +#: 1bc126490d144597b5482fa7eb95d7ee +msgid "168.69" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:185 +#: 3008d00f21694735b61762245e677468 +msgid "183.69" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:187 +#: 9b5e5a5e53064237a599ec28b6edad57 +msgid "195.88" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:189 +#: ec6660511fd3433cae65ddf72115ffab +msgid "192.64" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:191 +#: a92ee1c8880b4e8f867e167d7f364515 +msgid "152.04" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:193 +#: 486afa67eadd44c98ef4cf5a0c1109e4 +msgid "162.82" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:195 +#: 773b107c088e4531a8e85a01dca14097 +msgid "173.57" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:197 +#: 9f9c766b50784524a917b446c4efb36d +msgid "170.20" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:202 +#: 483ecbced64a418484e87cf91bddc864 +msgid "3B (Transformer)" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:207 +#: ../../source/benchmark/speed_benchmark.rst:246 +#: cf2efd68fb8242579c64d663fccb8609 +msgid "Qwen2.5-3B-Instruct" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:207 +#: b2afee60bce043839b49365e21d428a8 +msgid "30.80" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:207 +#: 9a9021e7246440f0aa79a416167fdbad +msgid "5.95" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:209 +#: f05d78f862224f4fa6a175a976db215a +msgid "25.69" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:209 +#: 272e54f60c964f4bb5e8f19b05161a24 +msgid "3.38" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:211 +#: 416ea3f157ae482d83cbaf59b4c335b9 +msgid "35.21" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:211 +#: 6a14e7e3b3e349f3befa19e9e4226776 +msgid "2.06" +msgstr "" + +#: 
../../source/benchmark/speed_benchmark.rst:213 +#: c2480623ad7b494093a85be577e2d027 +msgid "25.29" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:213 +#: 12256870989e4f95814f2f69765c1fbf +msgid "2.50" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:215 +#: f9ff4018035f455c98fd2bfdc53d6590 +msgid "32.20" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:215 +#: 94f771c78314422aa83db128bb3baf03 +msgid "6.59" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:217 +#: aa71a5f4a5074e71b22b93ab64f39c69 +msgid "24.69" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:217 +#: 36ce5d858c5543af83b20ff393768594 +msgid "3.98" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:219 +#: bc6f69e4c1a947b081abfd72264d20fa +msgid "34.47" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:219 +#: 95f555fad98a40a88ab75f0201370c13 +msgid "2.67" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:221 +#: dbfe10e718ef42b0bef355c3e80b66c1 +msgid "24.86" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:221 +#: d508010365f14f5fb36b7ea77b71752b +msgid "2.62" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:223 +#: 79f8bc0282264082bf3c93eed3f04590 +msgid "31.72" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:223 +#: ee392132bcbf440fabf43db322c562df +msgid "7.47" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:225 +#: 50925b544a2e4bc3aa0ccc257024ca4b +msgid "24.70" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:225 +#: 9dba8e275b8c4f70b98f1f5970af8730 +msgid "4.89" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:227 +#: 71744764a5a449b7aac946d3153c56ab +msgid "34.36" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:227 +#: af83bc61338e4e929a3f87afb1239db8 +msgid "3.58" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:229 +#: 70320737dc1d45ed889623d7ef2f8279 +msgid "25.19" +msgstr "" + +#: 
../../source/benchmark/speed_benchmark.rst:229 +#: 17ccac6b5f504b069184fe9970d5edb2 +msgid "3.54" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:231 +#: 352f513b44bf447499e8667728f20c97 +msgid "25.37" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:231 +#: 28647cb7d4014e1a9f06ce0850449995 +msgid "9.30" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:233 +#: e7527af698f2446e83f80f26bfb8df24 +msgid "21.67" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:233 +#: 50664609916a4c4593d6150273c9c2b3 +msgid "6.72" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:235 +#: f2202ff4a16240dfadd8a336ce3e271a +msgid "23.60" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:235 +#: 2da70f70e30b47df9635c2149ec819e8 +msgid "5.41" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:237 +#: 50052bf46d534c2f82cae42ce5b735cb +msgid "24.56" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:237 +#: 041efa2948af4f5f9054b2ed838a6c9f +msgid "5.37" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:241 +#: e561883e92854d4ebd5ca5b91ebada0c +msgid "3B (vLLM)" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:246 +#: a88ee9d985b04ebdb0dd7d456084e4a4 +msgid "127.61" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:248 +#: 46442bb290974c1aafcf43c651a9616a +msgid "150.02" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:250 +#: 12a16651244941e680062a2e2e1e86de +msgid "168.20" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:252 +#: d617e2f92f0846a6a625b8c8992d3388 +msgid "165.50" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:254 +#: 12e6eef1b9d047a9a790905be0915b7b +msgid "123.15" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:256 +#: 0e86425155a047b1ab633a8cfb284af2 +msgid "143.09" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:258 +#: 2b5000ec29c947ccb6de0cf86cc658f4 +msgid "159.85" +msgstr "" + +#: 
../../source/benchmark/speed_benchmark.rst:260 +#: 901ff6020c7b420b869092e8a318ec82 +msgid "156.38" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:262 +#: 1f88e99d71b54a4b9cfd229d0c593ec7 +msgid "117.35" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:264 +#: 4f4aa3d8b9c04eab8d76f25617b42f50 +msgid "135.50" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:266 +#: 1121632487ac41a2a93f5246c286ae87 +msgid "149.35" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:268 +#: 7a63565c74d34e0e93ae496a8bf41d9c +msgid "147.75" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:270 +#: 0a2b7b0877704fbdaa1565f6cc874c9e +msgid "105.88" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:272 +#: efe8447fabd44706abd8aeffa943d78d +msgid "118.38" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:274 +#: 408f7a3c81cc4285918e06b326250ae5 +msgid "129.28" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:276 +#: ea2479261ef04b93befc2819c3d7ef71 +msgid "127.19" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:281 +#: 749d8bad50d54d558a2a7c8109be1f15 +msgid "7B (Transformer)" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:286 +#: ../../source/benchmark/speed_benchmark.rst:326 +#: fcb4a3afd4e047bca683738287ef07a3 +msgid "Qwen2.5-7B-Instruct" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:286 +#: 3c2c924335a441b4b3af27d136fbc23f +msgid "40.38" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:286 +#: 27c0b52647f94f8d9d97f24af273d88e +msgid "14.38" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:288 +#: bfb1234b9b474efd8adb54779a39cb0d +msgid "31.55" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:288 +#: 6153946a227a453da7955093584fa625 +msgid "8.42" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:290 +#: 08f508125e15429aa02beb09ee6bd4e2 +msgid "43.10" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:290 +#: 
b934cda8a5c044d798046460029750d3 +msgid "5.52" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:292 +#: 0570ef47e5624272917ce2938d90ccb0 +msgid "32.03" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:292 +#: 955fa198c3994dff9814022acdc7a7a2 +msgid "5.39" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:294 +#: 2870ce90ae4c44e8a02f58fc5cf34614 +msgid "38.76" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:294 +#: 7ec940d461ef46f88084e313136078d4 +msgid "15.38" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:296 +#: 09967cea16854a2bb9de46c214b24651 +msgid "31.26" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:296 +#: 04d7d18c4b6744e1af9f5dce4723e274 +msgid "9.43" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:298 +#: 5e18fdfb38d7477daba502e09cb5a01b +msgid "38.27" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:298 +#: aac8b1fb7c8d43a8a40abe03c97ec5e3 +msgid "6.52" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:300 +#: db61ac295ebe4d3a96dc0de0500f9479 +msgid "32.37" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:300 +#: ../../source/benchmark/speed_benchmark.rst:593 +#: ../../source/benchmark/speed_benchmark.rst:595 +#: 5c81460630084159bd47be4557a3ce9f +msgid "6.39" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:302 +#: 4c651b836f86491f87eb22f2fad0c5a1 +msgid "29.78" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:302 +#: ff906cf57c0b49c092513cdc11b47c13 +msgid "16.91" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:304 +#: 72abf1bb71494eb6865366749931a5b7 +msgid "26.86" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:304 +#: b1aae2fc1da948ca866a025f39608846 +msgid "10.96" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:306 +#: f3993efb0e964ccdabf24e8493c044a4 +msgid "28.70" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:306 +#: d32034adfd824cb696afdef90fa076d4 +msgid "8.05" 
+msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:308 +#: 03ae9c2b037a477c84eeef5968c04b1e +msgid "30.23" +msgstr "" + +#: ../../source/benchmark/speed_benchmark.rst:308 +#: 0ec8f6316066482c9f6fa5e20a2c880b +msgid "7.92" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:106 -#: 9bc4d5a9413640bfaf2e8cdaf3d9de23 -msgid "106.51" +#: ../../source/benchmark/speed_benchmark.rst:310 +#: 6be1e0b48817441195446f112775565f +msgid "18.83" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:108 -#: 1f0f4505f7284df78a4322a61914f601 -msgid "104.16" +#: ../../source/benchmark/speed_benchmark.rst:310 +#: 58d02bb2fa094a87b3bd01cdb52f64be +msgid "19.97" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:110 -#: 461402e6ab4f470c9982574e2fa8bc68 -msgid "97.20" +#: ../../source/benchmark/speed_benchmark.rst:312 +#: 19ad93354d704591bb1d3e79879f216f +msgid "17.59" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:112 -#: bdaa7f6e026f47a4b2cacb232ad06387 -msgid "94.49" +#: ../../source/benchmark/speed_benchmark.rst:312 +#: 78c7430820c54495ae2ecfa313196522 +msgid "14.01" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:114 -#: 293677d7f8b9447e918689bb816f3579 -msgid "93.94" +#: ../../source/benchmark/speed_benchmark.rst:314 +#: d96d0f8ac0d94a3487988708aa0b6fe3 +msgid "18.45" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:116 -#: 3f723444a6184d3cb313ae0c6a7787dd -msgid "92.23" +#: ../../source/benchmark/speed_benchmark.rst:314 +#: 002dc187a0e64f38ac9d3bee7808afee +msgid "11.11" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:120 -#: 3bc05396581d451091e1ab8447ac93e6 -msgid "1.5B (Transformer)" +#: ../../source/benchmark/speed_benchmark.rst:316 +#: a810b1a65945460ba05e24885b7f3dc0 +msgid "19.11" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:125 -#: ../../source/benchmark/speed_benchmark.rst:163 -#: c075f7be1bd3453582f4853d19cf6c5a -msgid "Qwen2-1.5B-Instruct" +#: ../../source/benchmark/speed_benchmark.rst:316 +#: 
c8d7a7404bcc4fae85a53665dc095a49 +msgid "10.98" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:125 -#: 940f647973f941f18361f2102d70056b -msgid "40.89" +#: ../../source/benchmark/speed_benchmark.rst:321 +#: 133954fb306d48a9b11acf7b091c51a2 +msgid "7B (vLLM)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:125 -#: 33a93e83ba564a9e917b2d6a754439fa -msgid "3.44" +#: ../../source/benchmark/speed_benchmark.rst:326 +#: 3b16a6d4302d4f9fac1c7efbca5b7252 +msgid "84.28" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:127 -#: 84d70d96dde541f5971359449048324d -msgid "31.51" +#: ../../source/benchmark/speed_benchmark.rst:328 +#: 54b65a2e12f64715a9849f4731422531 +msgid "122.01" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:127 -#: c30109a7082543f9bc6a5fe55a8b6a58 -msgid "2.31" +#: ../../source/benchmark/speed_benchmark.rst:330 +#: 4df649f9e5f14da1a13f78d05db12740 +msgid "154.05" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:129 -#: 6f68abe9e42c47639eeaeb4f5609f7b4 -msgid "42.47" +#: ../../source/benchmark/speed_benchmark.rst:332 +#: 5bee2fef808c4a94babaa97b2682ae2f +msgid "148.10" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:129 -#: 73734b29fe234dbfbafba34f242c1e4a -msgid "1.67" +#: ../../source/benchmark/speed_benchmark.rst:334 +#: b5c20c1bf3dc405692a94240e8eb290e +msgid "80.70" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:131 -#: c1fa797eb41a422f9361864ff712afa2 -msgid "33.62" +#: ../../source/benchmark/speed_benchmark.rst:336 +#: 94fecae3503c4b839016be3d81c971f6 +msgid "112.38" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:131 -#: e700d18b36e64ad0a232c0214294920b -msgid "1.64" +#: ../../source/benchmark/speed_benchmark.rst:338 +#: ec9073a5cad8441ead072bf56f5f637c +msgid "141.98" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:133 -#: 688d1121e73a4812a4df971e66eb67c1 -msgid "40.86" +#: ../../source/benchmark/speed_benchmark.rst:340 +#: 6483afea40794a6aa6ebfb50edfc271d +msgid "137.64" 
msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:133 -#: a3b341cd57c64071af1e1177c9eef514 -msgid "8.74" +#: ../../source/benchmark/speed_benchmark.rst:342 +#: 45c80e2974ec41e99bfefac1db1b3464 +msgid "77.69" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:135 -#: ea1ea6dae9f04dec9ec1577bd1fb199e -msgid "31.31" +#: ../../source/benchmark/speed_benchmark.rst:344 +#: 83fb7e8619c14079b310ca66b65aa067 +msgid "105.25" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:135 -#: caadde1aa8004027a18d1c239f6e5e14 -msgid "7.59" +#: ../../source/benchmark/speed_benchmark.rst:346 +#: af853ac2bfd54a49a45aa4369a5a0fba +msgid "129.35" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:137 -#: 765e83dfbd2f40e7b77ca5a50feb121a -msgid "42.78" +#: ../../source/benchmark/speed_benchmark.rst:348 +#: 4172734562c4474aaeb2498d636822f1 +msgid "124.91" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:137 -#: 93a119e08db942b29b2bcd62b9add69a -msgid "6.95" +#: ../../source/benchmark/speed_benchmark.rst:350 +#: 40320ba4e1194eb99d2473e4491bb7f8 +msgid "70.33" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:139 -#: c7948bf6de3d40c3b36c011960dfcce0 -msgid "32.90" +#: ../../source/benchmark/speed_benchmark.rst:352 +#: 097b95f7850c410295defa167ccbe7b3 +msgid "90.71" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:139 -#: 42f7420278f0473390c49416b4f41d34 -msgid "6.92" +#: ../../source/benchmark/speed_benchmark.rst:354 +#: b97dce41bc2d4921a4fc00799bbc5dcd +msgid "108.30" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:141 -#: 5b3fe348759643dc9c660c83da77b653 -msgid "40.08" +#: ../../source/benchmark/speed_benchmark.rst:356 +#: 3c3cae4399744a6190c88476f3dd79d2 +msgid "104.66" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:141 -#: 773ea8ca41ba4c3e8e8e958b597c0fcb -msgid "15.92" +#: ../../source/benchmark/speed_benchmark.rst:358 +#: ../../source/benchmark/speed_benchmark.rst:454 +#: ../../source/benchmark/speed_benchmark.rst:555 +#: 
../../source/benchmark/speed_benchmark.rst:664 +#: 9259c8941b6b4551b9f0bc24f9ef2523 +msgid "63488" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:143 -#: e1bb60b24d8b442f99ee0b8682bde89c -msgid "31.19" +#: ../../source/benchmark/speed_benchmark.rst:358 +#: f423ec2646fa49b18d7ebaab159c7a21 +msgid "50.86" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:143 -#: 018567fe9dca42dfb12653d1977ccfe8 -msgid "14.79" +#: ../../source/benchmark/speed_benchmark.rst:358 +#: ../../source/benchmark/speed_benchmark.rst:360 +#: ../../source/benchmark/speed_benchmark.rst:362 +#: ../../source/benchmark/speed_benchmark.rst:364 +#: ../../source/benchmark/speed_benchmark.rst:454 +#: ../../source/benchmark/speed_benchmark.rst:456 +#: ../../source/benchmark/speed_benchmark.rst:458 +#: ../../source/benchmark/speed_benchmark.rst:460 +#: ../../source/benchmark/speed_benchmark.rst:555 +#: ../../source/benchmark/speed_benchmark.rst:557 +#: ../../source/benchmark/speed_benchmark.rst:559 +#: ../../source/benchmark/speed_benchmark.rst:561 +#: 9f914a0377a345d49e368e87916880dc +msgid "setting-64k" +msgstr "[设定-64k]" + +#: ../../source/benchmark/speed_benchmark.rst:360 +#: ae5ea50e06bc4a2aae58f40493a9f05b +msgid "60.52" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:145 -#: c69e4b4c0c604dc48eb84e00cfde594f -msgid "42.25" +#: ../../source/benchmark/speed_benchmark.rst:362 +#: dfc99e88603e4ecea4402dab9745c3b2 +msgid "67.97" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:145 -#: 20b961001b59497f8ab30c4085507df3 -msgid "14.14" +#: ../../source/benchmark/speed_benchmark.rst:364 +#: f522310880a04e2a81a3b3ac5cedb576 +msgid "66.42" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:147 -#: 8c68b14d52844f2d886c35a686834974 -msgid "33.24" +#: ../../source/benchmark/speed_benchmark.rst:366 +#: ../../source/benchmark/speed_benchmark.rst:462 +#: ../../source/benchmark/speed_benchmark.rst:563 +#: ../../source/benchmark/speed_benchmark.rst:672 +#:
faefdd63feaa4a01b2752e5598b532f4 +msgid "129024" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:147 -#: 0212afe446b145549d91c105e5e76614 -msgid "14.12" +#: ../../source/benchmark/speed_benchmark.rst:366 +#: c42f54029293491b99c956c75c70b668 +msgid "28.94" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:149 -#: 265ca5aa4b75412cb97db32ede5c0b80 -msgid "34.09" +#: ../../source/benchmark/speed_benchmark.rst:366 +#: ../../source/benchmark/speed_benchmark.rst:368 +#: ../../source/benchmark/speed_benchmark.rst:370 +#: ../../source/benchmark/speed_benchmark.rst:372 +#: ../../source/benchmark/speed_benchmark.rst:462 +#: ../../source/benchmark/speed_benchmark.rst:464 +#: ../../source/benchmark/speed_benchmark.rst:466 +#: ../../source/benchmark/speed_benchmark.rst:468 +#: ../../source/benchmark/speed_benchmark.rst:563 +#: ../../source/benchmark/speed_benchmark.rst:565 +#: ../../source/benchmark/speed_benchmark.rst:567 +#: ../../source/benchmark/speed_benchmark.rst:569 +#: 85d257e52ab94f73a93fa58256c33254 +msgid "vllm==0.6.2, new sample config" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:149 -#: 624e92cc77e74e74aa6a9b39bfecb4fd -msgid "30.31" +#: ../../source/benchmark/speed_benchmark.rst:368 +#: f3c0b8aad8b447cf9094e0f935dd6154 +msgid "25.97" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:151 -#: 51548441e0e344cfb930a8834eb4487f -msgid "28.52" +#: ../../source/benchmark/speed_benchmark.rst:370 +#: 853ab75633f34606b5c88c7c2fde8b9e +msgid "26.37" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:151 -#: 6ae32ef178ec468885ed6481e3309479 -msgid "29.18" +#: ../../source/benchmark/speed_benchmark.rst:372 +#: 0455b0ef546046b8aa90f9fea2c9c227 +msgid "26.57" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:153 -#: 9de15749ba814fe5a9cd3eed27b72791 -msgid "31.30" +#: ../../source/benchmark/speed_benchmark.rst:375 +#: ../../source/benchmark/speed_benchmark.rst:471 +#: ../../source/benchmark/speed_benchmark.rst:575 +#: 
72d24e828e8943af941fe8a9cebd7d0b +msgid "[Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)" +msgstr "[设定-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)" + +#: ../../source/benchmark/speed_benchmark.rst:376 +#: ../../source/benchmark/speed_benchmark.rst:472 +#: ../../source/benchmark/speed_benchmark.rst:576 +#: 11198a89e8b7487e828a4f6a09b88099 120e6bede1b546419976fec97f3a7ca2 +#: 72ed1e8d670944b497fd3cb233e311c9 +msgid "[new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7,top_p=0.8,top_k=20,repetition_penalty=1,presence_penalty=0,frequency_penalty=0,max_tokens=out_length)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:153 -#: 770e0c3fc018489eae33ae577cca5cec -msgid "28.54" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:155 -#: f7130dfa6c33425d86256e256d858806 -msgid "32.16" +#: ../../source/benchmark/speed_benchmark.rst:378 +#: 68fe7a0cc323439ca01ff176b7ec3f75 +msgid "14B (Transformer)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:155 -#: 9147b886700248fb8c2727be584ae56b -msgid "28.51" +#: ../../source/benchmark/speed_benchmark.rst:383 +#: ../../source/benchmark/speed_benchmark.rst:422 +#: 0efef6844eb24c1ca0e88aa055bb539d +msgid "Qwen2.5-14B-Instruct" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:158 -#: d92811c61f17424b8776a4e80e11ca58 -msgid "1.5B (vLLM)" +#: ../../source/benchmark/speed_benchmark.rst:383 +#: 346d77e66b5643378af83efb842ae18c +msgid "24.74" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:163 -#: 54ce51f375824d0094cc2e0926faab25 -msgid "175.55" +#: ../../source/benchmark/speed_benchmark.rst:383 +#: c197b99d194340f6aa9b71370bc60405 +msgid "28.08" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:165 -#: 0b778c681ce54d66be31e0cd56771b23 -msgid "172.28" +#: ../../source/benchmark/speed_benchmark.rst:385 +#: 84c9e131c7ae44ce9f22a73afe5abf8c +msgid "18.84" msgstr "" -#:
../../source/benchmark/speed_benchmark.rst:385 +#: 1fc65ba82d28427d9ffad1e25cf47855 +msgid "16.11" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:167 -#: f10c01b0ec4949ce84cbc10634cef502 -msgid "184.58" +#: ../../source/benchmark/speed_benchmark.rst:387 +#: d9d3a17ff48d4f62a137d86ea0d2d7b3 +msgid "25.89" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:169 -#: 24aa76bfc9e74b2da07dbd9b8d002f8f -msgid "170.87" +#: ../../source/benchmark/speed_benchmark.rst:387 +#: c3c05e47a2a442c3be880a0c47641c28 +msgid "9.94" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:171 -#: 84d125e1ff8c4b22a31e67956aafaccb -msgid "166.23" +#: ../../source/benchmark/speed_benchmark.rst:389 +#: af27bcc2ffb74337996a9daf330f6393 +msgid "19.23" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:173 -#: b25360ebc8e64572afddba261b2e0454 -msgid "164.32" +#: ../../source/benchmark/speed_benchmark.rst:389 +#: 6a6f9c24e8864b1e9a45a98ea55cd547 +msgid "9.79" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:175 -#: 2371d6f852d9434da97e0c241fe2df5b -msgid "174.04" +#: ../../source/benchmark/speed_benchmark.rst:391 +#: 4d53012c3bc04013bdd4239f79f2999a +msgid "20.51" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:177 -#: cec70be90f1246829919090565da1d04 -msgid "162.81" +#: ../../source/benchmark/speed_benchmark.rst:391 +#: 54b11f8eea1f443696134b8190625699 +msgid "29.50" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:179 -#: 62e808f52f67419291ce162de4f75483 -msgid "83.67" +#: ../../source/benchmark/speed_benchmark.rst:393 +#: f83b5b1af9544fc0adaa5a345984d1f3 +msgid "17.80" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:181 -#: 325a76ef42644a9e81aa6841e40d2b49 -msgid "98.63" +#: ../../source/benchmark/speed_benchmark.rst:393 +#: d859fc3611fa4c73acf71bc6385e67c4 +msgid "17.61" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:183 -#: 9fd5160c9fba4c0885e594392be90344 -msgid "97.65" +#: ../../source/benchmark/speed_benchmark.rst:395 +#: 
3184881f5dc540de99cf1d03c0531483 +msgid "20.06" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:185 -#: ca9f28089d854da28bd96755773ad0b7 -msgid "92.48" +#: ../../source/benchmark/speed_benchmark.rst:395 +#: 530e7ceefa3347439c9a19a4ed0c96b3 +msgid "11.36" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:187 -#: b6a87ff7de0f443faa640c5211e0895c -msgid "77.69" +#: ../../source/benchmark/speed_benchmark.rst:397 +#: 99a4ad03e5164cf2bdf046340c1609df +msgid "19.21" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:189 -#: aaf0a02fd4d34de7bb47aacdd731ca6d -msgid "86.42" +#: ../../source/benchmark/speed_benchmark.rst:397 +#: 7b343b8a4e284ad19be19e592cf9b2fb +msgid "11.22" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:191 -#: edf6ad7d5056492ebdfe8f6dfacf7c05 -msgid "87.49" +#: ../../source/benchmark/speed_benchmark.rst:399 +#: 110d7d092a1347caab9e1566fbdcd886 +msgid "13.92" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:193 -#: 177861083b7b4faaaea3aed0bde2ee5e -msgid "82.88" +#: ../../source/benchmark/speed_benchmark.rst:401 +#: 20d86db591b34f81a37003458663f10a +msgid "12.66" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:197 -#: 591b5d8eb7cf4e268960c3c765641bbc -msgid "7B (Transformer)" +#: ../../source/benchmark/speed_benchmark.rst:401 +#: 0ed4b55046b4464da3237660400a0242 +msgid "19.98" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:202 -#: ../../source/benchmark/speed_benchmark.rst:241 -#: 2425a940728e4d9ea72b078da4335782 -msgid "Qwen2-7B-Instruct" +#: ../../source/benchmark/speed_benchmark.rst:403 +#: 7be270ba5da041618a1c08e985376fd3 +msgid "13.79" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:202 -#: 13be9dd7081145d09cf5136f67a9fe3c -msgid "37.97" +#: ../../source/benchmark/speed_benchmark.rst:403 +#: ../../source/benchmark/speed_benchmark.rst:495 +#: 6e07ed9db7c14005b267175081822d32 +msgid "13.81" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:202 -#: a8306c2c6628426ba629671f2b91d432 
-msgid "14.92" +#: ../../source/benchmark/speed_benchmark.rst:405 +#: bbb1220bd1714c0298ddb91667a1c716 +msgid "14.17" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:204 -#: 7d2cae030101487294be3e6005c861fc -msgid "30.85" +#: ../../source/benchmark/speed_benchmark.rst:405 +#: edc522c6e4514cc08e776ae31bb1e6ae +msgid "13.67" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:204 -#: 5bda58cdaa24458faf190e38cbad13f5 -msgid "8.97" +#: ../../source/benchmark/speed_benchmark.rst:407 +#: 5109d14859a144adbbc0b64a741a0990 +msgid "8.20" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:206 -#: 2a27fc54d87e41a9ab1e8992df16b23b -msgid "36.17" +#: ../../source/benchmark/speed_benchmark.rst:407 +#: e9f14eac53b044918ae403d7144158dd +msgid "36.85" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:206 -#: ffe46abbb7824a679d02206dee3333ab -msgid "6.06" +#: ../../source/benchmark/speed_benchmark.rst:409 +#: 2affc3a315b544b6b0ed5ad1372a1fe3 +msgid "7.77" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:208 -#: 36df0c1fcd6a4407bfe596b1a1aa4a5b -msgid "33.08" +#: ../../source/benchmark/speed_benchmark.rst:409 +#: 81fee54867b346919cfac7517d85b005 +msgid "24.88" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:210 -#: ca21c086017e42c08ae8b48e38ba2e44 -msgid "34.74" +#: ../../source/benchmark/speed_benchmark.rst:411 +#: 720a6dac666b4c938ee05ce98aa68ad3 +msgid "8.14" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:210 -#: ad871a1ca3ef4c96bdf19b804c822fb4 -msgid "20.26" +#: ../../source/benchmark/speed_benchmark.rst:411 +#: 8e64b76bf61e4ef7a3521057cdf7a12e +msgid "18.71" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:212 -#: 2add23066bb544ca801a8e8aaefd21f4 -msgid "31.13" +#: ../../source/benchmark/speed_benchmark.rst:413 +#: adc199b98bf840c6b6ef0fd0740b0c6d +msgid "8.31" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:212 -#: 5a49e504bb8349db8c995ec87262c6df -msgid "14.31" +#: 
../../source/benchmark/speed_benchmark.rst:413 +#: 6a7087f37f914919bd19117a2e04c592 +msgid "18.57" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:214 -#: 3592401b6bfc46f68f11e99174f16644 -msgid "33.34" +#: ../../source/benchmark/speed_benchmark.rst:417 +#: 87cc0727f7ee43efb8cec7de683f6e46 +msgid "14B (vLLM)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:214 -#: 7e085044411946c5b74535ee688e7578 -msgid "11.40" +#: ../../source/benchmark/speed_benchmark.rst:422 +#: ../../source/benchmark/speed_benchmark.rst:634 +#: 4b4c0bd7f6a942a585380f80d256bc75 +msgid "46.30" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:216 -#: fec42d24f42a41d9873614085b1b4e31 -msgid "30.86" +#: ../../source/benchmark/speed_benchmark.rst:424 +#: 991431ee7fc44fe3bf94cd56c5f4a4f4 +msgid "70.40" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:216 -#: b29634fb88a74077b1f1be29df2a1645 -msgid "11.27" +#: ../../source/benchmark/speed_benchmark.rst:426 +#: 0ec9079a10dc46e6b95ac2296ef9ab0d +msgid "98.02" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:218 -#: 9455c7d10a054c709e2ceabf6e85b819 -msgid "26.63" +#: ../../source/benchmark/speed_benchmark.rst:428 +#: 7731361c04c242b7aeaafc926b6f50d6 +msgid "92.66" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:218 -#: fb7f29c39ecf4f408be4c6fc67debc47 -msgid "27.71" +#: ../../source/benchmark/speed_benchmark.rst:430 +#: b0c3cfc6236c4cf391838ece8a48c0e6 +msgid "43.83" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:220 -#: 65b74ffea4554c749a6a0a3de7f676ab -msgid "24.58" +#: ../../source/benchmark/speed_benchmark.rst:432 +#: 920fdbd27f9a430797005289f667804a +msgid "64.33" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:220 -#: 738cf84c7aa344bf93ee7fadba3f4309 -msgid "21.76" +#: ../../source/benchmark/speed_benchmark.rst:434 +#: b4eb38977b024995a7a831649569532d +msgid "86.10" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:222 -#: 73a7770b142f445eb30cb8382df8ea63 -msgid "25.81" +#: 
../../source/benchmark/speed_benchmark.rst:436 +#: 3add69ed82814a6a9fdbfc59deb9eb06 +msgid "83.11" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:222 -#: ca82acc2868442c8bef3498d26b0fb6a -msgid "18.86" +#: ../../source/benchmark/speed_benchmark.rst:438 +#: 011f7d1727634bd5a51339523d7e1e4b +msgid "41.91" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:224 -#: 00333797cbb34fddb0640685c2c18503 -msgid "18.72" +#: ../../source/benchmark/speed_benchmark.rst:440 +#: 2df263f68c7b436cbc81f685ef8590a0 +msgid "59.21" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:226 -#: 0ade2fd6695d4947aa5d9f7ee6691f4d -msgid "17.49" +#: ../../source/benchmark/speed_benchmark.rst:442 +#: 8fd9a18276454cf48e5af21eb63b91f5 +msgid "76.85" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:226 -#: 7dec10fbf5c040628de9df93e9c43799 -msgid "42.62" +#: ../../source/benchmark/speed_benchmark.rst:444 +#: 83b581b750744226b2f42f0ecca15479 +msgid "74.03" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:228 -#: 3f53840a8d974f52a121217b307ac088 -msgid "16.69" +#: ../../source/benchmark/speed_benchmark.rst:446 +#: c68286fe55f84456b87c26df79ba1894 +msgid "37.18" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:228 -#: 8aa4b800aadf451ab3a9a8fb57897120 -msgid "36.67" +#: ../../source/benchmark/speed_benchmark.rst:448 +#: 83a7ac03752a402bbdf2bc83fe69d054 +msgid "49.23" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:230 -#: 0e67e5cc63804e18aceaeb5ada3a94f5 -msgid "17.17" +#: ../../source/benchmark/speed_benchmark.rst:450 +#: 4576575e5f394ae0bc98007fd430bffe +msgid "60.91" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:230 -#: db55bfed060249c284b3b0136de467dc -msgid "33.76" +#: ../../source/benchmark/speed_benchmark.rst:452 +#: 869122e329704fa294c32d1d689b17d4 +msgid "59.01" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:232 -#: 7fe02e10cb3c4d23a8c031e0428e133c -msgid "17.87" +#: ../../source/benchmark/speed_benchmark.rst:454 +#: 
3d200274a72c4ae4b220e45156d40e40 +msgid "26.85" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:232 -#: b5692baa3c674fca950de5a1b9f0cf9e -msgid "33.63" +#: ../../source/benchmark/speed_benchmark.rst:456 +#: c8602005888643a9816df3d0d3f6a7f5 +msgid "32.83" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:236 -#: 6e13a225b07b4757a3bf80d9aa33cc2c -msgid "7B (vLLM)" +#: ../../source/benchmark/speed_benchmark.rst:458 +#: b1d59857a7d94ffabfa028fee7247413 +msgid "37.67" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:241 -#: 57234c3af122433396e3e87d36299bcb -msgid "80.45" +#: ../../source/benchmark/speed_benchmark.rst:460 +#: 79106824e09043d7a04261798428aa65 +msgid "36.71" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:243 -#: c5ef1e109ba5454bb955d15d7ce21818 -msgid "114.32" +#: ../../source/benchmark/speed_benchmark.rst:462 +#: 4c82b240c59c40c0bce5e21ce8095eee +msgid "14.53" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:245 -#: 3df1a4c3f1c44f839d10d7ac7116a2a8 -msgid "143.40" +#: ../../source/benchmark/speed_benchmark.rst:464 +#: 7c1dcf878d8542e9b0ec91273f74de20 +msgid "15.10" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:247 -#: 3d59b820531d4ac59651d0b8b18b19fb -msgid "96.65" +#: ../../source/benchmark/speed_benchmark.rst:466 +#: 4d873b6a2d4448069f38d9f87857a3af +msgid "15.13" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:249 -#: 89b50d236ae14900bdeae0d1886c3f00 -msgid "76.41" +#: ../../source/benchmark/speed_benchmark.rst:468 +#: d7815412d81b4d858da6c984d96b4621 +msgid "15.25" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:251 -#: 60ec294ddffa41fdb80bee7d408fe785 -msgid "107.02" +#: ../../source/benchmark/speed_benchmark.rst:476 +#: 7b6612a777c044f084bd33c1dc385fdc +msgid "32B (Transformer)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:253 -#: 0c0b7dbd9e1a4e6b94f5fc7f0ddaad46 -msgid "131.55" +#: ../../source/benchmark/speed_benchmark.rst:481 +#: 
../../source/benchmark/speed_benchmark.rst:523 +#: 1f29ef2c5a544cb39638becddcfdcf49 +msgid "Qwen2.5-32B-Instruct" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:255 -#: 214fdc0aeede43a19e44a01625f2776b -msgid "91.38" +#: ../../source/benchmark/speed_benchmark.rst:481 +#: 53d8e92e364c4649bdd593b14d83e100 +msgid "17.54" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:257 -#: 8be1d5ce382b44a6812db41b83a93747 -msgid "66.54" +#: ../../source/benchmark/speed_benchmark.rst:481 +#: e553d565e5b140f385b495303019c033 +msgid "61.58" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:259 -#: e378d26b45584f00867b6aff18a201e1 -msgid "89.72" +#: ../../source/benchmark/speed_benchmark.rst:483 +#: 07025cd0b8394353b5a33f848fcbacca +msgid "14.52" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:261 -#: f71fd9f2f0774608bf4ce9c1e91b9de6 -msgid "97.93" +#: ../../source/benchmark/speed_benchmark.rst:483 +#: f6ece8fde194423b9659c7473c1b0a18 +msgid "33.56" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:263 -#: 021c3f5aa7ac48ee9ae264aa74eba998 -msgid "76.87" +#: ../../source/benchmark/speed_benchmark.rst:485 +#: 665142835b2a4b6b8f74b32474c15856 +msgid "19.20" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:265 -#: b27fd4caaacd4cebb883d46d69213898 -msgid "55.83" +#: ../../source/benchmark/speed_benchmark.rst:485 +#: cffbc93386f9488dba018b96ecc5581c +msgid "18.94" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:267 -#: 8cdb1a3217bf404399522fd20eb7c4d8 -msgid "71.58" +#: ../../source/benchmark/speed_benchmark.rst:487 +#: e5a38d880f6745159a518853d68abb93 +msgid "14.60" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:269 -#: ee2c11a873f8417b8e87a646254ae81b -msgid "81.48" +#: ../../source/benchmark/speed_benchmark.rst:487 +#: 07bf7781c6e946e2b1cda10f9a7aabf5 +msgid "18.67" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:271 -#: 6dd766b8464449238a924226065364ea -msgid "63.62" +#: 
../../source/benchmark/speed_benchmark.rst:489 +#: a08a0f85da8a486f86aa44ef1d283835 +msgid "12.49" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:273 -#: ../../source/benchmark/speed_benchmark.rst:435 -#: f02b84950b1045b3962f30a0898657e6 -msgid "63488" +#: ../../source/benchmark/speed_benchmark.rst:489 +#: 83fa297e1f9a4532842be69de2b0fe13 +msgid "63.72" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:273 -#: 48ea75f36ff24d8f8d1c276a3020bc8d -msgid "41.20" +#: ../../source/benchmark/speed_benchmark.rst:491 +#: e4affb2a31894be09864f4b3964fe1db +msgid "11.61" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:275 -#: 849a3504e01d406fbe7ebe5c1df7f7f5 -msgid "49.37" +#: ../../source/benchmark/speed_benchmark.rst:491 +#: 51490a4b4185419f81e4e04e17593d7f +msgid "35.86" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:277 -#: 0ed4057d80274bad9344c263ffab2287 -msgid "54.12" +#: ../../source/benchmark/speed_benchmark.rst:493 +#: b8c6f8f855c94d9aae9b499ab717d079 +msgid "13.42" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:279 -#: a6b5c183234541a5ad3d0a4477bb63fd -msgid "45.89" +#: ../../source/benchmark/speed_benchmark.rst:493 +#: 53c8e2e22cd34cbc9fa7c39806cb0ac3 +msgid "21.09" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:281 -#: ../../source/benchmark/speed_benchmark.rst:443 -#: d4b4a84209274040a7285fc33bf13e93 -msgid "129024" +#: ../../source/benchmark/speed_benchmark.rst:495 +#: 8ef02e5f56a74fd690dfd100b3d3ce21 +msgid "20.81" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:281 -#: d8feeaa0ce814250bfe7706195f5e51c -msgid "25.01" +#: ../../source/benchmark/speed_benchmark.rst:497 +#: fde2d9af0fc942e59415a2e773262689 +msgid "8.95" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:283 -#: ../../source/benchmark/speed_benchmark.rst:399 -#: eb9cc19c7743488a83b40c84b3edfabc -msgid "27.73" +#: ../../source/benchmark/speed_benchmark.rst:497 +#: 332e6ffd84104fbdb0263a32e49ba96d +msgid "67.31" msgstr "" -#: 
../../source/benchmark/speed_benchmark.rst:285 -#: ae31f01fe1764ce2ae730eaaf1a20a1a -msgid "29.39" +#: ../../source/benchmark/speed_benchmark.rst:499 +#: cbc68892cd7c43a7b7b3b66b974b380b +msgid "8.53" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:287 -#: aab6112852674b11b9f30d0fbdde451e -msgid "27.13" +#: ../../source/benchmark/speed_benchmark.rst:499 +#: 58736c51fa504f299e8e3b57431784b5 +msgid "39.28" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:291 -#: 7ecf74aaa7344de2b10d3e451117ca3a -msgid "57B-A14B (Transformer)" +#: ../../source/benchmark/speed_benchmark.rst:501 +#: cd0d026656d4409e9d2bb254d905897e +msgid "9.48" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:296 -#: ../../source/benchmark/speed_benchmark.rst:318 -#: ../../source/benchmark/speed_benchmark.rst:334 -#: ../../source/benchmark/speed_benchmark.rst:338 -#: f6db37832efd438894e5a7520268a63c -msgid "Qwen2-57B-A14B-Instruct" +#: ../../source/benchmark/speed_benchmark.rst:501 +#: 3ee024efbab945de80cabb50963966a9 +msgid "24.67" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:296 -#: ../../source/benchmark/speed_benchmark.rst:300 -#: ../../source/benchmark/speed_benchmark.rst:304 -#: ../../source/benchmark/speed_benchmark.rst:308 -#: ../../source/benchmark/speed_benchmark.rst:318 -#: ../../source/benchmark/speed_benchmark.rst:320 -#: ../../source/benchmark/speed_benchmark.rst:322 -#: ../../source/benchmark/speed_benchmark.rst:324 -#: ../../source/benchmark/speed_benchmark.rst:350 -#: ../../source/benchmark/speed_benchmark.rst:352 -#: ../../source/benchmark/speed_benchmark.rst:358 -#: ../../source/benchmark/speed_benchmark.rst:360 -#: ../../source/benchmark/speed_benchmark.rst:368 -#: ../../source/benchmark/speed_benchmark.rst:376 -#: ../../source/benchmark/speed_benchmark.rst:378 -#: ../../source/benchmark/speed_benchmark.rst:380 -#: ../../source/benchmark/speed_benchmark.rst:389 -#: ../../source/benchmark/speed_benchmark.rst:393 -#: 
../../source/benchmark/speed_benchmark.rst:397 -#: ../../source/benchmark/speed_benchmark.rst:399 -#: ../../source/benchmark/speed_benchmark.rst:403 -#: ../../source/benchmark/speed_benchmark.rst:407 -#: ../../source/benchmark/speed_benchmark.rst:409 -#: ../../source/benchmark/speed_benchmark.rst:413 -#: ../../source/benchmark/speed_benchmark.rst:415 -#: ../../source/benchmark/speed_benchmark.rst:417 -#: ../../source/benchmark/speed_benchmark.rst:421 -#: ../../source/benchmark/speed_benchmark.rst:423 -#: ../../source/benchmark/speed_benchmark.rst:425 -#: ../../source/benchmark/speed_benchmark.rst:429 -#: ../../source/benchmark/speed_benchmark.rst:431 -#: ../../source/benchmark/speed_benchmark.rst:433 -#: ../../source/benchmark/speed_benchmark.rst:437 -#: ../../source/benchmark/speed_benchmark.rst:439 -#: ../../source/benchmark/speed_benchmark.rst:441 -#: ../../source/benchmark/speed_benchmark.rst:447 -#: ../../source/benchmark/speed_benchmark.rst:449 -#: 28f5671ff9f0473bad8ecb1dc85fd4fd -msgid "2" +#: ../../source/benchmark/speed_benchmark.rst:503 +#: 7220fa270f3b44abb519241cfb34837d +msgid "9.71" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:296 -#: cbe6a8135d7a499baf156fd40345740b -msgid "4.76" +#: ../../source/benchmark/speed_benchmark.rst:503 +#: 510e3a1a9e8c445eb3dbb3efaf9703df +msgid "24.39" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:296 -#: ab154c4808ab43ddb8d3fefb070b94d8 -msgid "110.29" +#: ../../source/benchmark/speed_benchmark.rst:505 +#: f28eeee663574673970f898def69eda5 +msgid "5.59" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:298 -#: 8224ca1206f64485bea889415b74ffcb -msgid "5.55" +#: ../../source/benchmark/speed_benchmark.rst:505 +#: 6afb3d7c36e34966b50f2d5896a30f13 +msgid "74.47" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:298 -#: f864b9292afa43d09b82cd8eaae4a7e8 -msgid "30.38" +#: ../../source/benchmark/speed_benchmark.rst:507 +#: 6dfcb55205f14066946e98e5f21ebbb4 +msgid "5.42" msgstr "" -#: 
../../source/benchmark/speed_benchmark.rst:300 -#: fb809e7ceace400b8a082acac8fd4c61 -msgid "4.90" +#: ../../source/benchmark/speed_benchmark.rst:507 +#: f5e851a787604e9eb23881ae7a0823cf +msgid "46.45" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:300 -#: 56194000744746b4a1e686600caf901c -msgid "117.80" +#: ../../source/benchmark/speed_benchmark.rst:509 +#: 3be974dc1add4a5f841c1a6615d37f55 +msgid "5.79" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:302 -#: 42beaf246b97438b8ee652873b047c52 -msgid "5.44" +#: ../../source/benchmark/speed_benchmark.rst:509 +#: 0e8ee0a620134e9cad5a123e1f262a4e +msgid "31.84" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:302 -#: c27750f759f24f85bef8955a43d19a98 -msgid "35.67" +#: ../../source/benchmark/speed_benchmark.rst:511 +#: 96c5762c06af49c28c04c3dd86ab529e +msgid "5.85" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:304 -#: 292a44b1c4f84f538a4681a1c9370a65 -msgid "4.58" +#: ../../source/benchmark/speed_benchmark.rst:511 +#: ea0189c9871b493ea1e2fe8c76b8bf9c +msgid "31.56" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:304 -#: 082119c4a2cf4f77a403a48e45bab6f1 -msgid "128.17" +#: ../../source/benchmark/speed_benchmark.rst:518 +#: ad539924324348b79ae510fc747d0503 +msgid "32B (vLLM)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:306 -#: 64b7ab50e0324dd4b28a8e13165dcd50 -msgid "5.31" +#: ../../source/benchmark/speed_benchmark.rst:523 +#: 18fbf0142b484b39bd993a5e2b7109f3 +msgid "22.13" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:306 -#: f567c71bdba14d1fbf5d6c0b9fb9b4fa -msgid "43.11" +#: ../../source/benchmark/speed_benchmark.rst:523 +#: ../../source/benchmark/speed_benchmark.rst:531 +#: ../../source/benchmark/speed_benchmark.rst:539 +#: ecb82e57e6174a228b0bb6c9408c857a +msgid "setting1" +msgstr "[设定3]" + +#: ../../source/benchmark/speed_benchmark.rst:525 +#: cbfd13d749b34ae8b58d71862a01a082 +msgid "37.57" msgstr "" -#: 
../../source/benchmark/speed_benchmark.rst:308 -#: ../../source/benchmark/speed_benchmark.rst:366 -#: 46bce29cddac48c9ab1c86a77f39b99c -msgid "4.12" +#: ../../source/benchmark/speed_benchmark.rst:527 +#: dd354dc40c2b44e08ce16795be4d374d +msgid "55.83" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:308 -#: 28b0a74ed44a4d4bb2848643d0cedb61 -msgid "163.77" +#: ../../source/benchmark/speed_benchmark.rst:529 +#: e105e841443e475e82c1c979d5abd8da +msgid "51.92" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:310 -#: b06bf0b700294d86b5e0b83fff8b5cdc -msgid "4.72" +#: ../../source/benchmark/speed_benchmark.rst:531 +#: 2c5ef86b9f184afd91b9c0813f0d2d9f +msgid "21.05" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:310 -#: ef78f3c059084e2c851ccc57bef3fa60 -msgid "58.01" +#: ../../source/benchmark/speed_benchmark.rst:533 +#: 8f1b7ee4927644218088c54cf274f304 +msgid "34.67" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:313 -#: a25143ba4b5b4e17a2c24ef133f808ba -msgid "57B-A14B (vLLM)" +#: ../../source/benchmark/speed_benchmark.rst:535 +#: 8970b876b77d43eea85a9999d7c0ec18 +msgid "49.96" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:318 -#: 0c60acb691614f19a388b98b290c5380 -msgid "31.44" +#: ../../source/benchmark/speed_benchmark.rst:537 +#: 02d6ad974d384e42be987c076b6eb816 +msgid "46.68" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:320 -#: 2e50af3e813c47d28b950d78c92b06d6 -msgid "31.77" +#: ../../source/benchmark/speed_benchmark.rst:539 +#: 0437b0be0e354954a91507554f9ac9bd +msgid "19.91" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:322 -#: 670e9d878d6c4ff78cdf4f890c4fb163 -msgid "21.25" +#: ../../source/benchmark/speed_benchmark.rst:541 +#: d48dd2bea93e4b119300d5cbf9f3817c +msgid "31.89" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:324 -#: 05c9e0c8cf4d4eb1b7c896c748b8e40a -msgid "20.24" +#: ../../source/benchmark/speed_benchmark.rst:543 +#: 76a8cb67eb034e71b7cea12c30dcec9a +msgid "44.79" msgstr "" 
-#: ../../source/benchmark/speed_benchmark.rst:327 -#: 3afade020cfb40c3ac2521327ceb9278 -msgid "Note: Compared with dense models, MOE models have larger throughput when batch size is large, which is shown as follows:" -msgstr "混合专家模型 (Mixture-of-Experts, MoE) 与稠密模型相比,当批大小较大时,吞吐量更大。下表展示了有关数据:" +#: ../../source/benchmark/speed_benchmark.rst:545 +#: 5eeed726d7584af382aef9f9b60f2568 +msgid "41.83" +msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:330 -#: 2e819ceb9e43455ab51c447ce96a3d56 -msgid "# Prompts" -msgstr "请求数" +#: ../../source/benchmark/speed_benchmark.rst:547 +#: ../../source/benchmark/speed_benchmark.rst:555 +#: ../../source/benchmark/speed_benchmark.rst:563 +#: ../../source/benchmark/speed_benchmark.rst:585 +#: ../../source/benchmark/speed_benchmark.rst:587 +#: ../../source/benchmark/speed_benchmark.rst:593 +#: ../../source/benchmark/speed_benchmark.rst:595 +#: ../../source/benchmark/speed_benchmark.rst:603 +#: ../../source/benchmark/speed_benchmark.rst:611 +#: ../../source/benchmark/speed_benchmark.rst:613 +#: ../../source/benchmark/speed_benchmark.rst:615 +#: ../../source/benchmark/speed_benchmark.rst:626 +#: ../../source/benchmark/speed_benchmark.rst:630 +#: ../../source/benchmark/speed_benchmark.rst:634 +#: ../../source/benchmark/speed_benchmark.rst:636 +#: ../../source/benchmark/speed_benchmark.rst:640 +#: ../../source/benchmark/speed_benchmark.rst:644 +#: ../../source/benchmark/speed_benchmark.rst:646 +#: ../../source/benchmark/speed_benchmark.rst:650 +#: ../../source/benchmark/speed_benchmark.rst:652 +#: ../../source/benchmark/speed_benchmark.rst:654 +#: ../../source/benchmark/speed_benchmark.rst:658 +#: ../../source/benchmark/speed_benchmark.rst:660 +#: ../../source/benchmark/speed_benchmark.rst:662 +#: ../../source/benchmark/speed_benchmark.rst:666 +#: ../../source/benchmark/speed_benchmark.rst:668 +#: ../../source/benchmark/speed_benchmark.rst:670 +#: ../../source/benchmark/speed_benchmark.rst:676 +#: 
../../source/benchmark/speed_benchmark.rst:678 +#: 1de1b78c409742d48003dd118a0bd0b6 +msgid "2" +msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:330 -#: 4e979ae74bab41e38717901f7f96171c -msgid "QPS" -msgstr "请求每秒 (QPS)" +#: ../../source/benchmark/speed_benchmark.rst:547 +#: c90999f095754750947c1f9754fec8b7 +msgid "31.82" +msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:330 -#: c0c85f59ac994011a81dfcc84f7ecdae -msgid "Tokens/s" -msgstr "速度 (tokens/s)" +#: ../../source/benchmark/speed_benchmark.rst:549 +#: bffb7c59dfd047a7bb148c720dfc3559 +msgid "26.88" +msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:332 -#: ../../source/benchmark/speed_benchmark.rst:336 -#: 1776dcc8818644d58d5933f9933e4e4e -msgid "Qwen1.5-32B-Chat" +#: ../../source/benchmark/speed_benchmark.rst:551 +#: 05715030d07b43be973f53a11c340488 +msgid "35.66" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:332 -#: ../../source/benchmark/speed_benchmark.rst:334 -#: 0952bdd1354240ee9d60e7925c4ea42b -msgid "100" +#: ../../source/benchmark/speed_benchmark.rst:553 +#: d64faf27fa474f658cdb17f96a11c176 +msgid "33.75" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:332 -#: 6eae8bcf6ff946b8bea2957b3ded71be -msgid "6.68" +#: ../../source/benchmark/speed_benchmark.rst:555 +#: 8ee73b440af9496e99d2c9fa8c3aab47 +msgid "24.45" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:332 -#: 53f595571768489ba575c8b12fef3ec3 -msgid "7343.56" +#: ../../source/benchmark/speed_benchmark.rst:557 +#: 8232d17592dc40879634997bb2ec8a62 +msgid "18.60" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:334 -#: f9f2397a1baf4e09814a418c5f61a592 -msgid "4.81" +#: ../../source/benchmark/speed_benchmark.rst:559 +#: 957977d138a14fa49556b6d85ae09f98 +msgid "22.72" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:334 -#: d3cc853999fa404bbcb2f40d9ba74487 -msgid "5291.15" +#: ../../source/benchmark/speed_benchmark.rst:561 +#: 8db19c15901344f193ffe475e337ce6d +msgid "21.79" msgstr "" -#: 
../../source/benchmark/speed_benchmark.rst:336 -#: ../../source/benchmark/speed_benchmark.rst:338 -#: c2e443593c0a4c4c8805450f7f4437c0 -msgid "1000" +#: ../../source/benchmark/speed_benchmark.rst:563 +#: 78f2313954304662b22171c161b2545b +msgid "14.31" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:336 -#: 3cc76ffdf7e849d0a4d8dc36eb654fdf -msgid "7.99" +#: ../../source/benchmark/speed_benchmark.rst:565 +#: 0b3597500eb04bee9b87863dd9a7f5cf +msgid "9.77" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:336 -#: 8683668f120647d8a234357dfab12b47 -msgid "8791.35" +#: ../../source/benchmark/speed_benchmark.rst:567 +#: 9da2d0f4f4674cf78f41c29f0353e94f +msgid "10.39" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:338 -#: 4a449baa6803435ba984163e7663cc5a -msgid "5.18" +#: ../../source/benchmark/speed_benchmark.rst:569 +#: 0ab3a055fbb44fc2816bc11cd0bf334c +msgid "10.34" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:338 -#: 6a62898ae87f4b3ea31c871f5eacb53a -msgid "5698.37" +#: ../../source/benchmark/speed_benchmark.rst:572 +#: 7ec41a4f21564751a5e286349bd2973f +msgid "For context length 129024, the model needs to be predicted with the following config: \"model_max_length\"=131072" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:341 -#: 092e36507d1d48638cd439cefe2a9e77 -msgid "The results are obtained from vLLM throughput benchmarking scripts, which can be reproduced by:" -msgstr "数据由vLLM吞吐量测试脚本测得,可通过以下命令复现" +#: ../../source/benchmark/speed_benchmark.rst:573 +#: ../../source/benchmark/speed_benchmark.rst:681 +#: 6bcfd786e43e47fcbe60a1636a980420 +msgid "[Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)" +msgstr "[默认设定]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)" -#: ../../source/benchmark/speed_benchmark.rst:343 -#: ffdbcf34e9d64bfbac5cf88eef55cf76 -msgid "``python vllm/benchmarks/benchmark_throughput.py --input-len 1000 --output-len 100 --model --num-prompts 
--enforce-eager -tp 2``" -msgstr "" +#: ../../source/benchmark/speed_benchmark.rst:574 +#: 52ff1ba434ec4f5aa779f7c1391a4888 +msgid "[Setting 1]=(gpu_memory_utilization=1.0 max_model_len=32768 enforce_eager=True)" +msgstr "[设定 3]=(gpu_memory_utilization=1.0 max_model_len=8192 enforce_eager=True)" -#: ../../source/benchmark/speed_benchmark.rst:345 -#: 4f799c9a507b45f0a6eeed9887fce4a9 +#: ../../source/benchmark/speed_benchmark.rst:580 +#: 35f9135bf6704976b2a4a2d1cbef1e42 msgid "72B (Transformer)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:350 -#: ../../source/benchmark/speed_benchmark.rst:389 -#: d179bb53b2774464b254c8fc169c9125 -msgid "Qwen2-72B-Instruct" -msgstr "" - -#: ../../source/benchmark/speed_benchmark.rst:350 -#: c60dad5fafab4a0ab8409ff5993fc81c -msgid "7.45" +#: ../../source/benchmark/speed_benchmark.rst:585 +#: ../../source/benchmark/speed_benchmark.rst:626 +#: 1ec160dc664d4ea48527cb2ee63842ae +msgid "Qwen2.5-72B-Instruct" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:350 -#: f3bb0a3fd6874bc39ae8bf32e50c5708 -msgid "134.74" +#: ../../source/benchmark/speed_benchmark.rst:585 +#: 0cd29021195b4deda82222fd1c5b2d89 +msgid "8.73" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:352 -#: 3a199090c9d64c259a408eaeb48d481d -msgid "7.30" +#: ../../source/benchmark/speed_benchmark.rst:585 +#: 4759248c3efe417cbfab580cc75d6e9b +msgid "136.20" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:352 -#: c0ce906ec79347419857b7018238cdcb -msgid "71.00" +#: ../../source/benchmark/speed_benchmark.rst:587 +#: 4e5831dfca684eba87e0d9bb1e057f89 +msgid "8.66" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:354 -#: 4e60bd8a343a413080a258f4e3972257 -msgid "9.05" +#: ../../source/benchmark/speed_benchmark.rst:587 +#: 8191363491464f02952e9df810b72dad +msgid "72.61" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:354 -#: 95fe8e443bb34e579229392653e8f2f2 -msgid "41.80" +#: ../../source/benchmark/speed_benchmark.rst:589 +#: 
ee80f21a647b48758fa4cb6e486345df +msgid "11.07" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:356 -#: e3f3148f13334ec18395e17ee0b22e1f -msgid "9.96" +#: ../../source/benchmark/speed_benchmark.rst:589 +#: 21fa5feb12eb46ec89b08f36aecc5f64 +msgid "39.91" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:356 -#: b4fb66e12fe54fefb762c7cb37beb7b2 -msgid "41.31" +#: ../../source/benchmark/speed_benchmark.rst:591 +#: eae85aa1efdd4ed189e84e5bacb0c943 +msgid "11.50" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:358 -#: 51d1ef53b0894cbda6f5876ab5f2c26b -msgid "5.99" +#: ../../source/benchmark/speed_benchmark.rst:591 +#: 8c37bed4817049b4bbf221e225da2c02 +msgid "39.44" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:358 -#: 5e2eed3a83fc4f468d2c8cd2e8e8ed2c -msgid "144.38" +#: ../../source/benchmark/speed_benchmark.rst:593 +#: 318376c8783f447092b4da185e8d6f55 +msgid "140.00" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:360 -#: 6db8fd30a9ca466b847825fdcd5cc37d -msgid "80.60" +#: ../../source/benchmark/speed_benchmark.rst:595 +#: 6ae597ed16c04855858e32a0f81a3318 +msgid "77.81" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:362 -#: 64b0147addbe404693cb0edb5b8cf69b -msgid "6.79" +#: ../../source/benchmark/speed_benchmark.rst:597 +#: 8512820f22034145bab9b4209b052870 +msgid "7.56" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:362 -#: 703be6e494414d6d980c4f4af7973c65 -msgid "47.90" +#: ../../source/benchmark/speed_benchmark.rst:597 +#: ../../source/benchmark/speed_benchmark.rst:644 +#: 1189fab08ecf47788307b9fcd311d93f +msgid "42.50" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:364 -#: f77a7042530646d0bbd6596dc3b04e05 -msgid "7.49" +#: ../../source/benchmark/speed_benchmark.rst:599 +#: 4a920e9f493b4fe99016d18f1c7a335a +msgid "8.17" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:364 -#: f547fb57b33f48f1a77dc065110be0d3 -msgid "47.42" +#: ../../source/benchmark/speed_benchmark.rst:599 +#: 
0b23f7fda490423bab9b2710090ba354 +msgid "42.13" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:366 -#: ../../source/benchmark/speed_benchmark.rst:374 -#: 3089d24bda534c67b5be05e281ad64e3 +#: ../../source/benchmark/speed_benchmark.rst:601 +#: ../../source/benchmark/speed_benchmark.rst:609 +#: 64a5def0a6ac4f68b05c68a21d7e7b73 msgid "3" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:366 -#: ffa5f85189ff4b128cc169437949572f -msgid "169.93" +#: ../../source/benchmark/speed_benchmark.rst:601 +#: 3ba1fb9c27dc43948e863b2eb1373e84 +msgid "4.25" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:368 -#: e9903e0519a24d94b06fd64480743688 -msgid "4.43" +#: ../../source/benchmark/speed_benchmark.rst:601 +#: d934eb7c7fd5409ea165c5401ddc40e3 +msgid "149.14" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:368 -#: 154c3b57143d4b92b5ee1c95e636ffe7 -msgid "95.14" +#: ../../source/benchmark/speed_benchmark.rst:603 +#: 5ec8fd8fdce54044bac0d826833bb3c3 +msgid "4.66" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:370 -#: 19c10d1f9e284e18819df8e2180c4590 -msgid "4.87" +#: ../../source/benchmark/speed_benchmark.rst:603 +#: 3a47334ad32942cab67ca6e033d64d33 +msgid "82.55" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:370 -#: 85574d14ce2d43748b53c5ab5cc3a05b -msgid "57.79" +#: ../../source/benchmark/speed_benchmark.rst:605 +#: d4e6a932094f45329dc9451c9cdbf5ce +msgid "5.27" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:372 -#: 0d987bcce064487681d5cc4e6affd306 -msgid "5.23" +#: ../../source/benchmark/speed_benchmark.rst:605 +#: 16b4a09390ad433982e7cce6d772ab67 +msgid "46.86" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:372 -#: a49809ae1c804efc82256b58e1c8fa5a -msgid "57.30" +#: ../../source/benchmark/speed_benchmark.rst:607 +#: 7290740f09ed4f5e8056bd5a73919eb1 +msgid "5.57" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:374 -#: d5e76c1fdbbc461996421d71b7a864db -msgid "2.86" +#: 
../../source/benchmark/speed_benchmark.rst:607 +#: bb2c46fd7b4b4d9b9eee27796228d541 +msgid "46.38" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:374 -#: b4b313dbb165470ab875032d569fa2c6 -msgid "209.03" +#: ../../source/benchmark/speed_benchmark.rst:609 +#: ../../source/benchmark/speed_benchmark.rst:611 +#: d73af4dd10734038b7ebf958929ddb3c +msgid "2.94" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:376 -#: 254f53b31b6141478bf4c914cd3303c0 -msgid "2.83" +#: ../../source/benchmark/speed_benchmark.rst:609 +#: 314958f218a144f6bae0e7b7c666a87b +msgid "164.79" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:376 -#: 8c14e41ad0cc4567ba1a1bc4a5e55194 -msgid "124.20" +#: ../../source/benchmark/speed_benchmark.rst:611 +#: b9669600e20449f19330a26d4ae24d8b +msgid "94.75" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:378 -#: 7338224b41e04ba988aee3b919a63c19 -msgid "3.02" +#: ../../source/benchmark/speed_benchmark.rst:613 +#: bab0c51f5f434b9daa149dd8a2dbf518 +msgid "3.14" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:378 -#: da4e3178c39a4c8284fa63428a298ece -msgid "107.94" +#: ../../source/benchmark/speed_benchmark.rst:613 +#: 21b5b4b6e2774b4bbf7a0161a1315215 +msgid "62.57" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:380 -#: d0c944ecc68d4c79a1cabc605aafdd3b -msgid "1.85" +#: ../../source/benchmark/speed_benchmark.rst:615 +#: 71d2e7a03a7648ec9b3c824d1b3e6be6 +msgid "3.23" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:380 -#: 83404c21d99d47568ada6ca97e4dc08c -msgid "88.60" +#: ../../source/benchmark/speed_benchmark.rst:615 +#: 8ddc63ac4ac3441d82ba48a76a168857 +msgid "61.64" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:384 -#: 01710f215568440cb8193443cb4d2d11 +#: ../../source/benchmark/speed_benchmark.rst:621 +#: 1fa49ab15da7411f9e2288fa92b39adc msgid "72B (vLLM)" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:387 -#: b426a4c94d9e46d0acb50f650c5aba1d -msgid "Setting" +#: 
../../source/benchmark/speed_benchmark.rst:626 +#: 4b9dc207673447b69eeb39318d74faff +msgid "18.19" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:389 -#: e69c240d64334418a00fd79fb5374896 -msgid "17.68" -msgstr "" +#: ../../source/benchmark/speed_benchmark.rst:626 +#: 8cdd46c8c8c64b749e734e82f604941d +msgid "Setting 1" +msgstr "[设定3]" -#: ../../source/benchmark/speed_benchmark.rst:389 -#: 54209a9ed7064ea5999b591f3b581c85 -msgid "[Setting 1]" +#: ../../source/benchmark/speed_benchmark.rst:628 +#: ../../source/benchmark/speed_benchmark.rst:638 +#: ../../source/benchmark/speed_benchmark.rst:648 +#: ../../source/benchmark/speed_benchmark.rst:656 +#: ../../source/benchmark/speed_benchmark.rst:664 +#: ../../source/benchmark/speed_benchmark.rst:672 +#: ../../source/benchmark/speed_benchmark.rst:674 +#: 6cb2bb1f8236463b82980661f2b26187 +msgid "4" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:391 -#: ../../source/benchmark/speed_benchmark.rst:401 -#: ../../source/benchmark/speed_benchmark.rst:411 -#: ../../source/benchmark/speed_benchmark.rst:419 -#: ../../source/benchmark/speed_benchmark.rst:427 -#: ../../source/benchmark/speed_benchmark.rst:435 -#: ../../source/benchmark/speed_benchmark.rst:443 -#: ../../source/benchmark/speed_benchmark.rst:445 -#: 44fd7b7e19aa411785e2db8e02443b34 -msgid "4" +#: ../../source/benchmark/speed_benchmark.rst:628 +#: dbe95233ca094aad9620b1c5a2392f18 +msgid "31.37" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:391 -#: 8b7ff0d4a4eb4be1b88d5b41eff75c8f -msgid "30.01" +#: ../../source/benchmark/speed_benchmark.rst:628 +#: ../../source/benchmark/speed_benchmark.rst:630 +#: ../../source/benchmark/speed_benchmark.rst:632 +#: ../../source/benchmark/speed_benchmark.rst:636 +#: ../../source/benchmark/speed_benchmark.rst:638 +#: ../../source/benchmark/speed_benchmark.rst:640 +#: ../../source/benchmark/speed_benchmark.rst:642 +#: ../../source/benchmark/speed_benchmark.rst:646 +#: 
../../source/benchmark/speed_benchmark.rst:648 +#: ../../source/benchmark/speed_benchmark.rst:650 +#: ../../source/benchmark/speed_benchmark.rst:652 +#: ../../source/benchmark/speed_benchmark.rst:654 +#: ../../source/benchmark/speed_benchmark.rst:656 +#: ../../source/benchmark/speed_benchmark.rst:658 +#: ../../source/benchmark/speed_benchmark.rst:660 +#: ../../source/benchmark/speed_benchmark.rst:662 +#: ee5778f9148049669096d6426455133f +msgid "Default" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:393 -#: 975913f1bbec4e149dcc576aba012b16 -msgid "27.56" +#: ../../source/benchmark/speed_benchmark.rst:630 +#: 6f55f6473c2c41a580d423528bac5899 +msgid "31.40" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:395 -#: 4a055963a1eb4098aaed27e3b21b71b2 -msgid "29.60" +#: ../../source/benchmark/speed_benchmark.rst:632 +#: 074e04d74f81418bb8900529ee718191 +msgid "16.47" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:395 -#: 74c3cf2b89f5438ab0f20f7d36a010b2 -msgid "[Setting 2]" +#: ../../source/benchmark/speed_benchmark.rst:634 +#: 06343656f08748a2b10154bae4376207 +msgid "Setting 2" msgstr "[设定2]" -#: ../../source/benchmark/speed_benchmark.rst:397 -#: 03bfb2c5a69b4d3482518577c32fa39a -msgid "42.82" +#: ../../source/benchmark/speed_benchmark.rst:636 +#: 366ca98c00a04cfeb8a0284291994bbf +msgid "44.30" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:401 -#: 3b8fe08697904a03bd49f267e9a8c223 -msgid "27.98" +#: ../../source/benchmark/speed_benchmark.rst:638 +#: f43dfd3b925d4985a3efdf2e7da4ea29 +msgid "29.90" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:403 -#: dcc68256009b4048aa3bc8a1d9fa326b -msgid "25.46" +#: ../../source/benchmark/speed_benchmark.rst:640 +#: 64d5ac1585ac4bf4a7b862bae489164a +msgid "29.37" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:405 -#: bca0bc1c87c5451fb0eaffc5abe9ceff -msgid "25.16" +#: ../../source/benchmark/speed_benchmark.rst:642 +#: 1de86b60d4c24dcca314e4818e07db24 +msgid "13.88" msgstr "" -#: 
../../source/benchmark/speed_benchmark.rst:405 -#: 650de88091f441869c90042ec8034a97 -msgid "[Setting 3]" +#: ../../source/benchmark/speed_benchmark.rst:644 +#: b204f3c52579498683f909bc570a588a +msgid "Setting 3" msgstr "[设定3]" -#: ../../source/benchmark/speed_benchmark.rst:407 -#: 3c0e2a52803d4d22af83fb75abbf5233 -msgid "38.23" +#: ../../source/benchmark/speed_benchmark.rst:646 +#: c12d1b7abd0a464eacfae7d12413f732 +msgid "40.67" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:409 -#: 631ea6009552452b8fa46934e8e0f0d0 -msgid "25.77" +#: ../../source/benchmark/speed_benchmark.rst:648 +#: aa72f60f839e4f008ba928e0c220f10e +msgid "30.10" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:411 -#: 8bf01519422a45d8a2e84533c6a20c06 -msgid "21.81" +#: ../../source/benchmark/speed_benchmark.rst:650 +#: 14a54dd72ce84f08a441f75db838f211 +msgid "27.20" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:413 -#: 0b688d7ae4a6418da298b6f3b12fe4b3 -msgid "22.71" +#: ../../source/benchmark/speed_benchmark.rst:652 +#: 0d272d38659346aa85564e96df220ef6 +msgid "38.10" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:415 -#: 4b7e706fa6314752af02fbf0d293121f -msgid "26.54" +#: ../../source/benchmark/speed_benchmark.rst:654 +#: 9009eaa7925f4389bd0727abc502ad7a +msgid "36.63" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:417 -#: 7c232a8ad11c47afaa0143d9db108b19 -msgid "21.50" +#: ../../source/benchmark/speed_benchmark.rst:656 +#: 9fd7ffbf2ad24498af10753fbde5d7b3 +msgid "27.53" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:419 -#: ../../source/benchmark/speed_benchmark.rst:427 -#: 3aaf3ae8d8714b0e9c9f14031b5d97a9 -msgid "19.43" +#: ../../source/benchmark/speed_benchmark.rst:658 +#: 14c6b382ca474a1fb361af0d5555ab54 +msgid "23.32" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:421 -#: ../../source/benchmark/speed_benchmark.rst:429 -#: 4d8b93b402b94ceba65c5da2383327f6 -msgid "18.69" +#: ../../source/benchmark/speed_benchmark.rst:660 +#: 
0a43a0a7e072497eb2ee37babdbe74ba +msgid "30.98" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:423 -#: ../../source/benchmark/speed_benchmark.rst:431 -#: cf8c56e770ac439fb0e668dd1f4bd745 -msgid "23.12" +#: ../../source/benchmark/speed_benchmark.rst:662 +#: 377248d5998644e5a92b276d15b2304b +msgid "30.02" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:425 -#: ../../source/benchmark/speed_benchmark.rst:433 -#: d7b01f09c1d041029296a4d89f0685ae -msgid "18.09" +#: ../../source/benchmark/speed_benchmark.rst:664 +#: 2a7c8f06cdc84e39a27adbf50ec9e264 +msgid "20.74" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:435 -#: 158d7f4cd81b4bcbbcab4f470074c0a4 -msgid "17.46" -msgstr "" +#: ../../source/benchmark/speed_benchmark.rst:664 +#: ../../source/benchmark/speed_benchmark.rst:666 +#: ../../source/benchmark/speed_benchmark.rst:668 +#: ../../source/benchmark/speed_benchmark.rst:670 +#: 48fa6349dfa84ceb9647fe94dfb1b0f1 +msgid "Setting 4" +msgstr "[设定3]" -#: ../../source/benchmark/speed_benchmark.rst:437 -#: 70c34f305d3f4b9d931355d1d48832dd -msgid "15.30" +#: ../../source/benchmark/speed_benchmark.rst:666 +#: 871c974bc23d486c893d3a5609276cf1 +msgid "16.27" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:439 -#: 162b19a8a6ae46129942682f783c3606 -msgid "13.23" +#: ../../source/benchmark/speed_benchmark.rst:668 +#: b8eb841787d74da3942f972917292c67 +msgid "19.84" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:441 -#: 455093999fa9451cae981ff12b01cd50 -msgid "13.14" +#: ../../source/benchmark/speed_benchmark.rst:670 +#: 16a441b4182a4eb1a7d4bce1c5368516 +msgid "19.32" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:443 -#: 8411be189ad24683af4e17fc8f2ac222 -msgid "11.70" +#: ../../source/benchmark/speed_benchmark.rst:672 +#: 9c4b458f6fd24ae8a6e623fac9b35775 +msgid "12.68" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:445 -#: 4e4131989f7f4e57b75a6f14636c3c2c -msgid "12.94" -msgstr "" +#: 
../../source/benchmark/speed_benchmark.rst:672 +#: ../../source/benchmark/speed_benchmark.rst:674 +#: ../../source/benchmark/speed_benchmark.rst:676 +#: ../../source/benchmark/speed_benchmark.rst:678 +#: b728aa33bc8a43d48e30bb7c6eda12c3 +msgid "Setting 5" +msgstr "[设定3]" -#: ../../source/benchmark/speed_benchmark.rst:447 -#: 65fd14035c9d401f98f9013ad13e3db3 -msgid "8.33" +#: ../../source/benchmark/speed_benchmark.rst:674 +#: 57c759533ff04e7cb7f6ceb027d80692 +msgid "14.11" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:449 -#: e6065cf3a98d4dc5be1c8ac43c759743 -msgid "7.78" +#: ../../source/benchmark/speed_benchmark.rst:676 +#: 8fdea9ad04aa4608b5f839f3d923205c +msgid "10.11" msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:452 -#: 984f697c77ad416480978c2e703d3281 -msgid "[Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)" -msgstr "[默认设定]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)" +#: ../../source/benchmark/speed_benchmark.rst:678 +#: 6a52a106dc724a8285a38b543acda27e +msgid "9.88" +msgstr "" -#: ../../source/benchmark/speed_benchmark.rst:453 -#: 1a3dd3717c5b4dcebbd12bcfe0fbc33e +#: ../../source/benchmark/speed_benchmark.rst:682 +#: 7f20ec685e1b4e0eb0a606369efb844d msgid "[Setting 1]=(gpu_memory_utilization=0.98 max_model_len=4096 enforce_eager=True)" msgstr "[设定 1]=(gpu_memory_utilization=0.98 max_model_len=4096 enforce_eager=True)" -#: ../../source/benchmark/speed_benchmark.rst:454 -#: ff06cfda1923459a9fbc40c68aa8359a +#: ../../source/benchmark/speed_benchmark.rst:683 +#: a750af24c71b4094994ccf9dc8b87986 msgid "[Setting 2]=(gpu_memory_utilization=1.0 max_model_len=4096 enforce_eager=True)" msgstr "[设定 2]=(gpu_memory_utilization=1.0 max_model_len=4096 enforce_eager=True)" -#: ../../source/benchmark/speed_benchmark.rst:455 -#: adb3086724db4afa97ba526f69bd15b3 +#: ../../source/benchmark/speed_benchmark.rst:684 +#: 8fd1c1acd99f4f7eb4766030a56823e5 msgid "[Setting 
3]=(gpu_memory_utilization=1.0 max_model_len=8192 enforce_eager=True)" msgstr "[设定 3]=(gpu_memory_utilization=1.0 max_model_len=8192 enforce_eager=True)" +#: ../../source/benchmark/speed_benchmark.rst:685 +#: 70b88bc91b9848bdb4901f8490275db4 +msgid "[Setting 4]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)" +msgstr "[默认设定]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)" + +#: ../../source/benchmark/speed_benchmark.rst:686 +#: 6eed31e9316b4d64b8c1d105884524fc +msgid "[Setting 5]=(gpu_memory_utilization=0.9 max_model_len=131072 enforce_eager=False)" +msgstr "[默认设定]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)" diff --git a/docs/source/benchmark/speed_benchmark.rst b/docs/source/benchmark/speed_benchmark.rst index 297cf71..8585362 100644 --- a/docs/source/benchmark/speed_benchmark.rst +++ b/docs/source/benchmark/speed_benchmark.rst @@ -11,21 +11,21 @@ The environment of the evaluation with huggingface transformers is: - NVIDIA A100 80GB - CUDA 12.1 -- torch==2.3.1 -- flash_attn==2.5.8 -- transformers==4.46.0 -- auto_gptq==0.7.1+cu1210 (Compiled from source code) -- autoawq==0.2.6 +- Pytorch 2.3.1 +- Flash Attention 2.5.8 +- Transformers 4.46.0 +- AutoGPTQ 0.7.1+cu121 (Compiled from source code) +- AutoAWQ 0.2.6 The environment of the evaluation with vLLM is: - NVIDIA A100 80GB - CUDA 12.1 -- vllm==0.6.3 -- torch==2.4.0 -- flash_attn==2.6.3 -- transformers==4.46.0 +- vLLM 0.6.3 +- Pytorch 2.4.0 +- Flash Attention 2.6.3 +- Transformers 4.46.0 Notes: @@ -43,42 +43,41 @@ Notes: - 0.5B (Transformer) -+-------------------------+--------------+--------------+---------+-----------------+----------------+ -| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | -+=========================+==============+==============+=========+=================+================+ -| Qwen2.5-0.5B-Instruct | 1 | BF16 | 1 | 47.40 | 0.97 | -+ + 
+--------------+---------+-----------------+----------------+ -| | | GPTQ-Int8 | 1 | 35.17 | 0.64 | -+ + +--------------+---------+-----------------+----------------+ -| | | GPTQ-Int4 | 1 | 50.60 | 0.48 | -+ + +--------------+---------+-----------------+----------------+ -| | | AWQ | 1 | 37.09 | 0.68 | -+ +--------------+--------------+---------+-----------------+----------------+ -| | 6144 | BF16 | 1 | 47.45 | 1.23 | -+ + +--------------+---------+-----------------+----------------+ -| | | GPTQ-Int8 | 1 | 36.47 | 0.90 | -+ + +--------------+---------+-----------------+----------------+ -| | | GPTQ-Int4 | 1 | 48.89 | 0.73 | -+ + +--------------+---------+-----------------+----------------+ -| | | AWQ | 1 | 37.04 | 0.72 | -+ +--------------+--------------+---------+-----------------+----------------+ -| | 14336 | BF16 | 1 | 47.11 | 1.60 | -+ + +--------------+---------+-----------------+----------------+ -| | | GPTQ-Int8 | 1 | 35.44 | 1.26 | -+ + +--------------+---------+-----------------+----------------+ -| | | GPTQ-Int4 | 1 | 48.26 | 1.10 | -+ + +--------------+---------+-----------------+----------------+ -| | | AWQ | 1 | 37.14 | 1.10 | -+ +--------------+--------------+---------+-----------------+----------------+ -| | 30720 | BF16 | 1 | 47.16 | 2.34 | -+ + +--------------+---------+-----------------+----------------+ -| | | GPTQ-Int8 | 1 | 36.25 | 2.01 | -+ + +--------------+---------+-----------------+----------------+ -| | | GPTQ-Int4 | 1 | 49.22 | 1.85 | -+ + +--------------+---------+-----------------+----------------+ -| | | AWQ | 1 | 36.90 | 1.84 | -+-------------------------+--------------+--------------+---------+-----------------+----------------+ - ++-------------------------+--------------+--------------+---------+-----------------+----------------+---------------------------+ +| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | 
++=========================+==============+==============+=========+=================+================+===========================+ +| Qwen2.5-0.5B-Instruct | 1 | BF16 | 1 | 47.40 | 0.97 | | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int8 | 1 | 35.17 | 0.64 | auto_gptq==0.6.0+cu1210 | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int4 | 1 | 50.60 | 0.48 | | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | AWQ | 1 | 37.09 | 0.68 | | ++ +--------------+--------------+---------+-----------------+----------------+---------------------------+ +| | 6144 | BF16 | 1 | 47.45 | 1.23 | | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int8 | 1 | 36.47 | 0.90 | auto_gptq==0.6.0+cu1210 | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int4 | 1 | 48.89 | 0.73 | | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | AWQ | 1 | 37.04 | 0.72 | | ++ +--------------+--------------+---------+-----------------+----------------+---------------------------+ +| | 14336 | BF16 | 1 | 47.11 | 1.60 | | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int8 | 1 | 35.44 | 1.26 | auto_gptq==0.6.0+cu1210 | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int4 | 1 | 48.26 | 1.10 | | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | AWQ | 1 | 37.14 | 1.10 | | ++ +--------------+--------------+---------+-----------------+----------------+---------------------------+ +| | 30720 | BF16 | 1 | 47.16 | 2.34 | | ++ + 
+--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int8 | 1 | 36.25 | 2.01 | auto_gptq==0.6.0+cu1210 | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | GPTQ-Int4 | 1 | 49.22 | 1.85 | | ++ + +--------------+---------+-----------------+----------------+---------------------------+ +| | | AWQ | 1 | 36.90 | 1.84 | | ++-------------------------+--------------+--------------+---------+-----------------+----------------+---------------------------+ - 0.5B (vLLM) @@ -124,7 +123,7 @@ Notes: - 1.5B (Transformer) +--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+ -| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +==========================+==============+==============+=========+=================+================+=========================+ | Qwen2.5-1.5B-Instruct | 1 | BF16 | 1 | 39.68 | 2.95 | | + + +--------------+---------+-----------------+----------------+-------------------------+ @@ -203,7 +202,7 @@ Notes: - 3B (Transformer) +--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+ -| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +==========================+==============+==============+=========+=================+================+=========================+ | Qwen2.5-3B-Instruct | 1 | BF16 | 1 | 30.80 | 5.95 | | + + +--------------+---------+-----------------+----------------+-------------------------+ @@ -282,7 +281,7 @@ Notes: - 7B (Transformer) 
+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+ -| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +=============================+==============+==============+=========+=================+================+=========================+ | Qwen2.5-7B-Instruct | 1 | BF16 | 1 | 40.38 | 14.38 | | + + +--------------+---------+-----------------+----------------+-------------------------+ @@ -321,63 +320,65 @@ Notes: - 7B (vLLM) -+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB)| Note | -+=============================+==============+==============+=========+=================+================+===========================================+ -| Qwen2.5-7B-Instruct | 1 | BF16 | 1 | 84.28 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 122.01 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 154.05 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 148.10 | | | -+ +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 6144 | BF16 | 1 | 80.70 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 112.38 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 141.98 | | | -+ + 
+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 137.64 | | | -+ +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 14336 | BF16 | 1 | 77.69 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 105.25 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 129.35 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 124.91 | | | -+ +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 30720 | BF16 | 1 | 70.33 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 90.71 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 108.30 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 104.66 | | | -+ +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 63488 | BF16 | 1 | 50.86 | | setting-64k | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 60.52 | | setting-64k | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 67.97 | | setting-64k | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 66.42 | | setting-64k | -+ 
+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 129024 | BF16 | 1 | 28.94 | | vllm==0.6.2, new sample config | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 25.97 | | vllm==0.6.2, new sample config | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 26.37 | | vllm==0.6.2, new sample config | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 26.57 | | vllm==0.6.2, new sample config | -+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ - * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False) ++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+ +| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note | ++=============================+==============+==============+=========+=================+===========================================+ +| Qwen2.5-7B-Instruct | 1 | BF16 | 1 | 84.28 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int8 | 1 | 122.01 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int4 | 1 | 154.05 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | AWQ | 1 | 148.10 | | ++ +--------------+--------------+---------+-----------------+-------------------------------------------+ +| | 6144 | BF16 | 1 | 80.70 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int8 | 1 | 112.38 | | ++ + 
+--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int4 | 1 | 141.98 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | AWQ | 1 | 137.64 | | ++ +--------------+--------------+---------+-----------------+-------------------------------------------+ +| | 14336 | BF16 | 1 | 77.69 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int8 | 1 | 105.25 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int4 | 1 | 129.35 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | AWQ | 1 | 124.91 | | ++ +--------------+--------------+---------+-----------------+-------------------------------------------+ +| | 30720 | BF16 | 1 | 70.33 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int8 | 1 | 90.71 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int4 | 1 | 108.30 | | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | AWQ | 1 | 104.66 | | ++ +--------------+--------------+---------+-----------------+-------------------------------------------+ +| | 63488 | BF16 | 1 | 50.86 | setting-64k | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int8 | 1 | 60.52 | setting-64k | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int4 | 1 | 67.97 | setting-64k | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | AWQ | 1 | 66.42 | setting-64k | ++ +--------------+--------------+---------+-----------------+-------------------------------------------+ +| | 129024 | BF16 | 1 | 28.94 | vllm==0.6.2, 
new sample config | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int8 | 1 | 25.97 | vllm==0.6.2, new sample config | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | GPTQ-Int4 | 1 | 26.37 | vllm==0.6.2, new sample config | ++ + +--------------+---------+-----------------+-------------------------------------------+ +| | | AWQ | 1 | 26.57 | vllm==0.6.2, new sample config | ++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+ + +* [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False) +* [new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7,top_p=0.8,top_k=20,repetition_penalty=1,presence_penalty=0,frequency_penalty=0,max_tokens=out_length) - 14B (Transformer) +--------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------+ -| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note | +==========================+==============+==============+=========+=================+================+=========================+ | Qwen2.5-14B-Instruct | 1 | BF16 | 1 | 24.74 | 28.08 | | + + +--------------+---------+-----------------+----------------+-------------------------+ @@ -415,58 +416,60 @@ Notes: - 14B (vLLM) -+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB)| Note | -+=============================+==============+==============+=========+=================+================+===========================================+ -| Qwen2.5-14B-Instruct | 1 
| BF16 | 1 | 46.30 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 70.40 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 98.02 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 92.66 | | | -+ +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 6144 | BF16 | 1 | 43.83 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 64.33 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 86.10 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 83.11 | | | -+ +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 14336 | BF16 | 1 | 41.91 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 59.21 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int4 | 1 | 76.85 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | AWQ | 1 | 74.03 | | | -+ +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+ -| | 30720 | BF16 | 1 | 37.18 | | | -+ + +--------------+---------+-----------------+----------------+-------------------------------------------+ -| | | GPTQ-Int8 | 1 | 49.23 | | | -+ + 
+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 60.91           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 59.01           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 63488        | BF16         | 1       | 26.85           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 32.83           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 37.67           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 36.71           |                | setting-64k                               |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 129024       | BF16         | 1       | 14.53           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 15.10           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 15.13           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 15.25           |                | vllm==0.6.2, new sample config            |
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
 * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
++=============================+==============+==============+=========+=================+===========================================+
+| Qwen2.5-14B-Instruct        | 1            | BF16         | 1       | 46.30           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 70.40           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 98.02           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 92.66           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 43.83           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 64.33           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 86.10           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 83.11           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 41.91           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 59.21           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 76.85           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 74.03           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 1       | 37.18           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 49.23           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 60.91           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 59.01           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 1       | 26.85           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 32.83           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 37.67           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 36.71           | setting-64k                               |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 1       | 14.53           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 15.10           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 15.13           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 15.25           | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+
+* [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
+* [new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7,top_p=0.8,top_k=20,repetition_penalty=1,presence_penalty=0,frequency_penalty=0,max_tokens=out_length)
@@ -514,62 +517,63 @@ Notes:
 - 32B (vLLM)
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
-+=============================+==============+==============+=========+=================+================+===========================================+
-| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 22.13           |                | setting1                                  |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 37.57           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 55.83           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 51.92           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 6144         | BF16         | 1       | 21.05           |                | setting1                                  |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 34.67           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 49.96           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 46.68           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 14336        | BF16         | 1       | 19.91           |                | setting1                                  |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 31.89           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 44.79           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 41.83           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 30720        | BF16         | 2       | 31.82           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 26.88           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 35.66           |                |                                           |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 33.75           |                |                                           |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 63488        | BF16         | 2       | 24.45           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 18.60           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 22.72           |                | setting-64k                               |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 21.79           |                | setting-64k                               |
-+                             +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             | 129024       | BF16         | 2       | 14.31           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int8    | 1       | 9.77            |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | GPTQ-Int4    | 1       | 10.39           |                | vllm==0.6.2, new sample config            |
-+                             +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                             |              | AWQ          | 1       | 10.34           |                | vllm==0.6.2, new sample config            |
-+-----------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+| Model                       | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
++=============================+==============+==============+=========+=================+===========================================+
+| Qwen2.5-32B-Instruct        | 1            | BF16         | 1       | 22.13           | setting1                                  |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 37.57           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 55.83           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 51.92           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 6144         | BF16         | 1       | 21.05           | setting1                                  |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 34.67           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 49.96           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 46.68           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 14336        | BF16         | 1       | 19.91           | setting1                                  |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 31.89           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 44.79           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 41.83           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 30720        | BF16         | 2       | 31.82           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 26.88           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 35.66           |                                           |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 33.75           |                                           |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 63488        | BF16         | 2       | 24.45           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 18.60           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 22.72           | setting-64k                               |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 21.79           | setting-64k                               |
++                             +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                             | 129024       | BF16         | 2       | 14.31           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int8    | 1       | 9.77            | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | GPTQ-Int4    | 1       | 10.39           | vllm==0.6.2, new sample config            |
++                             +              +--------------+---------+-----------------+-------------------------------------------+
+|                             |              | AWQ          | 1       | 10.34           | vllm==0.6.2, new sample config            |
++-----------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
 * For context length 129024, the model needs to be predicted with the following config: "model_max_length"=131072
 * [Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)
 * [Setting 1]=(gpu_memory_utilization=1.0 max_model_len=32768 enforce_eager=True)
 * [Setting-64k]=(gpu_memory_utilization=0.9 max_model_len=65536 enforce_eager=False)
+ * [new sample config]: for vLLM, set the following sampling parameters: SamplingParams(temperature=0.7,top_p=0.8,top_k=20,repetition_penalty=1,presence_penalty=0,frequency_penalty=0,max_tokens=out_length)
@@ -616,63 +620,63 @@ Notes:
 - 72B (vLLM)
-+------------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-| Model                        | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) | Note                                      |
-+==============================+==============+==============+=========+=================+================+===========================================+
-| Qwen2.5-72B-Instruct         | 1            | BF16         | 2       | 18.19           |                | Setting 1                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | BF16         | 4       | 31.37           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 31.40           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 1       | 16.47           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 46.30           |                | Setting 2                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 44.30           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 6144         | BF16         | 4       | 29.90           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 29.37           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 1       | 13.88           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 42.50           |                | Setting 3                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 40.67           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 14336        | BF16         | 4       | 30.10           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 27.20           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 38.10           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 36.63           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 30720        | BF16         | 4       | 27.53           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 23.32           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 30.98           |                | Default                                   |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 30.02           |                | Default                                   |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 63488        | BF16         | 4       | 20.74           |                | Setting 4                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 2       | 16.27           |                | Setting 4                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 19.84           |                | Setting 4                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 19.32           |                | Setting 4                                 |
-+                              +--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              | 129024       | BF16         | 4       | 12.68           |                | Setting 5                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int8    | 4       | 14.11           |                | Setting 5                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | GPTQ-Int4    | 2       | 10.11           |                | Setting 5                                 |
-+                              +              +--------------+---------+-----------------+----------------+-------------------------------------------+
-|                              |              | AWQ          | 2       | 9.88            |                | Setting 5                                 |
-+------------------------------+--------------+--------------+---------+-----------------+----------------+-------------------------------------------+
++------------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
+| Model                        | Input Length | Quantization | GPU Num | Speed(tokens/s) | Note                                      |
++==============================+==============+==============+=========+=================+===========================================+
+| Qwen2.5-72B-Instruct         | 1            | BF16         | 2       | 18.19           | Setting 1                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | BF16         | 4       | 31.37           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 31.40           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 1       | 16.47           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 46.30           | Setting 2                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 44.30           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 6144         | BF16         | 4       | 29.90           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 29.37           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 1       | 13.88           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 42.50           | Setting 3                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 40.67           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 14336        | BF16         | 4       | 30.10           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 27.20           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 38.10           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 36.63           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 30720        | BF16         | 4       | 27.53           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 23.32           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 30.98           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 30.02           | Default                                   |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 63488        | BF16         | 4       | 20.74           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 2       | 16.27           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 19.84           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 19.32           | Setting 4                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              | 129024       | BF16         | 4       | 12.68           | Setting 5                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int8    | 4       | 14.11           | Setting 5                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | GPTQ-Int4    | 2       | 10.11           | Setting 5                                 |
++                              +--------------+--------------+---------+-----------------+-------------------------------------------+
+|                              |              | AWQ          | 2       | 9.88            | Setting 5                                 |
++------------------------------+--------------+--------------+---------+-----------------+-------------------------------------------+
 * [Default Setting]=(gpu_memory_utilization=0.9 max_model_len=32768 enforce_eager=False)
 * [Setting 1]=(gpu_memory_utilization=0.98 max_model_len=4096 enforce_eager=True)