Rebuild docs for speed benchmark (#1045)
* add qwen2.5 perf report

* update readme

* rebuild docs and fix format issue

* remove fuzzy in speed_benchmark.po

* fix issue

* recover function_call.po

* update

* remove unused code in speed_benchmark.po
wangxingjun778 authored Nov 6, 2024
1 parent 0f0ecfb commit a912d23
Showing 3 changed files with 2,534 additions and 1,523 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -22,7 +22,7 @@ To learn more about Qwen2.5, feel free to read our documentation \[[EN](https://
- Quantization: the practice of quantizing LLMs with GPTQ, AWQ, as well as the guidance for how to make high-quality quantized GGUF files;
- Training: the instructions for post-training, including SFT and RLHF (TODO) with frameworks like Axolotl, LLaMA-Factory, etc.
- Framework: the usage of Qwen with frameworks for application, e.g., RAG, Agent, etc.
- Benchmark: the statistics about inference speed and memory footprint (to be updated for Qwen2.5).
- Benchmark: the statistics about inference speed and memory footprint (available for Qwen2.5).

## Introduction

@@ -37,7 +37,7 @@ In the past three months since Qwen2's release, numerous developers have built n

## News

- 2024.09.19: We released the Qwen2.5 series. This time there are 3 extra model sizes: 3B, 14B, and 32B for more possibilities. Check our [blog](https://qwenlm.github.io/blog/qwen2.5) for more!
- 2024.06.06: We released the Qwen2 series. Check our [blog](https://qwenlm.github.io/blog/qwen2/)!
- 2024.03.28: We released the first MoE model of Qwen: Qwen1.5-MoE-A2.7B! Temporarily, only HF transformers and vLLM support the model. We will soon add the support of llama.cpp, mlx-lm, etc. Check our [blog](https://qwenlm.github.io/blog/qwen-moe/) for more information!
- 2024.02.05: We released the Qwen1.5 series.
@@ -46,7 +46,7 @@ In the past three months since Qwen2's release, numerous developers have built n

Detailed evaluation results are reported in this <a href="https://qwenlm.github.io/blog/qwen2.5/"> 📑 blog</a>.

For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html) (to be updated for Qwen2.5).
For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).

## Quickstart

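The commit above ships a speed benchmark report for Qwen2.5, covering inference throughput and GPU memory. As a rough illustration of what such a benchmark measures, here is a minimal, hypothetical sketch of computing decode throughput in tokens per second; `generate_fn` and `num_new_tokens` are illustrative placeholders for any backend call (for example, a HF `transformers` `model.generate` invocation), not the actual benchmark script used in the docs:

```python
import time


def measure_throughput(generate_fn, num_new_tokens: int) -> float:
    """Return decoded tokens per second for a single generation call.

    `generate_fn` is a hypothetical zero-argument callable that performs
    the generation; `num_new_tokens` is the number of tokens it produces.
    This is a sketch of the metric, not the Qwen benchmark itself.
    """
    start = time.perf_counter()
    generate_fn()  # run the (placeholder) generation step
    elapsed = time.perf_counter() - start
    return num_new_tokens / elapsed


# Example with a dummy "generation" that just sleeps for 10 ms:
tps = measure_throughput(lambda: time.sleep(0.01), 100)
```

Real benchmarks typically repeat the call, discard warm-up runs, and report memory alongside throughput; this sketch only shows the core tokens-per-second arithmetic.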
