Doc: Update readme.md #2083
base: master
Conversation
157a396 to ea1be65
README.md (Outdated)
```Shell
# Install 2.X API + Framework extension API + PyTorch dependency
pip install neural-compressor[pt]
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
> **Note**:
> Further installation methods can be found in the [Installation Guide](./docs/source/installation_guide.md). Check out our [FAQ](./docs/source/faq.md) for more details.

### Install Neural Compressor from Source for torch
This section largely duplicates the "Install from source" section in the Installation doc. Do we really need it in the README?
Yes.
Done.
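(For reference, the from-source flow that README section describes is roughly the following. This is a sketch assuming the standard Python packaging flow; the exact requirements files and framework-specific extras are defined in the Installation Guide.)

```Shell
# Sketch of a from-source install; the Installation Guide remains authoritative.
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
pip install .
```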
docs/source/3x/PT_FP8Quant.md (Outdated)
### FP8 KV cache
Introduction: [kv-cache-quantization in huggingface transformers](https://huggingface.co/blog/kv-cache-quantization)

BF16 KVCache Code -> [Modeling_all_models.py -> KVCache()](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/modeling_all_models.py#L40)
It is not good to link to a specific line of code, which is subject to change. Better to explain the general idea or link to a relevant doc in OH.
Done.
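(For readers landing here, the general idea, independent of any line-linked source, is the measure-then-quantize flow from this doc. Below is a minimal sketch assuming the `FP8Config`/`prepare`/`convert`/`finalize_calibration` API described in PT_FP8Quant.md and a Gaudi (HPU) software stack; the tiny model and config values are illustrative only.)

```python
import torch
from neural_compressor.torch.quantization import (
    FP8Config, prepare, convert, finalize_calibration,
)

# Illustrative stand-in for a real bf16 model running on HPU.
model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()

# 1) Measurement pass: collect tensor statistics on calibration data.
config = FP8Config(fp8_config="E4M3", mode="MEASURE")
model = prepare(model, config)
with torch.no_grad():
    model(torch.randn(4, 16))
finalize_calibration(model)

# 2) Quantization pass: supported modules (Linear, Matmul, KVCache, ...)
#    are patched to their FP8 counterparts.
config = FP8Config(fp8_config="E4M3", mode="QUANTIZE")
model = convert(model, config)
```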
Add "--profiling_warmup_steps 5 --profiling_steps 2 --profiling_record_shapes" as args in the end of commandline of run_generation.py. | ||
Refer to [torch.profiler.ProfilerActivity.HPU](https://github.com/huggingface/optimum-habana/blob/c9e1c23620618e2f260c92c46dfeb163545ec5ba/optimum/habana/utils.py#L305). | ||
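For example (a hypothetical invocation; the model name and generation flags are placeholders from optimum-habana's text-generation example):

```Shell
python run_generation.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --use_hpu_graphs --use_kv_cache --max_new_tokens 100 \
  --profiling_warmup_steps 5 --profiling_steps 2 --profiling_record_shapes
```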
### FP8 Accuracy
This accuracy data is not right and not up to date; please check with @XuehaoSun for the latest test results. For the Qwen2.5 model there is an open bug; you can wait for that to be fixed.
OK. I'll leave this change for later.
```

## Examples
### FP8 KV cache
Enabling PatchedVLLMKVCache is not enough. A recent PR, HabanaAI/vllm-fork#569, enabled INC to patch ModuleFusedSDPA to FP8; without that PR there is no perf gain in attention from the FP8 KV cache.
Added PatchedModuleFusedSDPA. I will read it.
docs/source/3x/PT_FP8Quant.md (Outdated)
### FP8 Accuracy
"lm_eval.tasks", "lm_eval.evaluator", and "lm_eval" are installed from the above requirements_lm_eval.txt. The tasks can be set; the default is ["hellaswag", "lambada_openai", "piqa", "winogrande"] ([more info](https://github.com/EleutherAI/lm-evaluation-harness/)).

| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/o fp8 KVCache|
bf16 w/o fp8 KVCache? -> bf16 w/ bf16 KVCache
or bf16 w/o bf16 KVCache
Done. Thanks.
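(Side note: the task scores in these tables should be reproducible with lm-evaluation-harness along these lines. A sketch; the model path and batch size are placeholders.)

```Shell
lm_eval --model hf \
  --model_args pretrained=/path/to/Llama-2-7b-hf,dtype=bfloat16 \
  --tasks hellaswag,lambada_openai,piqa,winogrande \
  --batch_size 8
```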
vllm_gaudi.png is pretty good; I noticed two typos:
- The model path in the command is your local path.
- `import neural_compressor.torch.quantization` is missing a '.'; same for vllm_hpu_extension.
LGTM
README.md (Outdated)
#### Install torch for CPU
```Shell
pip install torch --index-url https://download.pytorch.org/whl/cpu
```
[Install intel_extension_for_pytorch for CPU](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)
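(A quick sanity check for this CPU setup; a sketch that assumes both packages are installed.)

```Shell
python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"
```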
- Install intel_extension_for_pytorch for CPU
- Install intel_extension_for_pytorch for XPU
- Use Docker Image with torch installed for HPU (a usage sketch follows this exchange)
  Note: There is a version mapping between Intel Neural Compressor and Gaudi Software Stack; please refer to this table and make sure to use a matched combination.
- Install torch for other platform
- Install tensorflow
Done. Thanks.
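(For the HPU Docker bullet above, a usage sketch; the image tag is illustrative only and must match the Gaudi Software Stack version mapped to your Neural Compressor release.)

```Shell
# Illustrative tag; pick the version matched to your INC release.
docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  --cap-add=sys_nice --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
```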
docs/source/3x/PT_FP8Quant.md (Outdated)
### FP8 Accuracy
"lm_eval.tasks", "lm_eval.evaluator", and "lm_eval" are installed from the above requirements_lm_eval.txt. The tasks can be set; the default is ["hellaswag", "lambada_openai", "piqa", "winogrande"] ([more info](https://github.com/EleutherAI/lm-evaluation-harness/)).

| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ bf16 KVCache|
Done. Thanks.
docs/source/3x/PT_FP8Quant.md (Outdated)
| piqa | 0.7850924918389554 | 0.7818280739934712 |
| winogrande | 0.6929755327545383 | 0.6929755327545383 |

| `Qwen2.5-7B-Instruct`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
same
Done.
docs/source/3x/PT_FP8Quant.md (Outdated)
| piqa | 0.5391730141458106 | 0.5391730141458106 |
| winogrande | 0.4956590370955012 | 0.4956590370955012 |

| `Llama-3.1-8B-Instruct`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
same
Done.
docs/source/3x/PT_FP8Quant.md (Outdated)
| winogrande | 0.7434885556432518 | 0.7371744277821626 |

| `Mixtral-8x7B-Instruct-v0.1`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
same
Done.
Signed-off-by: fengding <feng1.ding@intel.com>
for more information, see https://pre-commit.ci
Type of Change
feature, bug fix, documentation, validation, or others
API changed or not

Description
detail description

Expected Behavior & Potential Risk
the expected behavior triggered by this PR

How has this PR been tested?
how to reproduce the test (including hardware information)

Dependency Change?
any library dependency introduced or removed