Doc: Update readme.md #2083
base: master
Conversation
157a396 to ea1be65
README.md (Outdated)
```Shell
# Install 2.X API + Framework extension API + PyTorch dependency
pip install neural-compressor[pt]
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
> **Note**:
> Further installation methods can be found in the [Installation Guide](./docs/source/installation_guide.md). Check out our [FAQ](./docs/source/faq.md) for more details.

### Install Neural Compressor from Source for torch
This section largely duplicates the "Install from source" section in the Installation doc. Do we really need it in the README?
Yes.
Done.
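(For reference, the from-source flow that README section describes is roughly the following. This is a sketch assuming the standard Python packaging flow; the exact requirements files and framework-specific extras are defined in the Installation Guide.)

```Shell
# Sketch of a from-source install; the Installation Guide remains authoritative.
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
pip install .
```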
docs/source/3x/PT_FP8Quant.md (Outdated)
### FP8 KV cache
Introduction: [kv-cache-quantization in huggingface transformers](https://huggingface.co/blog/kv-cache-quantization)

BF16 KVCache Code -> [Modeling_all_models.py -> KVCache()](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/modeling_all_models.py#L40)
It is not good to link to a specific line of code, which is subject to change. Better to explain the general idea or link to a relevant doc in OH.
Done.
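(For readers landing here, the general idea, independent of any line-linked source, is the measure-then-quantize flow from this doc. Below is a minimal sketch assuming the `FP8Config`/`prepare`/`convert`/`finalize_calibration` API described in PT_FP8Quant.md and a Gaudi (HPU) software stack; the tiny model and config values are illustrative only.)

```python
import torch
from neural_compressor.torch.quantization import (
    FP8Config, prepare, convert, finalize_calibration,
)

# Illustrative stand-in for a real bf16 model running on HPU.
model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()

# 1) Measurement pass: collect tensor statistics on calibration data.
config = FP8Config(fp8_config="E4M3", mode="MEASURE")
model = prepare(model, config)
with torch.no_grad():
    model(torch.randn(4, 16))
finalize_calibration(model)

# 2) Quantization pass: supported modules (Linear, Matmul, KVCache, ...)
#    are patched to their FP8 counterparts.
config = FP8Config(fp8_config="E4M3", mode="QUANTIZE")
model = convert(model, config)
```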
Add "--profiling_warmup_steps 5 --profiling_steps 2 --profiling_record_shapes" as args in the end of commandline of run_generation.py. | ||
Refer to [torch.profiler.ProfilerActivity.HPU](https://github.com/huggingface/optimum-habana/blob/c9e1c23620618e2f260c92c46dfeb163545ec5ba/optimum/habana/utils.py#L305). | ||
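For example (a hypothetical invocation; the model name and generation flags are placeholders from optimum-habana's text-generation example):

```Shell
python run_generation.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --use_hpu_graphs --use_kv_cache --max_new_tokens 100 \
  --profiling_warmup_steps 5 --profiling_steps 2 --profiling_record_shapes
```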
### FP8 Accuracy
This accuracy data is not right and not up to date; please check with @XuehaoSun for the latest test results. For the Qwen2.5 model there is an open bug; you can wait for that to be fixed.
OK. I'll leave this change for later.
```

## Examples
### FP8 KV cache
Enabling PatchedVLLMKVCache is not enough. A recent PR, HabanaAI/vllm-fork#569, enabled INC to patch ModuleFusedSDPA to FP8; without that PR there is no perf gain in attention from the FP8 KV cache.
Added PatchedModuleFusedSDPA. I will read it.
docs/source/3x/PT_FP8Quant.md (Outdated)
### FP8 Accuracy
"lm_eval.tasks", "lm_eval.evaluator", and "lm_eval" are installed from the above requirements_lm_eval.txt. The tasks can be set; the default is ["hellaswag", "lambada_openai", "piqa", "winogrande"] ([more info](https://github.com/EleutherAI/lm-evaluation-harness/)).

| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/o fp8 KVCache|
bf16 w/o fp8 KVCache? -> bf16 w/ bf16 KVCache
or bf16 w/o bf16 KVCache
Done. Thanks.
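(Side note: the task scores in these tables should be reproducible with lm-evaluation-harness along these lines. A sketch; the model path and batch size are placeholders.)

```Shell
lm_eval --model hf \
  --model_args pretrained=/path/to/Llama-2-7b-hf,dtype=bfloat16 \
  --tasks hellaswag,lambada_openai,piqa,winogrande \
  --batch_size 8
```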
vllm_gaudi.png is pretty good; I noticed two typos:
- The model path in the command is your local path.
- `import neural_compressor.torch.quantization` is missing a '.'; same for vllm_hpu_extension.
LGTM
README.md (Outdated)
#### Install torch for CPU
```Shell
pip install torch --index-url https://download.pytorch.org/whl/cpu
```
[Install intel_extension_for_pytorch for CPU](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)
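(A quick sanity check for this CPU setup; a sketch that assumes both packages are installed.)

```Shell
python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"
```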
- Install intel_extension_for_pytorch for CPU
- Install intel_extension_for_pytorch for XPU
- Use Docker Image with torch installed for HPU (a usage sketch follows this exchange)
  Note: There is a version mapping between Intel Neural Compressor and Gaudi Software Stack; please refer to this table and make sure to use a matched combination.
- Install torch for other platform
- Install tensorflow
Done. Thanks.
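(For the HPU Docker bullet above, a usage sketch; the image tag is illustrative only and must match the Gaudi Software Stack version mapped to your Neural Compressor release.)

```Shell
# Illustrative tag; pick the version matched to your INC release.
docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  --cap-add=sys_nice --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
```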
docs/source/3x/PT_FP8Quant.md (Outdated)
### FP8 Accuracy
"lm_eval.tasks", "lm_eval.evaluator", and "lm_eval" are installed from the above requirements_lm_eval.txt. The tasks can be set; the default is ["hellaswag", "lambada_openai", "piqa", "winogrande"] ([more info](https://github.com/EleutherAI/lm-evaluation-harness/)).

| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ bf16 KVCache|
Done. Thanks.
docs/source/3x/PT_FP8Quant.md (Outdated)
| piqa | 0.7850924918389554 | 0.7818280739934712 |
| winogrande | 0.6929755327545383 | 0.6929755327545383 |

| `Qwen2.5-7B-Instruct`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
same
Done.
docs/source/3x/PT_FP8Quant.md (Outdated)
| piqa | 0.5391730141458106 | 0.5391730141458106 |
| winogrande | 0.4956590370955012 | 0.4956590370955012 |

| `Llama-3.1-8B-Instruct`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
same
Done.
docs/source/3x/PT_FP8Quant.md (Outdated)
| winogrande | 0.7434885556432518 | 0.7371744277821626 |

| `Mixtral-8x7B-Instruct-v0.1`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
same
Done.
Signed-off-by: fengding <feng1.ding@intel.com>
for more information, see https://pre-commit.ci
Type of Change
feature, bug fix, documentation, validation, or others
API changed or not

Description
detail description

Expected Behavior & Potential Risk
the expected behavior triggered by this PR

How has this PR been tested?
how to reproduce the test (including hardware information)

Dependency Change?
any library dependency introduced or removed