
Doc: Update readme.md #2083

Open · wants to merge 2 commits into master
Conversation

feng-intel (Contributor) opened this pull request:

Type of Change

feature or bug fix or documentation or validation or others
API changed or not

Description

detailed description

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

@feng-intel force-pushed the master branch 4 times, most recently from 157a396 to ea1be65 on December 10, 2024 at 09:39.
README.md (outdated diff):
```Shell
# Install 2.X API + Framework extension API + PyTorch dependency
pip install neural-compressor[pt]
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
> **Note**:
> Further installation methods can be found in the [Installation Guide](./docs/source/installation_guide.md). Check out our [FAQ](./docs/source/faq.md) for more details.
### Install Neural Compressor from Source for torch
Contributor: This section is a bit duplicated with the "Install from source" section in the installation doc. Do we really need it in the README?

feng-intel (Author): Yes. Done.

### FP8 KV cache
Introduction: [kv-cache-quantization in huggingface transformers](https://huggingface.co/blog/kv-cache-quantization)

BF16 KVCache Code -> [Modeling_all_models.py -> KVCache()](https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/modeling_all_models.py#L40)
Contributor: It is not good to link to a specific line of code, which is subject to change. Better to explain the general idea or link to a doc in OH (optimum-habana).

feng-intel (Author): Done.

Add "--profiling_warmup_steps 5 --profiling_steps 2 --profiling_record_shapes" as args in the end of commandline of run_generation.py.
Refer to [torch.profiler.ProfilerActivity.HPU](https://github.com/huggingface/optimum-habana/blob/c9e1c23620618e2f260c92c46dfeb163545ec5ba/optimum/habana/utils.py#L305).
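
For illustration, a minimal sketch of collecting the equivalent trace directly with `torch.profiler`; it assumes a Gaudi software stack where `ProfilerActivity.HPU` is available, and `run_generation_step` is a hypothetical stand-in for one decoding step:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# ProfilerActivity.HPU only exists in Habana-enabled PyTorch builds.
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.HPU],
    schedule=schedule(wait=0, warmup=5, active=2),  # mirrors --profiling_warmup_steps 5 --profiling_steps 2
    record_shapes=True,                             # mirrors --profiling_record_shapes
) as prof:
    for _ in range(7):         # 5 warmup + 2 active steps
        run_generation_step()  # hypothetical: one generation step
        prof.step()            # advance the profiler schedule

print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```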

### FP8 Accuracy
Contributor: This accuracy data is not right and not up to date. Please check with @XuehaoSun for the latest test results. For the Qwen2.5 model there is an open bug; you can wait for that to get fixed.

feng-intel (Author): OK. I will leave this change for later.

## Examples
### FP8 KV cache
Contributor: Enabling PatchedVLLMKVCache is not enough. A recent PR, HabanaAI/vllm-fork#569, enabled INC to patch ModuleFusedSDPA to FP8; without that PR there is no perf gain in attention from the FP8 KV cache.

feng-intel (Author): Added PatchedModuleFusedSDPA. I will read it.
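
For context, a minimal sketch of the INC measure-then-quantize FP8 flow in which such "Patched" modules get swapped in, assuming the `FP8Config`/`prepare`/`convert` API described in docs/source/3x/PT_FP8Quant.md; the JSON config names are placeholders, and in practice the two passes often run as separate processes:

```python
import torch
from neural_compressor.torch.quantization import (
    FP8Config,
    convert,
    finalize_calibration,
    prepare,
)

def quantize_to_fp8(model: torch.nn.Module, calib_dataloader):
    # Pass 1: measurement -- observe tensor ranges on calibration data.
    measure_config = FP8Config.from_json_file("maxabs_measure.json")  # placeholder path
    model = prepare(model, measure_config)
    with torch.no_grad():
        for batch in calib_dataloader:
            model(batch)
    finalize_calibration(model)  # dump the recorded statistics

    # Pass 2: quantization -- replace modules with FP8 "Patched" variants
    # (e.g. PatchedVLLMKVCache / PatchedModuleFusedSDPA in the vLLM path).
    quant_config = FP8Config.from_json_file("maxabs_quant.json")  # placeholder path
    return convert(model, quant_config)
```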

### FP8 Accuracy
`lm_eval.tasks`, `lm_eval.evaluator`, and `lm_eval` are installed from the requirements_lm_eval.txt above. The tasks are configurable; the default set is ["hellaswag", "lambada_openai", "piqa", "winogrande"] ([more info](https://github.com/EleutherAI/lm-evaluation-harness/)).

| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/o fp8 KVCache|
Contributor: "bf16 w/o fp8 KVCache"? -> "bf16 w/ bf16 KVCache" or "bf16 w/o bf16 KVCache"

feng-intel (Author): Done. Thanks.

xin3he (Contributor) left a comment: vllm_gaudi.png is pretty good. I noticed two typos:
  • The model path in the command is your local path.
  • `import neural_compressor.torch.quantization` is missing a '.'; the same goes for vllm_hpu_extension.

feng-intel (Author) commented Dec 19, 2024:

> vllm_gaudi.png is pretty good. I noticed two typos:
> • The model path in the command is your local path.
> • `import neural_compressor.torch.quantization` is missing a '.'; the same goes for vllm_hpu_extension.

1. Done.
2. Done. The "import ..." line is just placeholder code.

xin3he (Contributor) left a comment: LGTM

README.md (outdated diff):
#### Install torch for CPU
```Shell
pip install torch --index-url https://download.pytorch.org/whl/cpu
```
[Install intel_extension_for_pytorch for CPU](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)
@chensuyue (Contributor) commented Dec 24, 2024: Need to split these into several lines to make them clear. (screenshot attached)

feng-intel (Author): Done. Thanks.

### FP8 Accuracy
`lm_eval.tasks`, `lm_eval.evaluator`, and `lm_eval` are installed from the requirements_lm_eval.txt above. The tasks are configurable; the default set is ["hellaswag", "lambada_openai", "piqa", "winogrande"] ([more info](https://github.com/EleutherAI/lm-evaluation-harness/)).

| `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
Contributor: Suggested change:
- | `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
+ | `Llama-2-7b-hf`| fp8 & fp8 KVCache| bf16 w/ bf16 KVCache|

feng-intel (Author): Done. Thanks.

| piqa | 0.7850924918389554 | 0.7818280739934712 |
| winogrande | 0.6929755327545383 | 0.6929755327545383 |

| `Qwen2.5-7B-Instruct`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
Contributor: Same as above.

feng-intel (Author): Done.

| piqa | 0.5391730141458106 | 0.5391730141458106 |
| winogrande | 0.4956590370955012 | 0.4956590370955012 |

| `Llama-3.1-8B-Instruct`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
Contributor: Same as above.

feng-intel (Author): Done.

| winogrande | 0.7434885556432518 | 0.7371744277821626 |


| `Mixtral-8x7B-Instruct-v0.1`| fp8 & fp8 KVCache| bf16 w/ fp16 KVCache|
Contributor: Same as above.

feng-intel (Author): Done.

Signed-off-by: fengding <feng1.ding@intel.com>