
Support layerwise quantization #1018

Open · wants to merge 10 commits into base: main
Conversation

@changwangss (Contributor) commented Nov 22, 2024

What does this PR do?

INC supports the layer-wise quantization feature on both CPU and XPU.
Because INC 3.2 is planned for release on Dec 9, I am pinning the INC installation to a specific commit for now.
Once INC 3.2 is officially released, I will raise a PR to update this.
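
A minimal sketch of the intended usage, based on the diff in this PR. The import path for RtnConfig is an assumption (it may differ between INC versions), and the bit width is only an example:

from neural_compressor.transformers import RtnConfig  # import path assumed for INC 3.x

# use_layer_wise=True is the flag this PR wires through: weights are
# quantized layer by layer, so the full model never has to be resident
# in memory at once (useful for very large checkpoints on CPU/XPU).
quantization_config = RtnConfig(bits=4, group_size=8, use_layer_wise=True)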

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 403 to 406

model = load_empty_model(
    model_id,
    trust_remote_code=trust_remote_code,
)
Collaborator

The model was already loaded above, so it would make sense to remove this part as well. Also, we need to set cls when loading the model with load_empty_model: https://github.com/intel/neural-compressor/blob/v3.1.1/neural_compressor/torch/utils/utility.py#L354

Suggested change:
- model = load_empty_model(
-     model_id,
-     trust_remote_code=trust_remote_code,
- )
+ model = load_empty_model(model_id, cls=model_class, **loading_kwargs)

@changwangss (Contributor, Author) Nov 26, 2024

I improved the code; load_empty_model is only needed by the layer-wise feature.
I didn't pass loading_kwargs because the load_empty_model function doesn't support loading_kwargs, and the following error is raised:

>           model = cls(config, **kwargs)
E           TypeError: __init__() got an unexpected keyword argument 'subfolder'
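
A minimal reproduction of this failure mode (an assumed simplification: transformers model constructors accept only a config, so forwarding loading kwargs to cls(config, **kwargs) fails):

from transformers import AutoConfig, GPT2LMHeadModel  # gpt2 is just an example model

config = AutoConfig.from_pretrained("gpt2")
try:
    # mimics cls(config, **kwargs) inside load_empty_model with a stray kwarg
    model = GPT2LMHeadModel(config, subfolder="")
except TypeError as e:
    print(e)  # __init__() got an unexpected keyword argument 'subfolder'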

)
else:
-   quantization_config = RtnConfig(bits=bits, group_size=8)
+   quantization_config = RtnConfig(bits=bits, group_size=8, use_layer_wise=True)
Collaborator

Why do we need to specify it when creating the quantization config? It looks like, with the current integration, this information will be ignored (load_empty_model will be called in all cases).

@changwangss (Contributor, Author) Nov 26, 2024

I improved it and added a case to test layer-wise.
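
A hypothetical sketch of what such a test case could look like (the import path and the assertion are illustrative, not the PR's actual test):

from neural_compressor.transformers import RtnConfig  # import path assumed for INC 3.x

def test_rtn_layer_wise_config():
    quantization_config = RtnConfig(bits=4, group_size=8, use_layer_wise=True)
    # the flag must survive construction so the quantization path can
    # pick the empty-weights loading branch
    assert quantization_config.use_layer_wise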

Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: changwa1 <chang1.wang@intel.com>
@changwangss (Contributor, Author)

Hi @echarlaix,

Due to some layer-wise bug fixes landing in INC 3.2, and with the release planned for Dec 9, I've pinned the installation to a specific commit for now. Once INC 3.2 is officially released, I will raise a PR to update this. Let me know your thoughts!

@changwangss (Contributor, Author)

The CI IPEX issue was fixed by #1009; the details were discussed with @IlyasMoutawwakil in #1027.

optimum/intel/neural_compressor/quantization.py (outdated)
if hasattr(quantization_config, "use_layer_wise") and quantization_config.use_layer_wise:
    from neural_compressor.torch import load_empty_model

    model = load_empty_model(model_id, cls=model_class, trust_remote_code=trust_remote_code)
Collaborator

Why not:

Suggested change:
- model = load_empty_model(model_id, cls=model_class, trust_remote_code=trust_remote_code)
+ model = load_empty_model(model_id, cls=model_class, **loading_kwargs)

@changwangss (Contributor, Author) Nov 29, 2024

Let me explain: in the load_empty_model function the config is initialized first, and then the model is initialized from it (see https://github.com/intel/neural-compressor/blob/e2696603f45f5796f1c048aab33eef11aaeb2cdb/neural_compressor/torch/utils/utility.py#L356).

When **loading_kwargs is passed, the model initialization (cls(config, **kwargs)) raises an error like the following:

>           model = cls(config, **kwargs)
E           TypeError: __init__() got an unexpected keyword argument 'subfolder'
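
For context, an assumed simplification of the linked load_empty_model logic (not INC's exact code), showing the two-step order: the config initialization tolerates extra kwargs, but forwarding them to the model constructor does not:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

def load_empty_model_sketch(model_id, cls=AutoModelForCausalLM, **kwargs):
    config = AutoConfig.from_pretrained(model_id, **kwargs)  # step 1: config init accepts extra kwargs
    with torch.device("meta"):  # allocate no real weights
        model = cls.from_config(config, **kwargs)  # step 2: fails on kwargs such as subfolder=""
    return model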

if hasattr(quantization_config, "use_layer_wise") and quantization_config.use_layer_wise:
    from neural_compressor.torch import load_empty_model

    model = load_empty_model(model_id, cls=model_class, trust_remote_code=trust_remote_code)
Collaborator

This looks the same for both CPU and XPU; shouldn't the model be moved to the correct device? It should also be moved outside of the if use_xpu condition, as the code is duplicated.

@changwangss (Contributor, Author)

Agreed, I improved it.
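
A hedged sketch of the deduplication discussed above (the function name and the else branch are illustrative, following the conversation rather than the merged code): the layer-wise load is hoisted out of the CPU/XPU branches, since it is identical for both.

from neural_compressor.torch import load_empty_model

def load_for_quantization(model_id, model_class, quantization_config, trust_remote_code=False):
    # shared between CPU and XPU: the layer-wise path only needs an empty
    # shell; real weights are streamed in layer by layer during quantization
    if getattr(quantization_config, "use_layer_wise", False):
        return load_empty_model(model_id, cls=model_class, trust_remote_code=trust_remote_code)
    # the non-layer-wise path loads the full checkpoint up front
    return model_class.from_pretrained(model_id, trust_remote_code=trust_remote_code)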

changwangss and others added 4 commits November 29, 2024 17:27
Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>
Signed-off-by: sys-lpot-val <sys_lpot_val@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>