Feat: Support for Groupwise (MX) quantization #971
Conversation
Maybe worth double-checking all of the `<Datatype>QuantTensor` instantiations where the order of the arguments really matters. I found a few errors that are likely due to some original issue + copy/paste, so they may also exist outside your specific changes. I think it's worth double-checking as many as you can while you're fixing the ones I found.
LGTM!
This implements:
Missing:
Features
Dynamic expansion of contracted groupwise tensors
Compared to Int/Float QuantTensor, the main difference in their groupwise equivalents is that `value`, `scale`, and `zero_point` are no longer direct attributes but properties. The new attributes are `value_`, `scale_`, and `zero_point_`.
The reason for this is shaping. When quantizing a tensor with shape [O, I], where O is the output channel dimension and I is the input channel dimension, with group size k, groupwise quantization is normally represented with a dedicated group dimension, so that the scale and zero point are stored once per group rather than once per element.
The alternative to this representation is to have all three tensors with shape [O, I], with a massive increase in memory utilization, especially with QAT + gradients.
The underscored attributes hold the compressed shapes, while the properties (the non-underscored names) dynamically compute the expanded version.
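A minimal NumPy sketch of the contracted vs. expanded layouts described above (the concrete shapes and names here are assumptions inferred from the description, not taken from the Brevitas implementation):

```python
import numpy as np

O, I, k = 4, 8, 2  # output channels, input channels, group size

x = np.random.randn(O, I).astype(np.float32)

# Contracted (groupwise) view: split the input dimension into I // k groups of k.
x_grouped = x.reshape(O, I // k, k)

# One scale per group -> compressed shape [O, I // k, 1]
# (a stand-in for the underscored `scale_` attribute).
scale_ = np.abs(x_grouped).max(axis=-1, keepdims=True) / 127.0

# The non-underscored property would dynamically expand the scale back
# to the element-wise shape [O, I] via broadcasting.
scale = np.broadcast_to(scale_, (O, I // k, k)).reshape(O, I)
```

Note how the compressed form stores O * (I / k) scale values instead of O * I, which is where the memory saving for QAT gradients comes from.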
Internally, quantization happens with groupwise (i.e., contracted) shaping. For this reason, a preliminary view is applied to the tensors before everything goes into `tensor_quant`.
Deprecation of scaling_per_output_channel
Another important change in this PR is the deprecation (i.e., still usable but no longer recommended) of the flag `scaling_per_output_channel`, in favor of a ternary flag `scaling_per_output` that can be TENSOR/CHANNEL/GROUP. A lot of work has gone into maintaining backward compatibility with the existing binary flag, so that everything will still work as intended.
One thing that is still up for discussion is how to handle shared quantizers for groupwise quantization.
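A hypothetical sketch of how the deprecated binary flag could map onto the new ternary flag (the enum and function names here are illustrative assumptions, not the actual Brevitas API):

```python
from enum import Enum


class ScalingPerOutputType(Enum):
    """Stand-in for the ternary scaling_per_output flag described above."""
    TENSOR = "tensor"
    CHANNEL = "channel"
    GROUP = "group"


def resolve_scaling(scaling_per_output=None, scaling_per_output_channel=None):
    """Map the deprecated binary flag onto the new ternary flag."""
    if scaling_per_output is not None:
        # New-style flag wins when provided.
        return scaling_per_output
    if scaling_per_output_channel is None:
        # Neither flag set: fall back to per-tensor scaling.
        return ScalingPerOutputType.TENSOR
    # Deprecated boolean: True -> per-channel, False -> per-tensor.
    return (ScalingPerOutputType.CHANNEL
            if scaling_per_output_channel
            else ScalingPerOutputType.TENSOR)
```

Note that GROUP is only reachable through the new flag, which is consistent with the old binary flag never having supported groupwise scaling.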
Automatic OCP definition
When instantiating an OCP FP quantizer, the NaN/Inf encodings are automatically defined through dependency injection based on the bit widths. This avoids having to manually define every possible minifloat quantizer needed for MX (e.g., MX FP8, FP6, etc.).
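A sketch of what deriving the special-value encodings from the bit widths could look like, following the OCP Microscaling (MX) spec (E5M2 keeps IEEE-style Inf/NaN, E4M3 keeps only NaN, and the FP6/FP4 formats have neither). The function name is an assumption for illustration:

```python
def ocp_special_values(exponent_bit_width, mantissa_bit_width):
    """Pick NaN/Inf support from the bit widths, per the OCP MX element formats."""
    bit_width = 1 + exponent_bit_width + mantissa_bit_width  # sign + exp + mantissa
    if bit_width == 8 and exponent_bit_width == 5:
        # FP8 E5M2: IEEE-like, supports both Inf and NaN.
        return {"nan": True, "inf": True}
    if bit_width == 8 and exponent_bit_width == 4:
        # FP8 E4M3: NaN only, no Inf (reclaimed for extra range).
        return {"nan": True, "inf": False}
    # OCP FP6 (E2M3/E3M2) and FP4 (E2M1): no special values at all.
    return {"nan": False, "inf": False}
```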
Groupwise Default quantizers
Many possible quantizers could be defined by default. For integer (not much of a problem), we have groupwise with float scaling (defined in the examples, if I'm not mistaken) and MX (groupwise with Po2 scale) defined in the Brevitas source.
For float, it gets a bit more complicated with Inf/NaN. The solution adopted is the following:
I only created quantizers with bit width equal to 8 (e4m3). Overriding `bit_width`, `mantissa_bit_width`, and `exponent_bit_width` will produce OCP-compliant MX quantizers (thanks to the change in Automatic OCP definition).
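A sketch of the subclass-and-override pattern described above, using a stand-in base class rather than the real Brevitas quantizer (all class names here are assumptions). The point is that overriding the three bit-width attributes on a subclass is enough, since the NaN/Inf encodings follow automatically from the Automatic OCP definition change:

```python
class MXFloat8e4m3WeightStandIn:
    """Stand-in for the default 8-bit e4m3 MX quantizer shipped in the PR."""
    bit_width = 8
    exponent_bit_width = 4
    mantissa_bit_width = 3


class MXFloat6e3m2Weight(MXFloat8e4m3WeightStandIn):
    """Hypothetical user-defined MX FP6 quantizer: only the widths change."""
    bit_width = 6
    exponent_bit_width = 3
    mantissa_bit_width = 2
```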
Example changes
There are no longer separate flags for the OCP/FNUZ standards. The user should now pass `float_e4m3` for generic FP8 with no standard, `float_ocp_e4m3` for OCP, and `float_fnuz_e4m3` for FNUZ.