Add tutorial examples of per-channel quantization #867
Conversation
The explanation is correct. I would just expand a bit on the inner workings behind it.
Force-pushed from 80f5cec to 4837b17
@Giuseppe5 I have added some details, let me know if I should add more! A lot of the permutation code relies on boilerplate, and I wasn't sure how detailed a trace of it I should include in the tutorial. I have all my notes, so I can get quite granular if we want to.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We can see that the per-tensor scale parameter has calibrated itself to provide a full quantization range of 3, matching that of the most extreme channel. \n", |
Most extreme -> highest? If I'm understanding the meaning correctly
Yep, I'll make it more clear!
I changed it to indicate the channel with the largest dynamic range, instead of the most "extreme" one.
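For readers following along, here is a minimal sketch of the per-tensor behaviour being discussed, assuming `QuantReLU` exposes the usual `quant_act_scale()` helper; the input values and bit width are invented for illustration:

```python
import torch
from brevitas.nn import QuantReLU

# Hypothetical input with two channels of very different dynamic range:
# channel 0 spans [0, 3], channel 1 only spans [0, 0.5].
x = torch.stack([torch.linspace(0.0, 3.0, 16),
                 torch.linspace(0.0, 0.5, 16)]).view(1, 2, 4, 4)

per_tensor_relu = QuantReLU(bit_width=8, return_quant_tensor=True)
per_tensor_relu.train()   # scale statistics are collected in training mode
_ = per_tensor_relu(x)
per_tensor_relu.eval()

# A single scale for the whole tensor, driven by the widest-range channel,
# so channel 1 only ever occupies a small slice of the 8-bit grid.
print(per_tensor_relu.quant_act_scale())
```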
"source": [ | ||
"Next, we initialise a new `QuantRelU` instance, but this time we specify that we desire per-channel quantization i.e. `scaling_per_output_channel=True`. This will implictly call `scaling_stats_input_view_shape_impl`, defined [here](https://github.com/Xilinx/brevitas/blob/200456825f3b4b8db414f2b25b64311f82d3991a/src/brevitas/quant/solver/common.py#L184), and will change the `QuantReLU` from using a per-tensor view when gathering stats to a per output channel view ([`OverOutputChannelView`](https://github.com/Xilinx/brevitas/blob/200456825f3b4b8db414f2b25b64311f82d3991a/src/brevitas/core/function_wrapper/shape.py#L52)). This simply permutes the tensor into a 2D tensor, with dim 0 equal to the number of output channels.\n", | ||
"\n", | ||
"To accomplish this, we also need to give it some extra information: `scaling_stats_permute_dims` and `per_channel_broadcastable_shape`. `scaling_stats_permute_dims` is responsible for defining how we do the permutation. `per_channel_broadcastable_shape` simply represents what the dimensions of the quantization parameters will be, i.e. there should be one parameter per output channel.\n", |
`per_channel_broadcastable_shape` is necessary to understand along which dimensions the scale factor has to be broadcast, so that the scale factor values are applied along the channel dimension of the input.
By default, PyTorch aligns tensor shapes starting from the rightmost dimension when broadcasting. To make sure we apply the scale factor along our desired dimension, we need to tell PyTorch how to broadcast the scale factors correctly: the scale factor has to have as many dimensions as the input tensor, with all sizes equal to 1 apart from the channel dimension.
Awesome, will add more context!
I took your description, as it was a very good explanation haha, and added it to the PR!
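To make the broadcasting point concrete, here is a small PyTorch-only sketch; the shapes are invented for illustration:

```python
import torch

x = torch.randn(8, 4, 32, 32)   # (batch, channels, height, width)
scale = torch.rand(4)           # one scale value per channel

# PyTorch aligns shapes from the right, so a (4,)-shaped scale would be
# matched against the width dimension, not the channel dimension (here it
# would simply error since 32 != 4; with width == 4 it would silently
# broadcast along the wrong axis). Giving the scale as many dimensions as
# the input, with size 1 everywhere except the channel dimension, forces
# the intended broadcast:
scale = scale.view(1, 4, 1, 1)  # cf. per_channel_broadcastable_shape
y = x / scale                   # each channel i is divided by scale[i]

# Similarly, scaling_stats_permute_dims=(1, 0, 2, 3) moves the channel
# dimension to the front before the stats view flattens the rest into 2D:
x_stats = x.permute(1, 0, 2, 3).reshape(4, -1)  # one row of stats per channel
```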
Could you please rebase this over dev? Thanks!
I'm unusually busy until Wednesday, but will rebase and adjust the PR along the lines of the comments then!
Force-pushed (… per-channel quantization) from 9208318 to 44b0bc1
Purpose of this PR
To expand the QuantAct notebook with scaling-per-output-channel examples, as referenced in issue #862.
Changes made in this PR
We add the example of a `QuantReLU`, and show how per-channel quantization can enable one to better match the dynamic range of each individual channel.

Note: I am not sure my explanations of `per_channel_broadcastable_shape` and `scaling_stats_permute_dims` are correct.
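For context, the kind of per-channel example described above looks roughly like the following sketch; the argument values are illustrative and not taken verbatim from the notebook:

```python
import torch
from brevitas.nn import QuantReLU

OUT_CH = 4

per_channel_relu = QuantReLU(
    bit_width=8,
    scaling_per_output_channel=True,                    # one scale per channel
    per_channel_broadcastable_shape=(1, OUT_CH, 1, 1),  # scale broadcast shape
    scaling_stats_permute_dims=(1, 0, 2, 3),            # channel dim first for stats
    return_quant_tensor=True)

x = torch.randn(8, OUT_CH, 32, 32)
per_channel_relu.train()
out = per_channel_relu(x)
# Expected: one scale per output channel, e.g. torch.Size([1, 4, 1, 1])
print(out.scale.shape)
```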