Added some cells that introduce and go over per-channel quantization
OscarSavolainenDR committed Feb 19, 2024
1 parent c537e94 commit ab3c5c1
Showing 1 changed file with 81 additions and 0 deletions.
81 changes: 81 additions & 0 deletions notebooks/02_quant_activation_overview.ipynb
@@ -685,6 +685,87 @@
"assert out1_train.scale.isclose(out2_eval.scale).item()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In all of the examples that have currently been looked at in this tutorial, we have used per-tensor quantization. I.e., the output tensor of the activation, if quantized, was always quantized on a per-tensor level, with a single scale and zero-point quantization parameter per output tensor. However, one can also do per-channel quantization, where each output channel of the tensor has its own quantization parameters. In the example below, we look at per-tensor quantization of an input tensor that has 3 channels and 256 elements in the height and width dimensions. We purposely mutate the 1st channel to have its dynamic range be 3 times larger than the other 2 channels. We then feed it through a `QuantReLU`, whose default behavior is to quantize at a per-tensor granularity."
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(2.9998, grad_fn=<MulBackward0>)"
]
},
"execution_count": 161,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"out_channels = 3\n",
"inp3 = torch.rand(1, out_channels, 256, 256) # (B, C, H, W)\n",
"inp3[:, 0, :, :] *= 3\n",
"\n",
"per_tensor_quant_relu = QuantReLU(return_quant_tensor=True)\n",
"out_tensor = per_tensor_quant_relu(inp3)\n",
"out_tensor.scale * ((2**8) -1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the per-tensor scale parameter has calibrated itself to provide a full quantization range of 3, matching that of the most extreme channel.\n",
"\n",
"Next, we initialise a new `QuantRelU` instance, but this time we specify that we desire per-channel quantization i.e. `scaling_per_output_channel=True`. To accomplish this, we also need to give it some extra information on the dimensions of the inputted tensor, so that it knows which dimensions to interpret as the output channels. This is done via the `per_channel_broadcastable_shape` and `scaling_stats_permute_dims` attributes. \n",
"\n",
"`per_channel_broadcastable_shape` represents what the dimensions of the quantization parameters will be, and should be laid out to match those of the output channels of the outputted tensor. We also need to specify the permutation dimensions via `scaling_stats_permute_dims` so as to shape the tensor into a standard format of output channels first. This is so that during the statistics gathering stage of QAT the correct stats will be gathered."
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[2.9999]],\n",
"\n",
" [[1.0000]],\n",
"\n",
" [[1.0000]]]], grad_fn=<MulBackward0>)"
]
},
"execution_count": 160,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"per_chan_quant_relu = QuantReLU(return_quant_tensor=True,\n",
" scaling_per_output_channel=True,\n",
" per_channel_broadcastable_shape=(1, out_channels, 1 , 1),\n",
" scaling_stats_permute_dims=(1, 0, 2, 3),\n",
" )\n",
"out_channel = per_chan_quant_relu(inp3)\n",
"out_channel.scale * ((2**8) -1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, we can see that the number of elements in the quantization scale of the outputted tensor is now 3, matching those of the 3-channel tensor! Furthermore, we see that each channel has an 8-bit quantization range that matches its data distribution, which is much more ideal in terms of reducing quantization mismatch. However, it's important to note that some hardware providers don't efficiently support per-channel quantization in production, so it's best to check if your targetted hardware will allow per-channel quantization."
]
},
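{
"cell_type": "markdown",
"metadata": {},
"source": [
"To build some intuition for what the calibration above is doing, the cell below sketches a max-based per-channel scale computation in plain PyTorch. This is a simplified illustration under our own assumptions (a pure max statistic and a hypothetical helper named `per_channel_scales`), not the Brevitas implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: max-based per-channel scales for an unsigned 8-bit quantizer.\n",
"# Illustration only; `per_channel_scales` is a hypothetical helper, not Brevitas API.\n",
"def per_channel_scales(x, bit_width=8):\n",
"    # Reduce over every dim except the channel dim (dim=1 for NCHW),\n",
"    # keeping dims so the scales broadcast back over the input\n",
"    channel_max = x.amax(dim=(0, 2, 3), keepdim=True)  # shape (1, C, 1, 1)\n",
"    return channel_max / (2**bit_width - 1)\n",
"\n",
"scales = per_channel_scales(torch.relu(inp3))\n",
"scales * ((2**8) - 1)  # per-channel ranges, roughly [3., 1., 1.]"
]
},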
{
"cell_type": "markdown",
"metadata": {},
