Feat: Remove QOP Export #917

Merged
merged 6 commits on Mar 27, 2024
398 changes: 0 additions & 398 deletions docs/tutorials/onnx_export.ipynb

Large diffs are not rendered by default.

175 changes: 1 addition & 174 deletions docs/tutorials/tvmcon2021.ipynb
@@ -1882,93 +1882,6 @@
" return IFrame(src=f\"http://localhost:{port}/\", width=\"100%\", height=400)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Export to ONNX QOps\n",
"\n",
"Say we want to export a QuantConv1d with 4b symmetric weights, 8b symmetric inputs and outputs, and 16 biases. \n",
"We can export it to a ONNX's `QLinearConv`, but some information will be lost. In particular, weights will be represented as 8b and bias as 32b, even though they are respectively 4b and 16b. This is because ONNX does not provide a standardized way to represent them as such:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"torch.manual_seed(0)\n",
"\n",
"from brevitas.nn import QuantConv1d\n",
"from brevitas.quant import Int8WeightPerTensorFloat, Int8ActPerTensorFloat, Int16Bias\n",
"from brevitas.export import export_onnx_qop\n",
"\n",
"float_inp = torch.randn(1, 2, 5)\n",
"\n",
"quant_conv_4b8b = QuantConv1d(\n",
" 2, 4, 3, bias=True, weight_bit_width=4,\n",
" input_quant=Int8ActPerTensorFloat,\n",
" output_quant=Int8ActPerTensorFloat,\n",
" bias_quant=Int16Bias)\n",
"\n",
"output_path = 'qop_onnx_conv_4b8b.onnx'\n",
"export_onnx_qop(quant_conv_4b8b, input_t=float_inp, export_path=output_path)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"tags": [
"skip-execution"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Serving 'qop_onnx_conv_4b8b.onnx' at http://localhost:8082\n"
]
},
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"100%\"\n",
" height=\"400\"\n",
" src=\"http://localhost:8082/\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x1720d689b38>"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"show_netron(output_path, 8082)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In general the standard ONNX opset doesn't support representing quantization below 8b. Additionally, ONNX QOp representation requires an output quantizer to be set at part of of the layer. \n",
"\n",
"The constraint of always having an output quantizer is relaxed in the more recently introduced QDQ style of representation (for which there is support in Brevitas starting from version 0.8), which uses only `QuantizeLinear` and `DequantizeLinear` to represent quantization, but even with that support is still limited to 8b quantization."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -2112,93 +2025,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The custom format shown above can integrated into ONNX-based toolchains, e.g. it's supported by our own FINN toolchain for low-precision dataflow style custom FPGAs implementations, and would be a starting point for direct integration with TVM.\n",
"\n",
"## Export to TorchScript quantization backend\n",
"\n",
"It's also possible to export to TorchScript own quantized functional operators, which come with their own set of restrictions. In particular, weights should be 7b and unsigned, which requires a zero-point. We can model that with appropriate quantizers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from brevitas.quant import ShiftedUint8ActPerTensorFloat\n",
"from brevitas.export import export_torch_qop\n",
"\n",
"\n",
"quant_conv_8b7b = QuantConv1d(\n",
" 2, 4, 3, bias=True,\n",
" input_quant=ShiftedUint8ActPerTensorFloat,\n",
" output_quant=ShiftedUint8ActPerTensorFloat,\n",
" weight_bit_width=7,\n",
" bias_quant=Int16Bias)\n",
"\n",
"output_path = 'pytorch_qf_conv_8b7b.pt'\n",
"export_torch_qop(quant_conv_8b7b, input_t=float_inp, export_path=output_path)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"tags": [
"skip-execution"
]
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\users\\alessandro\\documenti\\brevitas_tvmcon\\src\\brevitas\\quant_tensor\\__init__.py:74: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n",
" training = torch.tensor(training, dtype=torch.bool)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Serving 'pytorch_qf_conv_8b7b.pt' at http://localhost:8085\n"
]
},
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"100%\"\n",
" height=\"400\"\n",
" src=\"http://localhost:8085/\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x1720e87a438>"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"show_netron(output_path, 8085)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see though information on the fact that activations are 7b is lost, and they simply marked as 8b.\n",
"\n",
"Additionally, because bias quantization is not represented explicitly (although it is performed implicitly at 32b at runtime in the backend), any information around that is lost.\n",
"As with standard ONNX, representing precisions below 8b is not possible."
"The custom format shown above can integrated into ONNX-based toolchains, e.g. it's supported by our own FINN toolchain for low-precision dataflow style custom FPGAs implementations, and would be a starting point for direct integration with TVM."
]
},
{
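Side note on the removed content: the QDQ-style export path mentioned in the deleted markdown cells (based on `QuantizeLinear`/`DequantizeLinear`) remains available after this change. Below is a minimal sketch reusing the layer from the deleted example; the `export_onnx_qcdq` entry point and its exact signature are an assumption about the installed Brevitas version.

```python
import torch
from brevitas.nn import QuantConv1d
from brevitas.quant import Int8ActPerTensorFloat, Int16Bias
# Assumption: export_onnx_qcdq is the QDQ-style export entry point in this Brevitas version.
from brevitas.export import export_onnx_qcdq

torch.manual_seed(0)
float_inp = torch.randn(1, 2, 5)

# Same layer as in the deleted QOp example: 4b weights, 8b activations, 16b bias.
quant_conv_4b8b = QuantConv1d(
    2, 4, 3, bias=True, weight_bit_width=4,
    input_quant=Int8ActPerTensorFloat,
    output_quant=Int8ActPerTensorFloat,
    bias_quant=Int16Bias)

# QDQ-style export represents quantization with QuantizeLinear/DequantizeLinear nodes,
# so the layer does not strictly need an output quantizer, unlike the removed QOp path.
export_onnx_qcdq(quant_conv_4b8b, float_inp, export_path='qcdq_onnx_conv_4b8b.onnx')
```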
176 changes: 1 addition & 175 deletions notebooks/Brevitas_TVMCon2021.ipynb
@@ -1903,102 +1903,6 @@
" return IFrame(src=f\"http://localhost:{port}/\", width=\"100%\", height=400)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Export to ONNX QOps\n",
"\n",
"Say we want to export a QuantConv1d with 4b symmetric weights, 8b symmetric inputs and outputs, and 16 biases. \n",
"We can export it to a ONNX's `QLinearConv`, but some information will be lost. In particular, weights will be represented as 8b and bias as 32b, even though they are respectively 4b and 16b. This is because ONNX does not provide a standardized way to represent them as such:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/scratch/fabian/brevitas/src/brevitas/export/onnx/standard/manager.py:26: UserWarning: ONNX opset version set to 13, override with opset_version=\n",
" warnings.warn(f\"ONNX opset version set to {DEFAULT_OPSET}, override with {ka}=\")\n"
]
}
],
"source": [
"torch.manual_seed(0)\n",
"\n",
"from brevitas.nn import QuantConv1d\n",
"from brevitas.quant import Int8WeightPerTensorFloat, Int8ActPerTensorFloat, Int16Bias\n",
"from brevitas.export import export_onnx_qop\n",
"\n",
"float_inp = torch.randn(1, 2, 5)\n",
"\n",
"quant_conv_4b8b = QuantConv1d(\n",
" 2, 4, 3, bias=True, weight_bit_width=4,\n",
" input_quant=Int8ActPerTensorFloat,\n",
" output_quant=Int8ActPerTensorFloat,\n",
" bias_quant=Int16Bias)\n",
"\n",
"output_path = 'qop_onnx_conv_4b8b.onnx'\n",
"exported_model = export_onnx_qop(quant_conv_4b8b, input_t=float_inp, export_path=output_path)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"tags": [
"skip-execution"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Serving 'qop_onnx_conv_4b8b.onnx' at http://localhost:8082\n"
]
},
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"100%\"\n",
" height=\"400\"\n",
" src=\"http://localhost:8082/\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x7f92ca3e1a10>"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"show_netron(output_path, 8082)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In general the standard ONNX opset doesn't support representing quantization below 8b. Additionally, ONNX QOp representation requires an output quantizer to be set at part of of the layer. \n",
"\n",
"The constraint of always having an output quantizer is relaxed in the more recently introduced QDQ style of representation (for which there is support in Brevitas starting from version 0.8), which uses only `QuantizeLinear` and `DequantizeLinear` to represent quantization, but even with that support is still limited to 8b quantization."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -2142,85 +2046,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The custom format shown above can integrated into ONNX-based toolchains, e.g. it's supported by our own FINN toolchain for low-precision dataflow style custom FPGAs implementations, and would be a starting point for direct integration with TVM.\n",
"\n",
"## Export to TorchScript quantization backend\n",
"\n",
"It's also possible to export to TorchScript own quantized functional operators, which come with their own set of restrictions. In particular, weights should be 7b and unsigned, which requires a zero-point. We can model that with appropriate quantizers:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"from brevitas.quant import ShiftedUint8ActPerTensorFloat\n",
"from brevitas.export import export_torch_qop\n",
"\n",
"\n",
"quant_conv_8b7b = QuantConv1d(\n",
" 2, 4, 3, bias=True,\n",
" input_quant=ShiftedUint8ActPerTensorFloat,\n",
" output_quant=ShiftedUint8ActPerTensorFloat,\n",
" weight_bit_width=7,\n",
" bias_quant=Int16Bias)\n",
"\n",
"output_path = 'pytorch_qf_conv_8b7b.pt'\n",
"exported_model = export_torch_qop(quant_conv_8b7b, input_t=float_inp, export_path=output_path)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"tags": [
"skip-execution"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Serving 'pytorch_qf_conv_8b7b.pt' at http://localhost:8085\n"
]
},
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"100%\"\n",
" height=\"400\"\n",
" src=\"http://localhost:8085/\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x7f92ca4a9550>"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"show_netron(output_path, 8085)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see though information on the fact that activations are 7b is lost, and they simply marked as 8b.\n",
"\n",
"Additionally, because bias quantization is not represented explicitly (although it is performed implicitly at 32b at runtime in the backend), any information around that is lost.\n",
"As with standard ONNX, representing precisions below 8b is not possible."
"The custom format shown above can integrated into ONNX-based toolchains, e.g. it's supported by our own FINN toolchain for low-precision dataflow style custom FPGAs implementations, and would be a starting point for direct integration with TVM."
]
},
{
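Side note: the "some information will be lost" point in the removed cells can be checked directly on the QOp ONNX file that the deleted cell produced. A small sketch, assuming the `onnx` package is installed and the `qop_onnx_conv_4b8b.onnx` file from that cell is present locally:

```python
import onnx

# Load the QOp export produced by the removed cell (hypothetical local path).
model = onnx.load('qop_onnx_conv_4b8b.onnx')
for init in model.graph.initializer:
    # data_type is a TensorProto.DataType enum value (e.g. INT8, INT32).
    print(init.name, onnx.TensorProto.DataType.Name(init.data_type))
# Expected outcome: the 4b weights show up as INT8 and the 16b bias as INT32,
# since QLinearConv has no standardized way to carry the narrower bit-widths.
```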