Commit 663c8c2: fix: review changes

andrei-stoian-zama committed Jun 14, 2024
1 parent becad10, commit 663c8c2

Showing 2 changed files with 30 additions and 14 deletions.

30 changes: 19 additions & 11 deletions docs/deep-learning/fhe_assistant.md

The most common compilation errors stem from the following causes:

#### 1. TLU input maximum bit-width is exceeded

**Error message**: `this [N]-bit value is used as an input to a table lookup`

**Cause**: This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16 bits.

**Possible solutions**:

- Reduce quantization `n_bits`. However, this may reduce accuracy. When quantization `n_bits` must be below 6, it is best to use [Quantization Aware Training](../deep-learning/fhe_friendly_models.md).
- Use `rounding_threshold_bits`. This feature is described [here](../explanations/advanced_features.md#rounded-activations-and-quantizers). It is recommended to use the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) setting and to set the rounding bits 1 or 2 bits higher than the quantization `n_bits`, as shown in the sketch after this list
- Use [pruning](../explanations/pruning.md)
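
For illustration, a minimal sketch of the second approach is shown below. The model, calibration inputs, and bit-width values are placeholders, and the dictionary form of the `rounding_threshold_bits` argument may differ across Concrete ML versions:

```python
import torch
from concrete import fhe
from concrete.ml.torch.compile import compile_torch_model

# Hypothetical model and calibration inputs
model = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU(), torch.nn.Linear(5, 2))
torch_input = torch.randn(100, 10)

# Quantize to 6 bits and round table lookup inputs approximately,
# with a rounding bit-width 2 bits above the quantization n_bits
quantized_module = compile_torch_model(
    model,
    torch_input,
    n_bits=6,
    rounding_threshold_bits={"n_bits": 8, "method": fhe.Exactness.APPROXIMATE},
)
```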

#### 2. No crypto-parameters can be found

**Error message**: `RuntimeError: NoParametersFound` is raised by the compiler

**Cause**: This error occurs when using `rounding_threshold_bits` in the `compile_torch_model` function.

**Possible solutions**: The same approaches as for the previous error apply.

#### 3. Quantization import failed

**Error message**: `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`.

**Cause**: This error is due to missing quantization operators in a model that is imported as a quantization-aware training model. It is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. See [this guide](../deep-learning/fhe_friendly_models.md) on how to use Brevitas layers.

A common example is related to the concatenation operator. Suppose two tensors `x` and `y` are produced by two layers and need to be concatenated:

```python
# ... (preceding lines collapsed in the diff view)
y = self.dense2(y)
z = torch.cat([x, y])
```

In the example above, the tensors `x` and `y` need to be quantized before being concatenated.

**Possible solutions**:

1. If the error occurs for the first layer of the model: add a `QuantIdentity` layer in your model and apply it to the input of the `forward` function, before the first layer is computed.
1. If the error occurs for a concatenation or addition layer: add a new `QuantIdentity` layer in your model, for example named `quant_concat`. In the `forward` function, apply it to both `x` and `y` before they are concatenated. Using a common `QuantIdentity` layer to quantize both tensors ensures that they have the same scale:

<!--pytest-codeblocks:skip-->

```python
z = torch.cat([self.quant_concat(x), self.quant_concat(y)])
```
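
Putting both fixes together, a minimal sketch of a model that quantizes its input and shares a `QuantIdentity` layer before concatenation is shown below. The layer names, sizes, and bit-widths are illustrative only:

```python
import torch
import torch.nn as nn
import brevitas.nn as qnn


class ConcatModel(nn.Module):
    """Hypothetical model with an input quantizer and a shared quantizer before concatenation."""

    def __init__(self):
        super().__init__()
        # Solution 1: quantize the input of the forward function
        self.quant_input = qnn.QuantIdentity(bit_width=4)
        self.dense1 = qnn.QuantLinear(10, 8, bias=True, weight_bit_width=4)
        self.dense2 = qnn.QuantLinear(10, 8, bias=True, weight_bit_width=4)
        # Solution 2: a single QuantIdentity shared by both concatenated tensors
        self.quant_concat = qnn.QuantIdentity(bit_width=4)

    def forward(self, inputs):
        inputs = self.quant_input(inputs)
        x = self.dense1(inputs)
        y = self.dense2(inputs)
        # Both tensors are quantized with the same scale before concatenation
        return torch.cat([self.quant_concat(x), self.quant_concat(y)], dim=1)
```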

## Debugging compilation errors

Compilation errors due to FHE-incompatible models, such as maximum bit-width exceeded or `NoParametersFound`, can be debugged by examining the bit-widths associated with various intermediate values of the FHE computation.

14 changes: 11 additions & 3 deletions docs/deep-learning/torch_support.md

There are two approaches to build [FHE-compatible deep networks](../getting-started/concepts.md#model-accuracy-considerations-under-fhe-constraints):

- [Quantization Aware Training (QAT)](../explanations/quantization.md) requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with [Brevitas](../explanations/inner-workings/external_libraries.md#brevitas), a library providing QAT support for PyTorch. To use this mode, compile models using `compile_brevitas_qat_model`.
- Post-training Quantization: in this mode a vanilla PyTorch model can be compiled. However, when quantizing weights and activations to fewer than 7 bits, the accuracy can decrease strongly. On the other hand, depending on the model size, quantizing with 6-8 bits can make the model incompatible with FHE constraints. To use this mode, compile models with `compile_torch_model`.

Both approaches should be used with the `rounding_threshold_bits` parameter set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See [here](../explanations/advanced_features.md#rounded-activations-and-quantizers) for more details.


## Quantization-aware training

The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. To use QAT, Brevitas `QuantIdentity` nodes must be inserted in the PyTorch model, including one that quantizes the input of the `forward` function.

```python
import brevitas.nn as qnn
# ... (model definition collapsed in the diff view)

quantized_module = compile_brevitas_qat_model(
    # ... (compilation arguments collapsed in the diff view)
)
```
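
Most of the example above is collapsed in this diff view. A minimal sketch of a comparable QAT model and its compilation is given below; the layer sizes, bit-widths, and variable names are illustrative rather than the original ones:

```python
import torch
import torch.nn as nn
import brevitas.nn as qnn

from concrete.ml.torch.compile import compile_brevitas_qat_model

N_BITS = 3


class QATNetwork(nn.Module):
    """Hypothetical fully connected network with two hidden layers, quantized with Brevitas."""

    def __init__(self, input_size=10, hidden_size=32, output_size=2):
        super().__init__()
        # Quantize the input of the forward function
        self.quant_input = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(input_size, hidden_size, bias=True, weight_bit_width=N_BITS)
        self.quant1 = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(hidden_size, hidden_size, bias=True, weight_bit_width=N_BITS)
        self.quant2 = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc3 = qnn.QuantLinear(hidden_size, output_size, bias=True, weight_bit_width=N_BITS)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.quant_input(x)
        x = self.quant1(self.relu(self.fc1(x)))
        x = self.quant2(self.relu(self.fc2(x)))
        return self.fc3(x)


torch_model = QATNetwork()
# Representative calibration inputs; in practice, use (a subset of) the training data
torch_input = torch.randn(100, 10)

quantized_module = compile_brevitas_qat_model(
    torch_model,
    torch_input,
    rounding_threshold_bits=6,
)
```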

{% hint style="warning" %}
If `QuantIdentity` layers are missing for any input or intermediate value, the compile function will raise an error. See the [common compilation errors page](./fhe_assistant.md#common-compilation-errors) for an explanation.
{% endhint %}

## Post-training quantization

The following example uses a simple PyTorch model that implements a fully connected neural network with two hidden layers. The model is compiled for FHE using `compile_torch_model`.
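
The full example is collapsed in this diff view. Below is a minimal sketch of such a post-training quantization compilation; the layer sizes and parameter values are illustrative:

```python
import torch
import torch.nn as nn

from concrete.ml.torch.compile import compile_torch_model


class SimpleNetwork(nn.Module):
    """Hypothetical fully connected network with two hidden layers, in plain PyTorch (no Brevitas)."""

    def __init__(self, input_size=10, hidden_size=32, output_size=2):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)


torch_model = SimpleNetwork()
# Representative calibration inputs used to determine the quantization parameters
torch_input = torch.randn(100, 10)

quantized_module = compile_torch_model(
    torch_model,
    torch_input,
    n_bits=6,
    rounding_threshold_bits=6,
)
```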

## Configuring quantization parameters

With QAT, the PyTorch/Brevitas models created following the example above require the user to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. When using this mode, set `n_bits=None` in the `compile_brevitas_qat_model` call.

With PTQ, the user needs to set the `n_bits` value in the `compile_torch_model` function. The trade-off between accuracy, FHE compatibility, and latency must be determined manually.

The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time.

## Running encrypted inference
