Commit 663c8c2: fix: review changes

andrei-stoian-zama committed Jun 14, 2024
1 parent becad10, commit 663c8c2

Showing 2 changed files with 30 additions and 14 deletions.

30 changes: 19 additions & 11 deletions docs/deep-learning/fhe_assistant.md

The most common compilation errors stem from the following causes:

#### 1. TLU input maximum bit-width is exceeded

**Error message**: `this [N]-bit value is used as an input to a table lookup`

**Cause**: This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16 bits.

**Possible solutions**:

- Reduce quantization `n_bits`. However, this may reduce accuracy. When quantization `n_bits` must be below 6, it is best to use [Quantization Aware Training](../deep-learning/fhe_friendly_models.md).
- Use `rounding_threshold_bits`. This feature is described [here](../explanations/advanced_features.md#rounded-activations-and-quantizers). It is recommended to use the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) setting and to set the rounding bits 1 or 2 bits higher than the quantization `n_bits`, as shown in the sketch after this list
- Use [pruning](../explanations/pruning.md)
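
For illustration, a minimal sketch of the second approach is shown below. The model, calibration inputs, and bit-width values are placeholders, and the dictionary form of the `rounding_threshold_bits` argument may differ across Concrete ML versions:

```python
import torch
from concrete import fhe
from concrete.ml.torch.compile import compile_torch_model

# Hypothetical model and calibration inputs
model = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU(), torch.nn.Linear(5, 2))
torch_input = torch.randn(100, 10)

# Quantize to 6 bits and round table lookup inputs approximately,
# with a rounding bit-width 2 bits above the quantization n_bits
quantized_module = compile_torch_model(
    model,
    torch_input,
    n_bits=6,
    rounding_threshold_bits={"n_bits": 8, "method": fhe.Exactness.APPROXIMATE},
)
```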

#### 2. No crypto-parameters can be found

**Error message**: `RuntimeError: NoParametersFound` is raised by the compiler

**Cause**: This error occurs when using `rounding_threshold_bits` in the `compile_torch_model` function.

**Possible solutions**: The same approaches as for the previous error apply.

#### 3. Quantization import failed

**Error message**: `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`.

**Cause**: This error is due to missing quantization operators in a model that is imported as a quantization-aware training model. It is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. See [this guide](../deep-learning/fhe_friendly_models.md) on how to use Brevitas layers.

A common example is related to the concatenation operator. Suppose two tensors `x` and `y` are produced by two layers and need to be concatenated:

```python
# ... (preceding lines collapsed in the diff view)
y = self.dense2(y)
z = torch.cat([x, y])
```

In the example above, the tensors `x` and `y` need to be quantized before being concatenated.

**Possible solutions**:

1. If the error occurs for the first layer of the model: add a `QuantIdentity` layer in your model and apply it to the input of the `forward` function, before the first layer is computed.
1. If the error occurs for a concatenation or addition layer: add a new `QuantIdentity` layer in your model, for example named `quant_concat`. In the `forward` function, apply it to both `x` and `y` before they are concatenated. Using a common `QuantIdentity` layer to quantize both tensors ensures that they have the same scale:

<!--pytest-codeblocks:skip-->

```python
z = torch.cat([self.quant_concat(x), self.quant_concat(y)])
```
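
Putting both fixes together, a minimal sketch of a model that quantizes its input and shares a `QuantIdentity` layer before concatenation is shown below. The layer names, sizes, and bit-widths are illustrative only:

```python
import torch
import torch.nn as nn
import brevitas.nn as qnn


class ConcatModel(nn.Module):
    """Hypothetical model with an input quantizer and a shared quantizer before concatenation."""

    def __init__(self):
        super().__init__()
        # Solution 1: quantize the input of the forward function
        self.quant_input = qnn.QuantIdentity(bit_width=4)
        self.dense1 = qnn.QuantLinear(10, 8, bias=True, weight_bit_width=4)
        self.dense2 = qnn.QuantLinear(10, 8, bias=True, weight_bit_width=4)
        # Solution 2: a single QuantIdentity shared by both concatenated tensors
        self.quant_concat = qnn.QuantIdentity(bit_width=4)

    def forward(self, inputs):
        inputs = self.quant_input(inputs)
        x = self.dense1(inputs)
        y = self.dense2(inputs)
        # Both tensors are quantized with the same scale before concatenation
        return torch.cat([self.quant_concat(x), self.quant_concat(y)], dim=1)
```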

## Debugging compilation errors

Compilation errors due to FHE-incompatible models, such as maximum bit-width exceeded or `NoParametersFound`, can be debugged by examining the bit-widths associated with various intermediate values of the FHE computation.

14 changes: 11 additions & 3 deletions docs/deep-learning/torch_support.md

There are two approaches to build [FHE-compatible deep networks](../getting-started/concepts.md#model-accuracy-considerations-under-fhe-constraints):

- [Quantization Aware Training (QAT)](../explanations/quantization.md) requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with [Brevitas](../explanations/inner-workings/external_libraries.md#brevitas), a library providing QAT support for PyTorch. To use this mode, compile models using `compile_brevitas_qat_model`.
- Post-training Quantization: in this mode a vanilla PyTorch model can be compiled. However, when quantizing weights and activations to fewer than 7 bits, the accuracy can decrease strongly. On the other hand, depending on the model size, quantizing with 6-8 bits can make the model incompatible with FHE constraints. To use this mode, compile models with `compile_torch_model`.

Both approaches should be used with the `rounding_threshold_bits` parameter set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See [here](../explanations/advanced_features.md#rounded-activations-and-quantizers) for more details.


## Quantization-aware training

The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. To use QAT, Brevitas `QuantIdentity` nodes must be inserted in the PyTorch model, including one that quantizes the input of the `forward` function.

```python
import brevitas.nn as qnn
# ... (model definition collapsed in the diff view)

quantized_module = compile_brevitas_qat_model(
    # ... (compilation arguments collapsed in the diff view)
)
```
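
Most of the example above is collapsed in this diff view. A minimal sketch of a comparable QAT model and its compilation is given below; the layer sizes, bit-widths, and variable names are illustrative rather than the original ones:

```python
import torch
import torch.nn as nn
import brevitas.nn as qnn

from concrete.ml.torch.compile import compile_brevitas_qat_model

N_BITS = 3


class QATNetwork(nn.Module):
    """Hypothetical fully connected network with two hidden layers, quantized with Brevitas."""

    def __init__(self, input_size=10, hidden_size=32, output_size=2):
        super().__init__()
        # Quantize the input of the forward function
        self.quant_input = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(input_size, hidden_size, bias=True, weight_bit_width=N_BITS)
        self.quant1 = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(hidden_size, hidden_size, bias=True, weight_bit_width=N_BITS)
        self.quant2 = qnn.QuantIdentity(bit_width=N_BITS, return_quant_tensor=True)
        self.fc3 = qnn.QuantLinear(hidden_size, output_size, bias=True, weight_bit_width=N_BITS)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.quant_input(x)
        x = self.quant1(self.relu(self.fc1(x)))
        x = self.quant2(self.relu(self.fc2(x)))
        return self.fc3(x)


torch_model = QATNetwork()
# Representative calibration inputs; in practice, use (a subset of) the training data
torch_input = torch.randn(100, 10)

quantized_module = compile_brevitas_qat_model(
    torch_model,
    torch_input,
    rounding_threshold_bits=6,
)
```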

{% hint style="warning" %}
If `QuantIdentity` layers are missing for any input or intermediate value, the compile function will raise an error. See the [common compilation errors page](./fhe_assistant.md#common-compilation-errors) for an explanation.
{% endhint %}

## Post-training quantization

The following example uses a simple PyTorch model that implements a fully connected neural network with two hidden layers. The model is compiled for FHE using `compile_torch_model`.
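
The full example is collapsed in this diff view. Below is a minimal sketch of such a post-training quantization compilation; the layer sizes and parameter values are illustrative:

```python
import torch
import torch.nn as nn

from concrete.ml.torch.compile import compile_torch_model


class SimpleNetwork(nn.Module):
    """Hypothetical fully connected network with two hidden layers, in plain PyTorch (no Brevitas)."""

    def __init__(self, input_size=10, hidden_size=32, output_size=2):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)


torch_model = SimpleNetwork()
# Representative calibration inputs used to determine the quantization parameters
torch_input = torch.randn(100, 10)

quantized_module = compile_torch_model(
    torch_model,
    torch_input,
    n_bits=6,
    rounding_threshold_bits=6,
)
```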

## Configuring quantization parameters

With QAT, the PyTorch/Brevitas models created following the example above require the user to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. When using this mode, set `n_bits=None` in the `compile_brevitas_qat_model` call.

With PTQ, the user needs to set the `n_bits` value in the `compile_torch_model` function. The trade-off between accuracy, FHE compatibility, and latency must be determined manually.

The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time.

## Running encrypted inference
