DO_NOT_MERGE fake commit with all docs changes in Q3
aquint-zama committed Oct 9, 2023
1 parent 9afc962 commit 8b1932e
Showing 71 changed files with 3,371 additions and 1,764 deletions.
28 changes: 28 additions & 0 deletions docs/README.md
@@ -48,6 +48,34 @@

```python
print(f"Similarity: {(y_pred_fhe == y_pred_clear).mean():.1%}")
# Similarity: 100.0%
```

It is also possible to call the encryption, model prediction, and decryption functions separately, as shown below. Executing these steps separately is equivalent to calling `predict_proba` on the model instance.

<!--pytest-codeblocks:cont-->

```python
# Predict probability for a single example
y_proba_fhe = model.predict_proba(X_test[[0]], fhe="execute")

# Quantize an original float input
q_input = model.quantize_input(X_test[[0]])

# Encrypt the input
q_input_enc = model.fhe_circuit.encrypt(q_input)

# Execute the linear product in FHE
q_y_enc = model.fhe_circuit.run(q_input_enc)

# Decrypt the result (integer)
q_y = model.fhe_circuit.decrypt(q_y_enc)

# De-quantize and post-process the result
y0 = model.post_processing(model.dequantize_output(q_y))

print("Probability with `predict_proba`: ", y_proba_fhe)
print("Probability with encrypt/run/decrypt calls: ", y0)
```

This example shows the typical flow of a Concrete ML model:

- The model is trained on unencrypted (plaintext) data using scikit-learn. As FHE operates over integers, Concrete ML quantizes the model to use only integers during inference.
14 changes: 11 additions & 3 deletions docs/SUMMARY.md
@@ -14,6 +14,7 @@
- [Linear Models](built-in-models/linear.md)
- [Tree-based Models](built-in-models/tree.md)
- [Neural Networks](built-in-models/neural-networks.md)
- [Nearest Neighbors](built-in-models/nearest-neighbors.md)
- [Pandas](built-in-models/pandas.md)
- [Built-in Model Examples](built-in-models/ml_examples.md)

@@ -26,15 +27,19 @@
- [Debugging Models](deep-learning/fhe_assistant.md)
- [Optimizing Inference](deep-learning/optimizing_inference.md)

## Deployment

- [Prediction with FHE](advanced-topics/prediction_with_fhe.md)
- [Hybrid models](advanced-topics/hybrid-models.md)
- [Production Deployment](advanced-topics/client_server.md)
- [Serialization](advanced-topics/serialization.md)

## Advanced topics

- [Quantization](advanced-topics/quantization.md)
- [Pruning](advanced-topics/pruning.md)
- [Compilation](advanced-topics/compilation.md)
- [Prediction with FHE](advanced-topics/prediction_with_fhe.md)
- [Production Deployment](advanced-topics/client_server.md)
- [Advanced Features](advanced-topics/advanced_features.md)
- [Serialization](advanced-topics/serialization.md)

## Developer Guide

@@ -80,6 +85,7 @@
- [concrete.ml.quantization.md](developer-guide/api/concrete.ml.quantization.md)
- [concrete.ml.quantization.post_training.md](developer-guide/api/concrete.ml.quantization.post_training.md)
- [concrete.ml.quantization.quantized_module.md](developer-guide/api/concrete.ml.quantization.quantized_module.md)
- [concrete.ml.quantization.quantized_module_passes.md](developer-guide/api/concrete.ml.quantization.quantized_module_passes.md)
- [concrete.ml.quantization.quantized_ops.md](developer-guide/api/concrete.ml.quantization.quantized_ops.md)
- [concrete.ml.quantization.quantizers.md](developer-guide/api/concrete.ml.quantization.quantizers.md)
- [concrete.ml.search_parameters.md](developer-guide/api/concrete.ml.search_parameters.md)
@@ -88,6 +94,7 @@
- [concrete.ml.sklearn.glm.md](developer-guide/api/concrete.ml.sklearn.glm.md)
- [concrete.ml.sklearn.linear_model.md](developer-guide/api/concrete.ml.sklearn.linear_model.md)
- [concrete.ml.sklearn.md](developer-guide/api/concrete.ml.sklearn.md)
- [concrete.ml.sklearn.neighbors.md](developer-guide/api/concrete.ml.sklearn.neighbors.md)
- [concrete.ml.sklearn.qnn.md](developer-guide/api/concrete.ml.sklearn.qnn.md)
- [concrete.ml.sklearn.qnn_module.md](developer-guide/api/concrete.ml.sklearn.qnn_module.md)
- [concrete.ml.sklearn.rf.md](developer-guide/api/concrete.ml.sklearn.rf.md)
@@ -96,6 +103,7 @@
- [concrete.ml.sklearn.tree_to_numpy.md](developer-guide/api/concrete.ml.sklearn.tree_to_numpy.md)
- [concrete.ml.sklearn.xgb.md](developer-guide/api/concrete.ml.sklearn.xgb.md)
- [concrete.ml.torch.compile.md](developer-guide/api/concrete.ml.torch.compile.md)
- [concrete.ml.torch.hybrid_model.md](developer-guide/api/concrete.ml.torch.hybrid_model.md)
- [concrete.ml.torch.md](developer-guide/api/concrete.ml.torch.md)
- [concrete.ml.torch.numpy_module.md](developer-guide/api/concrete.ml.torch.numpy_module.md)
- [concrete.ml.version.md](developer-guide/api/concrete.ml.version.md)
26 changes: 13 additions & 13 deletions docs/advanced-topics/advanced_features.md
@@ -26,21 +26,21 @@ In Concrete ML, there are three different ways to define the error probability:

The first way to set error probabilities in Concrete ML is at the local level, by directly setting the probability of error of each individual TLU. This probability is referred to as `p_error`. A given PBS operation has a `1 - p_error` chance of being successful. A successful evaluation means that the value decrypted after FHE evaluation is exactly the same as the one that would be computed in the clear.
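
As a rough rule of thumb (an illustrative estimate that assumes TLU errors are independent, which is not strictly guaranteed in practice), a circuit containing $$N$$ PBS operations is evaluated entirely without error with probability:

$$(1 - p\_error)^N$$

Thus, the error probability of the whole circuit grows quickly with the number of TLUs, which is why the per-PBS and per-circuit error probabilities described below can be controlled separately.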

For simplicity, it is best to use [default options](advanced_features.md#using-default-error-probability), irrespective of the type of model. However, for deep neural networks, default values may be too pessimistic, reducing computation speed without any improvement in accuracy: some TLU errors might not affect the accuracy of the network, so `p_error` can be safely increased (e.g., see CIFAR classifications in [our showcase](../getting-started/showcase.md)).

Here is a visualization of the effect of the `p_error` on a neural network model with a `p_error = 0.1` compared to execution in the clear (i.e., no error):

![Impact of p_error in a Neural Network](../figures/p_error_nn.png)

Varying `p_error` in the one hidden-layer neural network above produces the following inference times. Increasing `p_error` to 0.1 halves the inference time with respect to a `p_error` of 0.001. In the graph above, the decision boundary becomes noisier with a higher `p_error`.

| p_error | Inference Time (ms) |
| :-----: | ------------------- |
| 0.001 | 0.80 |
| 0.01 | 0.41 |
| 0.1 | 0.37 |

The speedup depends on model complexity, but, in an iterative approach, it is possible to search for a good value of `p_error` to obtain a speedup while maintaining good accuracy. Concrete ML provides a tool to find a good value for `p_error` based on [binary search](advanced_features.md#searching-for-the-best-error-probability).

Users can change this `p_error` by passing an argument to the `compile` function of any of the models. Here is an example:

Expand All @@ -61,7 +61,7 @@ clf.fit(X_train, y_train)
clf.compile(X_train, p_error=0.1)
```

If the `p_error` value is specified and [simulation](compilation.md#fhe-simulation) is enabled, the run will take into account the randomness induced by the choice of `p_error`. This results in statistical similarity to the FHE evaluation.
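
As a quick illustration (a minimal sketch, reusing the `clf` classifier and `X_train` data from the example above), simulated predictions can be compared to clear ones:

```python
# Simulated predictions include the random TLU errors induced by p_error
y_simulated = clf.predict(X_train, fhe="simulate")

# Clear (non-FHE) predictions, used as an error-free reference
y_clear = clf.predict(X_train)

print(f"Agreement between simulated and clear inference: {(y_simulated == y_clear).mean():.1%}")
```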

### A global error probability for the entire model

@@ -86,7 +86,7 @@ If neither `p_error` nor `global_p_error` are set, Concrete ML employs `p_error =
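
As an illustration (a minimal sketch, again reusing the `clf` model and `X_train` data from the earlier example), a global error probability can be requested at compile time instead of a per-PBS one:

```python
# Bound the error probability of the entire circuit rather than of each PBS
clf.compile(X_train, global_p_error=0.1)
```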

### Searching for the best error probability

Currently, finding a good `p_error` value _a priori_ is not possible, as it is difficult to determine the impact of the TLU error on the output of a neural network. Concrete ML provides a tool to find a good `p_error` value that improves inference speed while maintaining accuracy. The method is based on binary search and evaluates the latency/accuracy trade-off iteratively.

```python
from time import time
```

@@ -111,7 +111,7 @@
@@ -148,7 +148,7 @@ print(f"Accuracy = {accuracy_score(y_pred, y_train): .2%}")

With this optimal `p_error`, accuracy is maintained while execution time is improved by a factor of 1.51.

Please note that the default setting for the search interval is restricted to a range of 0.0 to 0.9. Increasing the upper bound beyond this range may result in longer execution times, especially when `p_error≈1`.

## Rounded activations and quantizers

@@ -172,7 +172,7 @@ In Concrete ML, this feature is currently implemented for custom neural networks
- `concrete.ml.torch.compile_onnx_model` and
- `concrete.ml.torch.compile_brevitas_qat_model`.

The `rounding_threshold_bits` argument can be set to a specific bit-width. It is important to choose an appropriate bit-width threshold to balance the trade-off between speed and accuracy. By reducing the bit-width of intermediate tensors, it is possible to speed up computations while maintaining accuracy.
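
For example, here is a minimal sketch of how this argument might be used (the small `torch` network and the random input set are purely illustrative assumptions):

```python
import numpy
import torch

from concrete.ml.torch.compile import compile_torch_model

# A small, illustrative fully-connected network
torch_model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Representative (random) input set used for quantization and compilation
input_set = numpy.random.uniform(-1, 1, size=(100, 10))

# Round TLU inputs to 6 bits to speed up FHE execution
quantized_module = compile_torch_model(
    torch_model,
    input_set,
    n_bits=4,
    rounding_threshold_bits=6,
)
```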

{% hint style="warning" %}
The `rounding_threshold_bits` parameter only works in FHE for TLU input bit-width ($$P$$) **less than or equal to 8 bits**.
@@ -187,7 +187,7 @@ In practice, the process looks like this:
1. Update P = P - 1
1. Repeat steps 2 and 3 until the accuracy loss exceeds an acceptable threshold (a rough sketch of this loop is given after this list).
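
The following sketch is illustrative only: it reuses the `torch_model` and `input_set` from the sketch above and assumes a held-out validation set `X_val`, `y_val` as well as a pre-computed `baseline_accuracy` for the un-rounded model.

```python
from sklearn.metrics import accuracy_score

acceptable_loss = 0.01  # maximum tolerated accuracy drop (an assumption)
best_p = None

for p in range(8, 1, -1):
    # Re-compile with the candidate rounding bit-width
    quantized_module = compile_torch_model(
        torch_model, input_set, n_bits=4, rounding_threshold_bits=p
    )

    # Evaluate accuracy with simulation (no FHE execution needed)
    y_pred = quantized_module.forward(X_val, fhe="simulate").argmax(axis=1)
    accuracy = accuracy_score(y_val, y_pred)

    if baseline_accuracy - accuracy > acceptable_loss:
        break  # the accuracy loss is no longer acceptable: keep the previous value

    best_p = p
```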

An example of such an implementation is available in [evaluate_torch_cml.py](../../use_case_examples/cifar/cifar_brevitas_training/evaluate_one_example_fhe.py) and [CifarInFheWithSmallerAccumulators.ipynb](../../use_case_examples/cifar/cifar_brevitas_finetuning/CifarInFheWithSmallerAccumulators.ipynb).

## Seeing compilation information

@@ -299,13 +299,13 @@ In this latter optimization, the following information will be provided:

- The bit-width ("6-bit integers") used in the program: for the moment, the compiler only supports a single precision (i.e., all PBS are promoted to the same bit-width - the largest one). Therefore, this bit-width predominantly drives the speed of the program, and it is essential to reduce it as much as possible for faster execution.
- The maximal norm2 ("7 manp"), which has an impact on the crypto parameters: The larger this norm2, the slower PBS will be. The norm2 is related to the norm of some constants appearing in your program, in a way which will be clarified in the Concrete documentation.
- The probability of error of an individual PBS, which was requested by the user ("3.300000e-02 error per pbs call" in User Config).
- The probability of error of the full circuit, which was requested by the user ("1.000000e+00 error per circuit call" in User Config). Here, the probability 1 stands for "not used", since we had set the individual probability via `p_error`.
- The probability of error of an individual PBS, which is found by the optimizer ("1/30 errors (3.234529e-02)").
- The probability of error of the full circuit, which is found by the optimizer ("1/10 errors (9.390887e-02)").
- An estimation of the cost of the circuit ("4.214000e+02 Millions Operations"): Large values indicate a circuit that will execute more slowly.

Here is some further information about cryptographic parameters:

- 1x glwe_dimension
- 2\*\*11 polynomial (2048)
14 changes: 6 additions & 8 deletions docs/advanced-topics/client_server.md
@@ -31,22 +31,20 @@ The server-side implementation of a Concrete ML model follows the diagram above.

## Example notebook

For a complete example, see [the client-server notebook](../advanced_examples/ClientServer.ipynb) or [the use-case examples](../../use_case_examples/deployment/).

### AWS

We provide scripts that leverage `boto3` to deploy any Concrete ML model to AWS.
The first required step is to properly set up AWS CLI on your system, which can be done by following the instructions in [AWS Documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
To create access keys for configuring the AWS CLI, go to the [appropriate panel on the AWS website](https://us-east-1.console.aws.amazon.com/iamv2/home?region=us-east-1#/security_credentials?section=IAM_credentials).

Once this first setup is done, you can launch `python src/concrete/ml/deployment/deploy_to_aws.py --path-to-model <path_to_your_serialized_model>` from the root of the repository to create an instance that runs a FastAPI server serving the model.

### Docker

Running Docker with the latest version of Concrete ML will require you to build a Docker image. To do this, run the following command: `poetry build && mkdir pkg && cp dist/* pkg/ && make release_docker`. You will need to have `make`, `poetry` and `docker` installed on your system.
To test locally, there is a dedicated script: `python src/concrete/ml/deployment/deploy_to_docker.py --path-to-model <path_to_your_serialized_model>`, which should be run from the root of the repository in order to create a Docker container that runs a FastAPI server serving the model.

No code is required to run the server, but each client is specific to the use case, even if the workflow stays the same.
To see how to create your client, refer to our [examples](../../use_case_examples/deployment) or [this notebook](../advanced_examples/Deployment.ipynb).
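
As a rough end-to-end illustration of this client/server workflow, here is a minimal sketch run in a single process (the toy `LogisticRegression` model, the synthetic data, and the directory names are illustrative assumptions; in a real deployment, the client and server parts run on different machines and exchange the serialized payloads over the network):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.deployment import FHEModelClient, FHEModelDev, FHEModelServer
from concrete.ml.sklearn import LogisticRegression

# Train and compile a small built-in model on synthetic data
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)
model.compile(X_train)

# Developer side: save the client and server artifacts to a folder
FHEModelDev(path_dir="deployment_files", model=model).save()

# Client side: generate keys and encrypt a quantized input
client = FHEModelClient(path_dir="deployment_files", key_dir="keys")
client.generate_private_and_evaluation_keys()
evaluation_keys = client.get_serialized_evaluation_keys()
encrypted_input = client.quantize_encrypt_serialize(X_test[[0]])

# Server side: load the model and run it on the encrypted payload
server = FHEModelServer(path_dir="deployment_files")
server.load()
encrypted_result = server.run(encrypted_input, evaluation_keys)

# Client side: decrypt and de-quantize the result
decrypted_prediction = client.deserialize_decrypt_dequantize(encrypted_result)
print(decrypted_prediction)
```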
2 changes: 1 addition & 1 deletion docs/advanced-topics/compilation.md
@@ -83,7 +83,7 @@ The first step in the list above takes a Python function implemented using the C

The result of this single step of the compilation pipeline allows the:

- execution of the op-graph, which includes TLUs, on clear non-encrypted data. This is not secure, but it is much faster than executing in FHE. This mode is useful for debugging, especially when looking for appropriate model hyper-parameters
- verification of the maximum bit-width of the op-graph and the intermediary bit-widths of model layers, to evaluate their impact on FHE execution latency

Simulation is enabled for all Concrete ML models once they are compiled as shown above. Obtaining the simulated predictions of the models is done by setting the `fhe="simulate"` argument to prediction methods:
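
For example, a minimal sketch (assuming `model` is a compiled built-in Concrete ML model and `X_test` is held-out data):

```python
# Clear-data execution of the op-graph (no encryption): fast, useful for debugging
y_preds_clear = model.predict(X_test, fhe="disable")

# Simulated predictions: emulate FHE behavior, including the effect of p_error
y_preds_simulated = model.predict(X_test, fhe="simulate")
```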