DO_NOT_MERGE fake commit with all docs changes in Q3
aquint-zama committed Oct 9, 2023
1 parent 9afc962 commit 8b1932e
Showing 71 changed files with 3,371 additions and 1,764 deletions.
28 changes: 28 additions & 0 deletions docs/README.md
@@ -48,6 +48,34 @@

```python
print(f"Similarity: {(y_pred_fhe == y_pred_clear).mean():.1%}")
# Similarity: 100.0%
```

It is also possible to call the encryption, model prediction, and decryption functions separately, as shown below. Executing these steps separately is equivalent to calling `predict_proba` on the model instance.

<!--pytest-codeblocks:cont-->

```python
# Predict probability for a single example
y_proba_fhe = model.predict_proba(X_test[[0]], fhe="execute")

# Quantize an original float input
q_input = model.quantize_input(X_test[[0]])

# Encrypt the input
q_input_enc = model.fhe_circuit.encrypt(q_input)

# Execute the linear product in FHE
q_y_enc = model.fhe_circuit.run(q_input_enc)

# Decrypt the result (integer)
q_y = model.fhe_circuit.decrypt(q_y_enc)

# De-quantize and post-process the result
y0 = model.post_processing(model.dequantize_output(q_y))

print("Probability with `predict_proba`: ", y_proba_fhe)
print("Probability with encrypt/run/decrypt calls: ", y0)
```

This example shows the typical flow of a Concrete ML model:

- The model is trained on unencrypted (plaintext) data using scikit-learn. As FHE operates over integers, Concrete ML quantizes the model to use only integers during inference.
14 changes: 11 additions & 3 deletions docs/SUMMARY.md
@@ -14,6 +14,7 @@
- [Linear Models](built-in-models/linear.md)
- [Tree-based Models](built-in-models/tree.md)
- [Neural Networks](built-in-models/neural-networks.md)
- [Nearest Neighbors](built-in-models/nearest-neighbors.md)
- [Pandas](built-in-models/pandas.md)
- [Built-in Model Examples](built-in-models/ml_examples.md)

@@ -26,15 +27,19 @@
- [Debugging Models](deep-learning/fhe_assistant.md)
- [Optimizing Inference](deep-learning/optimizing_inference.md)

## Deployment

- [Prediction with FHE](advanced-topics/prediction_with_fhe.md)
- [Hybrid models](advanced-topics/hybrid-models.md)
- [Production Deployment](advanced-topics/client_server.md)
- [Serialization](advanced-topics/serialization.md)

## Advanced topics

- [Quantization](advanced-topics/quantization.md)
- [Pruning](advanced-topics/pruning.md)
- [Compilation](advanced-topics/compilation.md)
- [Prediction with FHE](advanced-topics/prediction_with_fhe.md)
- [Production Deployment](advanced-topics/client_server.md)
- [Advanced Features](advanced-topics/advanced_features.md)
- [Serialization](advanced-topics/serialization.md)

## Developer Guide

@@ -80,6 +85,7 @@
- [concrete.ml.quantization.md](developer-guide/api/concrete.ml.quantization.md)
- [concrete.ml.quantization.post_training.md](developer-guide/api/concrete.ml.quantization.post_training.md)
- [concrete.ml.quantization.quantized_module.md](developer-guide/api/concrete.ml.quantization.quantized_module.md)
- [concrete.ml.quantization.quantized_module_passes.md](developer-guide/api/concrete.ml.quantization.quantized_module_passes.md)
- [concrete.ml.quantization.quantized_ops.md](developer-guide/api/concrete.ml.quantization.quantized_ops.md)
- [concrete.ml.quantization.quantizers.md](developer-guide/api/concrete.ml.quantization.quantizers.md)
- [concrete.ml.search_parameters.md](developer-guide/api/concrete.ml.search_parameters.md)
@@ -88,6 +94,7 @@
- [concrete.ml.sklearn.glm.md](developer-guide/api/concrete.ml.sklearn.glm.md)
- [concrete.ml.sklearn.linear_model.md](developer-guide/api/concrete.ml.sklearn.linear_model.md)
- [concrete.ml.sklearn.md](developer-guide/api/concrete.ml.sklearn.md)
- [concrete.ml.sklearn.neighbors.md](developer-guide/api/concrete.ml.sklearn.neighbors.md)
- [concrete.ml.sklearn.qnn.md](developer-guide/api/concrete.ml.sklearn.qnn.md)
- [concrete.ml.sklearn.qnn_module.md](developer-guide/api/concrete.ml.sklearn.qnn_module.md)
- [concrete.ml.sklearn.rf.md](developer-guide/api/concrete.ml.sklearn.rf.md)
@@ -96,6 +103,7 @@
- [concrete.ml.sklearn.tree_to_numpy.md](developer-guide/api/concrete.ml.sklearn.tree_to_numpy.md)
- [concrete.ml.sklearn.xgb.md](developer-guide/api/concrete.ml.sklearn.xgb.md)
- [concrete.ml.torch.compile.md](developer-guide/api/concrete.ml.torch.compile.md)
- [concrete.ml.torch.hybrid_model.md](developer-guide/api/concrete.ml.torch.hybrid_model.md)
- [concrete.ml.torch.md](developer-guide/api/concrete.ml.torch.md)
- [concrete.ml.torch.numpy_module.md](developer-guide/api/concrete.ml.torch.numpy_module.md)
- [concrete.ml.version.md](developer-guide/api/concrete.ml.version.md)
26 changes: 13 additions & 13 deletions docs/advanced-topics/advanced_features.md
@@ -26,21 +26,21 @@ In Concrete ML, there are three different ways to define the error probability:

The first way to set error probabilities in Concrete ML is at the local level, by directly setting the probability of error of each individual TLU. This probability is referred to as `p_error`. A given PBS operation has a `1 - p_error` chance of being successful. A successful evaluation means that the value decrypted after FHE evaluation is exactly the same as the one that would be computed in the clear.
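
As a rough rule of thumb (an illustrative estimate that assumes TLU errors are independent, which is not strictly guaranteed in practice), a circuit containing $$N$$ PBS operations is evaluated entirely without error with probability:

$$(1 - p\_error)^N$$

Thus, the error probability of the whole circuit grows quickly with the number of TLUs, which is why the per-PBS and per-circuit error probabilities described below can be controlled separately.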

For simplicity, it is best to use [default options](advanced_features.md#using-default-error-probability), irrespective of the type of model. However, for deep neural networks, default values may be too pessimistic, reducing computation speed without any improvement in accuracy: some TLU errors might not affect the accuracy of the network, so `p_error` can be safely increased (e.g., see CIFAR classifications in [our showcase](../getting-started/showcase.md)).

Here is a visualization of the effect of the `p_error` on a neural network model with a `p_error = 0.1` compared to execution in the clear (i.e., no error):

![Impact of p_error in a Neural Network](../figures/p_error_nn.png)

Varying `p_error` in the one hidden-layer neural network above produces the following inference times. Increasing `p_error` to 0.1 halves the inference time with respect to a `p_error` of 0.001. In the graph above, the decision boundary becomes noisier with a higher `p_error`.

| p_error | Inference Time (ms) |
| :-----: | ------------------- |
| 0.001 | 0.80 |
| 0.01 | 0.41 |
| 0.1 | 0.37 |

The speedup depends on model complexity, but, in an iterative approach, it is possible to search for a good value of `p_error` to obtain a speedup while maintaining good accuracy. Concrete ML provides a tool to find a good value for `p_error` based on [binary search](advanced_features.md#searching-for-the-best-error-probability).

Users can change this `p_error` by passing an argument to the `compile` function of any of the models. Here is an example:

Expand All @@ -61,7 +61,7 @@ clf.fit(X_train, y_train)
clf.compile(X_train, p_error=0.1)
```

If the `p_error` value is specified and [simulation](compilation.md#fhe-simulation) is enabled, the run will take into account the randomness induced by the choice of `p_error`. This results in statistical similarity to the FHE evaluation.
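
As a quick illustration (a minimal sketch, reusing the `clf` classifier and `X_train` data from the example above), simulated predictions can be compared to clear ones:

```python
# Simulated predictions include the random TLU errors induced by p_error
y_simulated = clf.predict(X_train, fhe="simulate")

# Clear (non-FHE) predictions, used as an error-free reference
y_clear = clf.predict(X_train)

print(f"Agreement between simulated and clear inference: {(y_simulated == y_clear).mean():.1%}")
```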

### A global error probability for the entire model

@@ -86,7 +86,7 @@ If neither `p_error` nor `global_p_error` are set, Concrete ML employs `p_error =
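
As an illustration (a minimal sketch, again reusing the `clf` model and `X_train` data from the earlier example), a global error probability can be requested at compile time instead of a per-PBS one:

```python
# Bound the error probability of the entire circuit rather than of each PBS
clf.compile(X_train, global_p_error=0.1)
```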

### Searching for the best error probability

Currently, finding a good `p_error` value _a priori_ is not possible, as it is difficult to determine the impact of the TLU error on the output of a neural network. Concrete ML provides a tool to find a good `p_error` value that improves inference speed while maintaining accuracy. The method is based on binary search and evaluates the latency/accuracy trade-off iteratively.

```python
from time import time
```

@@ -111,7 +111,7 @@
@@ -148,7 +148,7 @@ print(f"Accuracy = {accuracy_score(y_pred, y_train): .2%}")

With this optimal `p_error`, accuracy is maintained while execution time is improved by a factor of 1.51.

Please note that the default setting for the search interval is restricted to a range of 0.0 to 0.9. Increasing the upper bound beyond this range may result in longer execution times, especially when `p_error≈1`.

## Rounded activations and quantizers

@@ -172,7 +172,7 @@ In Concrete ML, this feature is currently implemented for custom neural networks
- `concrete.ml.torch.compile_onnx_model` and
- `concrete.ml.torch.compile_brevitas_qat_model`.

The `rounding_threshold_bits` argument can be set to a specific bit-width. It is important to choose an appropriate bit-width threshold to balance the trade-off between speed and accuracy. By reducing the bit-width of intermediate tensors, it is possible to speed up computations while maintaining accuracy.
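
For example, here is a minimal sketch of how this argument might be used (the small `torch` network and the random input set are purely illustrative assumptions):

```python
import numpy
import torch

from concrete.ml.torch.compile import compile_torch_model

# A small, illustrative fully-connected network
torch_model = torch.nn.Sequential(
    torch.nn.Linear(10, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)

# Representative (random) input set used for quantization and compilation
input_set = numpy.random.uniform(-1, 1, size=(100, 10))

# Round TLU inputs to 6 bits to speed up FHE execution
quantized_module = compile_torch_model(
    torch_model,
    input_set,
    n_bits=4,
    rounding_threshold_bits=6,
)
```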

{% hint style="warning" %}
The `rounding_threshold_bits` parameter only works in FHE for TLU input bit-width ($$P$$) **less than or equal to 8 bits**.
@@ -187,7 +187,7 @@ In practice, the process looks like this:
1. Update P = P - 1
1. Repeat steps 2 and 3 until the accuracy loss exceeds an acceptable threshold (a rough sketch of this loop is given after this list).
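
The following sketch is illustrative only: it reuses the `torch_model` and `input_set` from the sketch above and assumes a held-out validation set `X_val`, `y_val` as well as a pre-computed `baseline_accuracy` for the un-rounded model.

```python
from sklearn.metrics import accuracy_score

acceptable_loss = 0.01  # maximum tolerated accuracy drop (an assumption)
best_p = None

for p in range(8, 1, -1):
    # Re-compile with the candidate rounding bit-width
    quantized_module = compile_torch_model(
        torch_model, input_set, n_bits=4, rounding_threshold_bits=p
    )

    # Evaluate accuracy with simulation (no FHE execution needed)
    y_pred = quantized_module.forward(X_val, fhe="simulate").argmax(axis=1)
    accuracy = accuracy_score(y_val, y_pred)

    if baseline_accuracy - accuracy > acceptable_loss:
        break  # the accuracy loss is no longer acceptable: keep the previous value

    best_p = p
```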

An example of such an implementation is available in [evaluate_torch_cml.py](../../use_case_examples/cifar/cifar_brevitas_training/evaluate_one_example_fhe.py) and [CifarInFheWithSmallerAccumulators.ipynb](../../use_case_examples/cifar/cifar_brevitas_finetuning/CifarInFheWithSmallerAccumulators.ipynb).

## Seeing compilation information

@@ -299,13 +299,13 @@ In this latter optimization, the following information will be provided:

- The bit-width ("6-bit integers") used in the program: for the moment, the compiler only supports a single precision (i.e., all PBS are promoted to the same bit-width - the largest one). Therefore, this bit-width predominantly drives the speed of the program, and it is essential to reduce it as much as possible for faster execution.
- The maximal norm2 ("7 manp"), which has an impact on the crypto parameters: The larger this norm2, the slower PBS will be. The norm2 is related to the norm of some constants appearing in your program, in a way which will be clarified in the Concrete documentation.
- The probability of error of an individual PBS, which was requested by the user ("3.300000e-02 error per pbs call" in User Config).
- The probability of error of the full circuit, which was requested by the user ("1.000000e+00 error per circuit call" in User Config). Here, the probability 1 stands for "not used", since we had set the individual probability via `p_error`.
- The probability of error of an individual PBS, which is found by the optimizer ("1/30 errors (3.234529e-02)").
- The probability of error of the full circuit, which is found by the optimizer ("1/10 errors (9.390887e-02)").
- An estimation of the cost of the circuit ("4.214000e+02 Millions Operations"): Large values indicate a circuit that will execute more slowly.

Here is some further information about cryptographic parameters:

- 1x glwe_dimension
- 2\*\*11 polynomial (2048)
14 changes: 6 additions & 8 deletions docs/advanced-topics/client_server.md
@@ -31,22 +31,20 @@ The server-side implementation of a Concrete ML model follows the diagram above.

## Example notebook

For a complete example, see [the client-server notebook](../advanced_examples/ClientServer.ipynb) or [the use-case examples](../../use_case_examples/deployment/).

### AWS

We provide scripts that leverage `boto3` to deploy any Concrete ML model to AWS.
The first required step is to properly set up AWS CLI on your system, which can be done by following the instructions in [AWS Documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
To create access keys for configuring the AWS CLI, go to the [appropriate panel on the AWS website](https://us-east-1.console.aws.amazon.com/iamv2/home?region=us-east-1#/security_credentials?section=IAM_credentials).

Once this first setup is done, you can launch `python src/concrete/ml/deployment/deploy_to_aws.py --path-to-model <path_to_your_serialized_model>` from the root of the repository to create an instance that runs a FastAPI server serving the model.

### Docker

Running Docker with the latest version of Concrete ML will require you to build a Docker image. To do this, run the following command: `poetry build && mkdir pkg && cp dist/* pkg/ && make release_docker`. You will need to have `make`, `poetry` and `docker` installed on your system.
To test locally, there is a dedicated script: `python src/concrete/ml/deployment/deploy_to_docker.py --path-to-model <path_to_your_serialized_model>`, which should be run from the root of the repository in order to create a Docker container that runs a FastAPI server serving the model.

No code is required to run the server, but each client is specific to the use case, even if the workflow stays the same.
To see how to create your client, refer to our [examples](../../use_case_examples/deployment) or [this notebook](../advanced_examples/Deployment.ipynb).
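
As a rough end-to-end illustration of this client/server workflow, here is a minimal sketch run in a single process (the toy `LogisticRegression` model, the synthetic data, and the directory names are illustrative assumptions; in a real deployment, the client and server parts run on different machines and exchange the serialized payloads over the network):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.deployment import FHEModelClient, FHEModelDev, FHEModelServer
from concrete.ml.sklearn import LogisticRegression

# Train and compile a small built-in model on synthetic data
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)
model.compile(X_train)

# Developer side: save the client and server artifacts to a folder
FHEModelDev(path_dir="deployment_files", model=model).save()

# Client side: generate keys and encrypt a quantized input
client = FHEModelClient(path_dir="deployment_files", key_dir="keys")
client.generate_private_and_evaluation_keys()
evaluation_keys = client.get_serialized_evaluation_keys()
encrypted_input = client.quantize_encrypt_serialize(X_test[[0]])

# Server side: load the model and run it on the encrypted payload
server = FHEModelServer(path_dir="deployment_files")
server.load()
encrypted_result = server.run(encrypted_input, evaluation_keys)

# Client side: decrypt and de-quantize the result
decrypted_prediction = client.deserialize_decrypt_dequantize(encrypted_result)
print(decrypted_prediction)
```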
2 changes: 1 addition & 1 deletion docs/advanced-topics/compilation.md
@@ -83,7 +83,7 @@ The first step in the list above takes a Python function implemented using the C

The result of this single step of the compilation pipeline allows the:

- execution of the op-graph, which includes TLUs, on clear non-encrypted data. This is not secure, but it is much faster than executing in FHE. This mode is useful for debugging, especially when looking for appropriate model hyper-parameters
- verification of the maximum bit-width of the op-graph and the intermediary bit-widths of model layers, to evaluate their impact on FHE execution latency

Simulation is enabled for all Concrete ML models once they are compiled as shown above. Obtaining the simulated predictions of the models is done by setting the `fhe="simulate"` argument to prediction methods:
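
For example, a minimal sketch (assuming `model` is a compiled built-in Concrete ML model and `X_test` is held-out data):

```python
# Clear-data execution of the op-graph (no encryption): fast, useful for debugging
y_preds_clear = model.predict(X_test, fhe="disable")

# Simulated predictions: emulate FHE behavior, including the effect of p_error
y_preds_simulated = model.predict(X_test, fhe="simulate")
```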