From 18628fdc370e446c0ec7e6af95c9dbc76741c945 Mon Sep 17 00:00:00 2001
From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com>
Date: Fri, 19 Apr 2024 15:16:59 +0200
Subject: [PATCH] chore: add paragraph about the security model (#623)

---
 docs/.gitbook/assets/jupyter_logo.png         | Bin 2471 -> 0 bytes
 docs/README.md                                |  1 +
 docs/SUMMARY.md                               |  1 +
 docs/explanations/security_and_correctness.md | 91 ++++++++++++++++++++++
 docs/index.toc.txt                            |  1 +
 docs/tutorials/dl_examples.md                 |  4 ++--
 docs/tutorials/ml_examples.md                 | 22 +++++++++++-------
 7 files changed, 110 insertions(+), 10 deletions(-)
 delete mode 100644 docs/.gitbook/assets/jupyter_logo.png
 create mode 100644 docs/explanations/security_and_correctness.md

diff --git a/docs/.gitbook/assets/jupyter_logo.png b/docs/.gitbook/assets/jupyter_logo.png
deleted file mode 100644
index 58623ebfbfd17f0ca8ded37018c740198f9e09f7..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

diff --git a/docs/README.md b/docs/README.md
index 894b46db6..de2818ff6 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -37,6 +37,7 @@ Access to additional resources and join the Zama community.
 
 Refer to the API, review product architecture, and access additional resources for in-depth explanations while working with Concrete ML.
 
+- [Security and correctness](explanations/security_and_correctness.md)
 - [API](references/api/README.md)
 - [Quantization](explanations/quantization.md)
 - [Pruning](explanations/pruning.md)
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index ef4e4d37a..8ab620656 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -46,6 +46,7 @@
 
 ## Explanations
 
+- [Security and correctness](explanations/security_and_correctness.md)
 - [Quantization](explanations/quantization.md)
 - [Pruning](explanations/pruning.md)
 - [Compilation](explanations/compilation.md)
diff --git a/docs/explanations/security_and_correctness.md b/docs/explanations/security_and_correctness.md
new file mode 100644
index 000000000..37ccf2306
--- /dev/null
+++ b/docs/explanations/security_and_correctness.md
@@ -0,0 +1,91 @@
+# Security and correctness
+
+## Security model
+
+The default parameters for Concrete ML are chosen considering the [IND-CPA](https://en.wikipedia.org/wiki/Ciphertext_indistinguishability) security model, and are selected with a [bootstrapping off-by-one error probability](../explanations/advanced_features.md#tolerance-to-off-by-one-error-for-an-individual-tlu) of $$2^{-40}$$. In particular, it is assumed that the results of decrypted computations are not shared by the secret key owner with any third parties, as such an action can lead to leakage of the secret encryption key. If you are designing an application where decryptions must be shared, you will need to craft custom encryption parameters chosen in accordance with the IND-CPA^D security model \[1\].
+
+## Correctness of computations
+
+The [cryptography concepts](../getting-started/concepts.md#cryptography-concepts) section explains how Concrete ML can ensure **guaranteed correctness of encrypted computations**. In this approach, a quantized machine learning model is converted to an FHE circuit that produces the same result on encrypted data as the original model does on clear data.
+
+However, the [bootstrapping off-by-one error probability](../explanations/advanced_features.md#tolerance-to-off-by-one-error-for-an-individual-tlu) can be configured by the user. Raising this probability results in lower latency when executing on encrypted data, but higher values void the correctness guarantee of the default setting. In practice, this may not be an issue, as model accuracy is often maintained even though slight differences appear in the model outputs. Moreover, as noted in the [paragraph above](#security-model), raising the off-by-one error probability may negatively impact the security model.
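+
+As an illustration, the following sketch raises this probability at compilation time. The `p_error` argument is the one described in the [advanced features](../explanations/advanced_features.md#tolerance-to-off-by-one-error-for-an-individual-tlu) section; the model and data are illustrative placeholders.
+
+```python
+# Sketch (assumes the built-in models' scikit-learn-style API): compile
+# a tree classifier with a custom off-by-one error probability.
+from sklearn.datasets import make_classification
+
+from concrete.ml.sklearn import DecisionTreeClassifier
+
+X, y = make_classification(n_samples=100, n_features=6, random_state=0)
+
+model = DecisionTreeClassifier(n_bits=6)
+model.fit(X, y)
+
+# Raising p_error above the default lowers latency, but weakens the
+# correctness guarantee described above.
+model.compile(X, p_error=2**-20)
+```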
+
+Furthermore, a second approach to reducing latency at the expense of correctness is approximate computation of univariate functions. This mode is enabled with the [rounding setting](../explanations/advanced_features.md#rounded-activations-and-quantizers). When the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) rounding method is used, off-by-one errors are always induced in the computation of activation functions, irrespective of the bootstrapping off-by-one error probability.
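+
+A minimal sketch of enabling this mode for a custom model follows. It assumes that `compile_torch_model` accepts a rounding configuration dictionary, as in the API reference linked above; the tiny network and calibration data are illustrative placeholders.
+
+```python
+# Sketch: enable approximate rounding of the accumulators feeding TLUs.
+import torch
+
+from concrete import fhe
+from concrete.ml.torch.compile import compile_torch_model
+
+
+class TinyMLP(torch.nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.fc1 = torch.nn.Linear(4, 8)
+        self.fc2 = torch.nn.Linear(8, 2)
+
+    def forward(self, x):
+        return self.fc2(torch.relu(self.fc1(x)))
+
+
+# APPROXIMATE rounding always induces off-by-one errors in activation
+# computations, trading exactness for lower latency.
+quantized_module = compile_torch_model(
+    TinyMLP(),
+    torch.randn(100, 4),
+    n_bits=6,
+    rounding_threshold_bits={"n_bits": 6, "method": fhe.Exactness.APPROXIMATE},
+)
+```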
+
+When trading off correctness for better latency, it is highly recommended to use the [FHE simulation feature](../getting-started/concepts.md#i-model-development) to measure accuracy on a held-out test set. In many cases, model accuracy is only slightly impacted by approximate computations.
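+
+For instance, a run such as the following sketch, again with placeholder model and data, gives a fast accuracy estimate without executing any encrypted computation.
+
+```python
+# Sketch: measure accuracy with FHE simulation before paying the cost
+# of real encrypted execution.
+from sklearn.datasets import make_classification
+from sklearn.model_selection import train_test_split
+
+from concrete.ml.sklearn import DecisionTreeClassifier
+
+X, y = make_classification(n_samples=200, n_features=6, random_state=0)
+X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+
+model = DecisionTreeClassifier(n_bits=6)
+model.fit(X_train, y_train)
+model.compile(X_train)
+
+# Simulation runs the FHE circuit semantics on clear data, reflecting
+# quantization and off-by-one error effects.
+y_sim = model.predict(X_test, fhe="simulate")
+print("Simulated accuracy:", (y_sim == y_test).mean())
+```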
+
+## References
+
+\[1\] Li, Baiyu, et al. “Securing approximate homomorphic encryption using differential privacy.” Annual International Cryptology Conference. Cham: Springer Nature Switzerland, 2022. https://eprint.iacr.org/2022/816.pdf
diff --git a/docs/index.toc.txt b/docs/index.toc.txt
index b90476ce9..b13ab4d4a 100644
--- a/docs/index.toc.txt
+++ b/docs/index.toc.txt
@@ -69,6 +69,7 @@
    :hidden:
    :caption: Explanations
 
+   explanations/security_and_correctness.md
    explanations/quantization.md
    explanations/pruning.md
    explanations/compilation.md
diff --git a/docs/tutorials/dl_examples.md b/docs/tutorials/dl_examples.md
index 6ff3017db..600dd1a0b 100644
--- a/docs/tutorials/dl_examples.md
+++ b/docs/tutorials/dl_examples.md
@@ -12,12 +12,12 @@ Some examples constrain accumulators to 7-8 bits, which can be sufficient for si
 
 ### 1. Step-by-step guide to building a custom NN
 
-[![](../.gitbook/assets/jupyter_logo.png) Quantization aware training example](../advanced_examples/QuantizationAwareTraining.ipynb)
+- [Quantization aware training example](../advanced_examples/QuantizationAwareTraining.ipynb)
 
 This shows how to use Quantization Aware Training and pruning when starting out from a classical PyTorch network. This example uses a simple data-set and a small NN, which achieves good accuracy with low accumulator size.
 
 ### 2. Custom convolutional NN on the [Digits](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data-set
 
-[![](../.gitbook/assets/jupyter_logo.png) Convolutional Neural Network](../advanced_examples/ConvolutionalNeuralNetwork.ipynb)
+- [Convolutional Neural Network](../advanced_examples/ConvolutionalNeuralNetwork.ipynb)
 
 Following the [Step-by-step guide](../deep-learning/fhe_friendly_models.md), this notebook implements a Quantization Aware Training convolutional neural network on the MNIST data-set. It uses 3-bit weights and activations, giving a 7-bit accumulator.
diff --git a/docs/tutorials/ml_examples.md b/docs/tutorials/ml_examples.md
index 0751fc613..1f695ff66 100644
--- a/docs/tutorials/ml_examples.md
+++ b/docs/tutorials/ml_examples.md
@@ -18,46 +18,52 @@ It is recommended to use [simulation](../explanations/compilation.md#fhe-simulat
 
 ### 1. Linear models
 
-[![](../.gitbook/assets/jupyter_logo.png) Linear Regression example](../advanced_examples/LinearRegression.ipynb) [![](../.gitbook/assets/jupyter_logo.png) Logistic Regression example](../advanced_examples/LogisticRegression.ipynb) [![](../.gitbook/assets/jupyter_logo.png) Linear Support Vector Regression example](../advanced_examples/LinearSVR.ipynb) [![](../.gitbook/assets/jupyter_logo.png) Linear SVM classification](../advanced_examples/SVMClassifier.ipynb)
+- [Linear Regression example](../advanced_examples/LinearRegression.ipynb)
+- [Logistic Regression example](../advanced_examples/LogisticRegression.ipynb)
+- [Linear Support Vector Regression example](../advanced_examples/LinearSVR.ipynb)
+- [Linear SVM classification](../advanced_examples/SVMClassifier.ipynb)
 
 These examples show how to use the built-in linear models on synthetic data, which allows for easy visualization of the decision boundaries or trend lines. Executing these 1D and 2D models in FHE takes around 1 millisecond.
 
 ### 2. Generalized linear models
 
-[![](../.gitbook/assets/jupyter_logo.png) Poisson Regression example](../advanced_examples/PoissonRegression.ipynb) [![](../.gitbook/assets/jupyter_logo.png) Generalized Linear Models comparison](../advanced_examples/GLMComparison.ipynb)
+- [Poisson Regression example](../advanced_examples/PoissonRegression.ipynb)
+- [Generalized Linear Models comparison](../advanced_examples/GLMComparison.ipynb)
 
 These two examples show generalized linear models (GLM) on the real-world [OpenML insurance](https://www.openml.org/d/41214) data-set. As the non-linear, inverse-link functions are computed, these models do not use [PBS](../getting-started/concepts.md#cryptography-concepts), and are, thus, very fast (~1ms execution time).
 
 ### 3. Decision tree
 
-[![](../.gitbook/assets/jupyter_logo.png) Decision Tree Classifier](../advanced_examples/DecisionTreeClassifier.ipynb)
+- [Decision Tree Classifier](../advanced_examples/DecisionTreeClassifier.ipynb)
 
 Using the [OpenML spams](https://www.openml.org/d/44) data-set, this example shows how to train a classifier that detects spam, based on features extracted from email messages. A grid-search is performed over decision-tree hyper-parameters to find the best ones.
 
-[![](../.gitbook/assets/jupyter_logo.png) Decision Tree Regressor](../advanced_examples/DecisionTreeRegressor.ipynb)
+- [Decision Tree Regressor](../advanced_examples/DecisionTreeRegressor.ipynb)
 
 Using the [House Price prediction](https://www.openml.org/search?type=data&sort=runs&id=537) data-set, this example shows how to train a regressor that predicts house prices.
 
 ### 4. XGBoost and Random Forest classifier
 
-[![](../.gitbook/assets/jupyter_logo.png) XGBoost/Random Forest example](../advanced_examples/XGBClassifier.ipynb)
+- [XGBoost/Random Forest example](../advanced_examples/XGBClassifier.ipynb)
 
 This example shows how to train tree-ensemble models (either XGBoost or Random Forest), first on a synthetic data-set, and then on the [Diabetes](https://www.openml.org/d/37) data-set. Grid-search is used to find the best number of trees in the ensemble.
 
 ### 5. XGBoost regression
 
-[![](../.gitbook/assets/jupyter_logo.png) XGBoost Regression example](../advanced_examples/XGBRegressor.ipynb)
+- [XGBoost Regression example](../advanced_examples/XGBRegressor.ipynb)
 
 Privacy-preserving prediction of house prices is shown in this example, using the [House Prices](https://www.openml.org/d/43926) data-set. Using 50 trees in the ensemble, with 5 bits of precision for the input features, the FHE regressor obtains an $$R^2$$ score of 0.90 and an execution time of 7-8 seconds.
 
 ### 6. Fully connected neural network
 
-[![](../.gitbook/assets/jupyter_logo.png) NN Iris example](../advanced_examples/FullyConnectedNeuralNetwork.ipynb) [![](../.gitbook/assets/jupyter_logo.png) NN MNIST example](../advanced_examples/FullyConnectedNeuralNetworkOnMNIST.ipynb)
+- [NN Iris example](../advanced_examples/FullyConnectedNeuralNetwork.ipynb)
+- [NN MNIST example](../advanced_examples/FullyConnectedNeuralNetworkOnMNIST.ipynb)
 
 Two different configurations of the built-in, fully-connected neural networks are shown. First, a small bit-width accumulator network is trained on [Iris](https://www.openml.org/d/61) and compared to a PyTorch floating point network. Second, a larger accumulator (>8 bits) is demonstrated on [MNIST](http://yann.lecun.com/exdb/mnist/).
 
 ### 7. Comparison of models
 
-[![](../.gitbook/assets/jupyter_logo.png) Classifier comparison](../advanced_examples/ClassifierComparison.ipynb) [![](../.gitbook/assets/jupyter_logo.png) Regressor comparison](../advanced_examples/RegressorComparison.ipynb)
+- [Classifier comparison](../advanced_examples/ClassifierComparison.ipynb)
+- [Regressor comparison](../advanced_examples/RegressorComparison.ipynb)
 
 Based on three different synthetic data-sets, all the built-in classifiers are demonstrated in this notebook, showing accuracies, inference times, accumulator bit-widths, and decision boundaries.