chore: add timing in data-frame notebook example (#583)
RomanBredehoft authored Apr 4, 2024
1 parent df3b5b6 commit 5f3c212
Showing 3 changed files with 58 additions and 43 deletions.
10 changes: 5 additions & 5 deletions deps_licenses/licenses_mac_silicon_user.txt
@@ -2,14 +2,14 @@ Name, Version, License
GitPython, 3.1.41, BSD License
PyYAML, 6.0.1, MIT License
anyio, 3.7.1, MIT License
boto3, 1.34.72, Apache Software License
botocore, 1.34.72, Apache Software License
boto3, 1.34.75, Apache Software License
botocore, 1.34.75, Apache Software License
brevitas, 0.8.0, UNKNOWN
certifi, 2023.7.22, Mozilla Public License 2.0 (MPL 2.0)
charset-normalizer, 3.3.2, MIT License
click, 8.1.7, BSD License
coloredlogs, 15.0.1, MIT License
concrete-python, 2024.3.27, BSD-3-Clause
concrete-python, 2.6.0rc1, BSD-3-Clause
dependencies, 2.0.1, BSD License
dill, 0.3.8, BSD License
exceptiongroup, 1.2.0, MIT License
@@ -19,7 +19,7 @@ flatbuffers, 24.3.25, Apache Software License
fsspec, 2024.3.1, BSD License
gitdb, 4.0.11, BSD License
h11, 0.14.0, MIT License
huggingface-hub, 0.22.1, Apache Software License
huggingface-hub, 0.22.2, Apache Software License
humanfriendly, 10.0, MIT License
hummingbird-ml, 0.4.8, MIT License
idna, 3.6, BSD License
@@ -67,7 +67,7 @@ tokenizers, 0.15.2, Apache Software License
tomli, 2.0.1, MIT License
torch, 1.13.1, BSD License
tqdm, 4.66.2, MIT License; Mozilla Public License 2.0 (MPL 2.0)
transformers, 4.39.1, Apache Software License
transformers, 4.39.3, Apache Software License
typing_extensions, 4.5.0, Python Software Foundation License
tzdata, 2024.1, Apache Software License
urllib3, 2.2.1, MIT License
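The license listing appears to be a comma-separated table with a `Name, Version, License` header. A minimal sketch for summarizing the version bumps between two snapshots of such a file could look like this. The helper names `parse_licenses` and `version_changes` are hypothetical and not part of the repository's tooling. The sample rows are copied from the hunks above.

```python
import csv
import io


def parse_licenses(text: str) -> dict:
    """Map package name -> version from a 'Name, Version, License' listing."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the "Name, Version, License" header row
    return {row[0]: row[1] for row in reader if row}


def version_changes(old_text: str, new_text: str) -> dict:
    """Return {name: (old_version, new_version)} for packages whose version changed."""
    old, new = parse_licenses(old_text), parse_licenses(new_text)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


before = """Name, Version, License
boto3, 1.34.72, Apache Software License
concrete-python, 2024.3.27, BSD-3-Clause
idna, 3.6, BSD License
"""
after = """Name, Version, License
boto3, 1.34.75, Apache Software License
concrete-python, 2.6.0rc1, BSD-3-Clause
idna, 3.6, BSD License
"""
print(version_changes(before, after))
# {'boto3': ('1.34.72', '1.34.75'), 'concrete-python': ('2024.3.27', '2.6.0rc1')}
```

Unchanged packages such as `idna` are filtered out, so only the bumps remain.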
2 changes: 1 addition & 1 deletion deps_licenses/licenses_mac_silicon_user.txt.md5
@@ -1 +1 @@
8de2e8c13fe9a1fe80d9cce43dee7493
74a229e0dccc68a1f77c7ca59dbf7614
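The `.md5` file appears to store a 32-character hex digest that tracks the content of the license listing. The sketch below assumes the digest is the MD5 of the listing's raw bytes. That is an assumption, because the repository's own checksum script is not shown in this commit. The helpers `md5_of_file` and `checksum_matches` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hex MD5 digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(txt_path: Path, md5_path: Path) -> bool:
    """Compare a file's MD5 with the digest stored in its companion .md5 file."""
    expected = md5_path.read_text().strip()
    return md5_of_file(txt_path) == expected


with tempfile.TemporaryDirectory() as tmp:
    txt = Path(tmp) / "licenses.txt"
    txt.write_bytes(b"abc")
    md5_file = Path(tmp) / "licenses.txt.md5"
    md5_file.write_text("900150983cd24fb0d6963f7d28e17f72\n")  # MD5 of b"abc"
    result = checksum_matches(txt, md5_file)

print(result)  # True
```

If this assumption holds, any edit to the listing changes the digest. That would explain why the two files are updated together.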
89 changes: 52 additions & 37 deletions docs/advanced_examples/EncryptedPandas.ipynb
@@ -1,12 +1,27 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Encrypted Data-frames\n",
"\n",
"The following notebook shows how to encrypt Pandas data-frames and run a left join on them with Concrete ML, using Fully Homomorphic Encryption (FHE) in a client-server setting. This example is split into three main sections:\n",
"1) Two independent clients load their own CSV files using Pandas, encrypt their data, and send it to a server\n",
"2) The server runs a left join in FHE\n",
"3) One of the clients receives the encrypted output data-frame and decrypts it\n",
"\n",
"In such a setting, several parties are thus able to merge private databases without ever disclosing any of their sensitive data. Additionally, Concrete ML provides a user-friendly API meant to be as close as possible to Pandas."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"import time\n",
"from pathlib import Path\n",
"from tempfile import TemporaryDirectory\n",
"\n",
@@ -27,26 +42,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following notebook shows how to encrypt Pandas data-frames and run a left join on them with Concrete ML, using Fully Homomorphic Encryption (FHE) in a client-server setting. This example is split into three main sections:\n",
"1) Two independent clients load their own CSV files using Pandas, encrypt their data, and send it to a server\n",
"2) The server runs a left join in FHE\n",
"3) One of the clients receives the encrypted output data-frame and decrypts it\n",
"\n",
"In such a setting, several parties are thus able to merge private databases without ever disclosing any of their sensitive data. Additionally, Concrete ML provides a user-friendly API meant to be as close as possible to Pandas."
"## Clients"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clients"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## User 1\n",
"### User 1\n",
"\n",
"On the first user's side, load the private data using Pandas. For this example, we took the [Tips](https://www.kaggle.com/code/sanjanabasu/tips-dataset/input) dataset and split it into two CSV files so that:\n",
"- all columns are different, except for column \"index\", representing the initial data-frame's index\n",
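The split described in this cell can be illustrated in plain Python. This sketch uses made-up rows and stdlib lists of dicts instead of Pandas and CSV files. Giving user 1 the `total_bill` and `tip` columns is an assumption. User 2's `day`, `time` and `size` columns match those visible in the encrypted preview of the second data-frame. The `split_table` helper is hypothetical.

```python
# Made-up rows in the spirit of the Tips dataset (values are illustrative only).
rows = [
    {"total_bill": 16.99, "tip": 1.01, "day": "Sun", "time": "Dinner", "size": 2},
    {"total_bill": 10.34, "tip": 1.66, "day": "Sun", "time": "Dinner", "size": 3},
    {"total_bill": 21.01, "tip": 3.50, "day": "Sat", "time": "Dinner", "size": 3},
]

LEFT_COLUMNS = ["total_bill", "tip"]     # assumption: user 1's columns
RIGHT_COLUMNS = ["day", "time", "size"]  # user 2's columns, as in the preview


def split_table(rows, left_cols, right_cols):
    """Split rows into two tables whose only shared column is 'index'."""
    left = [{"index": i, **{c: r[c] for c in left_cols}} for i, r in enumerate(rows)]
    right = [{"index": i, **{c: r[c] for c in right_cols}} for i, r in enumerate(rows)]
    return left, right


left, right = split_table(rows, LEFT_COLUMNS, RIGHT_COLUMNS)
print(left[0])   # {'index': 0, 'total_bill': 16.99, 'tip': 1.01}
print(right[0])  # {'index': 0, 'day': 'Sun', 'time': 'Dinner', 'size': 2}
```

The `index` column is what later lets the server align rows from the two users without any other shared information.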
@@ -355,7 +358,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## User 2\n",
"### User 2\n",
"\n",
"The second user's steps are very similar to the first user's. It is important to note that neither user is expected to share any part of their database with the other."
]
@@ -489,31 +492,31 @@
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>..cd152fa311..</td>\n",
" <td>..3d8320b6c1..</td>\n",
" <td>..f82d07e52a..</td>\n",
" <td>..c4c4df551f..</td>\n",
" <td>..4c6228db5e..</td>\n",
" <td>..34201c3528..</td>\n",
" <td>..4b06cde26f..</td>\n",
" <td>..a8e057e092..</td>\n",
" </tr>\n",
" <tr>\n",
" <td>..7fe6768292..</td>\n",
" <td>..6f32ec3bc6..</td>\n",
" <td>..90c1062813..</td>\n",
" <td>..ce290bd1f5..</td>\n",
" <td>..128796dd3e..</td>\n",
" <td>..a8585d0f21..</td>\n",
" <td>..b5b5bb545f..</td>\n",
" <td>..c82afbda96..</td>\n",
" </tr>\n",
" <tr>\n",
" <td>..30c2a60054..</td>\n",
" <td>..09d03fc1e5..</td>\n",
" <td>..0092ded233..</td>\n",
" <td>..3e026fc954..</td>\n",
" <td>..7790a7620f..</td>\n",
" <td>..c3c49176fd..</td>\n",
" <td>..472743ea49..</td>\n",
" <td>..e9202edb1c..</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
"text/plain": [
" index day time size\n",
"..cd152fa311.. ..3d8320b6c1.. ..f82d07e52a.. ..c4c4df551f..\n",
"..7fe6768292.. ..6f32ec3bc6.. ..90c1062813.. ..ce290bd1f5..\n",
"..30c2a60054.. ..09d03fc1e5.. ..0092ded233.. ..3e026fc954.."
"..4c6228db5e.. ..34201c3528.. ..4b06cde26f.. ..a8e057e092..\n",
"..128796dd3e.. ..a8585d0f21.. ..b5b5bb545f.. ..c82afbda96..\n",
"..7790a7620f.. ..c3c49176fd.. ..472743ea49.. ..e9202edb1c.."
]
},
"execution_count": 9,
@@ -548,7 +551,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Server\n",
"## Server\n",
"\n",
"The server only receives serialized encrypted data-frames. Once it has them, anyone is able to decide which operation to run on which data-frames, but only the parties that encrypted them will be able to decrypt the result.\n",
"\n",
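In plaintext terms, the server's `merge(how="left", on="index")` keeps every row of the left table and attaches the matching right-table columns. It fills in missing values when the right table has no row for that key. The following stdlib sketch shows only these semantics on unencrypted rows and says nothing about how Concrete ML evaluates the join in FHE. It assumes that `index` values are unique on the right and that no other column name is shared. It uses `None` where Pandas would use `NaN`. The `left_join` helper is hypothetical.

```python
def left_join(left, right, on="index"):
    """Plaintext left join: every left row is kept, matched on the `on` key.

    Assumes `on` values are unique in `right` and that no other column name is
    shared. Missing matches are filled with None (Pandas would use NaN).
    """
    right_by_key = {row[on]: row for row in right}
    right_cols = [c for c in (right[0] if right else {}) if c != on]
    joined = []
    for row in left:
        match = right_by_key.get(row[on], {})
        joined.append({**row, **{c: match.get(c) for c in right_cols}})
    return joined


left = [{"index": 0, "tip": 1.0}, {"index": 1, "tip": 2.5}, {"index": 2, "tip": 3.0}]
right = [{"index": 0, "size": 2}, {"index": 2, "size": 4}]
for row in left_join(left, right):
    print(row)
# {'index': 0, 'tip': 1.0, 'size': 2}
# {'index': 1, 'tip': 2.5, 'size': None}
# {'index': 2, 'tip': 3.0, 'size': 4}
```

Left-row order is preserved, which matches how a Pandas left merge orders its output.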
@@ -576,9 +579,21 @@
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total execution time: 8.26s\n"
]
}
],
"source": [
"df_joined_enc_server = df_left_enc.merge(df_right_enc, how=\"left\", on=\"index\")"
"start = time.time()\n",
"df_joined_enc_server = df_left_enc.merge(df_right_enc, how=\"left\", on=\"index\")\n",
"end = time.time() - start\n",
"\n",
"print(f\"Total execution time: {end:.2f}s\")"
]
},
{
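The timing cell added by this commit measures wall-clock time with `time.time()`. A minimal alternative sketch uses `time.perf_counter()`, a monotonic clock intended for measuring elapsed durations, inside a small context manager. The `timed` helper is hypothetical and not part of Concrete ML. `time.sleep` stands in for the encrypted merge.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Measure the elapsed time of the enclosed block and store it in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start
        print(f"{label}: {results[label]:.2f}s")


timings = {}
with timed("Total execution time", timings):
    # Stand-in for: df_left_enc.merge(df_right_enc, how="left", on="index")
    time.sleep(0.1)
```

Storing results in a dict makes it easy to time several steps (encryption, merge, decryption) and compare them afterwards.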
@@ -603,7 +618,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clients\n",
"## Clients\n",
"\n",
"Both users 1 and 2 are able to decrypt the server's encrypted output data-frame, but it first needs to be deserialized."
]
@@ -792,7 +807,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concrete ML vs Pandas comparison\n",
"### Concrete ML vs Pandas comparison\n",
"\n",
"As this is only a notebook demo, we can compute the expected Pandas output (in a non-private setting) and compare it to the result above."
]
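The conclusion describes the decrypted result as similar to what Pandas would provide. This sketch expresses such a check in plain Python as a row-by-row comparison with a numeric tolerance. It assumes that decrypted numbers may differ slightly because of the limited precision of encrypted data-frames, and that missing values appear as `None` or `NaN`. The helpers `frames_match` and `_values_match` are hypothetical. With Pandas, `pandas.testing.assert_frame_equal(..., check_exact=False)` plays a similar role.

```python
import math


def _values_match(a, b, rel_tol, abs_tol):
    """Equal if both are missing, close if numeric, otherwise strictly equal."""
    missing_a = a is None or (isinstance(a, float) and math.isnan(a))
    missing_b = b is None or (isinstance(b, float) and math.isnan(b))
    if missing_a or missing_b:
        return missing_a and missing_b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    return a == b


def frames_match(actual, expected, rel_tol=1e-2, abs_tol=1e-2):
    """Compare two lists of row dicts column by column, with a numeric tolerance."""
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if row_a.keys() != row_e.keys():
            return False
        if not all(_values_match(row_a[k], row_e[k], rel_tol, abs_tol) for k in row_e):
            return False
    return True


expected = [{"index": 0, "tip": 1.01, "size": 2}, {"index": 1, "tip": 1.66, "size": None}]
decrypted = [{"index": 0, "tip": 1.005, "size": 2}, {"index": 1, "tip": 1.66, "size": float("nan")}]
print(frames_match(decrypted, expected))  # True
```

The tolerances are placeholders. Suitable values would depend on the precision the encrypted data-frames actually support.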
@@ -1005,11 +1020,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusion\n",
"## Conclusion\n",
"\n",
"Concrete ML provides a way for multiple parties to run Pandas operations on their data-frames without ever disclosing any sensitive data. This is done through a Pandas-like API that enables users to encrypt the data-frames and a server to run the operations in a private and secure manner using Fully Homomorphic Encryption (FHE). The users are then able to decrypt the output and obtain a result similar to what Pandas would have provided in a non-private setting. \n",
"\n",
"### Future Work\n",
"#### Future Work\n",
"\n",
"We are currently working on improving the encrypted data-frame feature. In the near future, we plan to allow higher precisions, which would let encrypted data-frames handle larger integers, floating-point values with better precision, more unique string values, and more rows. We will also add support for more encrypted operations on data-frames. Additionally, we are working on new techniques that would spare users from having to share a private key with each other."
]