From c773020abe8111dc56b09485e3ce2eee4fd443b3 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 4 Sep 2024 13:19:20 +0000
Subject: [PATCH 01/12] start to modify notebook

---
 .../Demo_Client_Notebook_Smartnoise-SQL.ipynb | 611 +++---------------
 1 file changed, 79 insertions(+), 532 deletions(-)

diff --git a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
index 33d0c68b..43159ef6 100644
--- a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
+++ b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
@@ -5,7 +5,7 @@
    "id": "3f18d338",
    "metadata": {},
    "source": [
-    "# Secure Data Disclosure: Client side"
+    "# Lomas Client Side: Using Smartnoise-SQL"
    ]
   },
   {
@@ -13,7 +13,7 @@
    "id": "1582a2ae",
    "metadata": {},
    "source": [
-    "This notebook showcases how researcher could use the Secure Data Disclosure system. It explains the different functionnalities provided by the dpserial client library to interact with the secure server.\n",
+    "This notebook showcases how researchers could use the Lomas platform with Smartnoise-SQL. It explains the different functionalities provided by the `lomas-client` library to interact with the Lomas server.\n",
     "\n",
     "The secure data are never visible by researchers. They can only access to differentially private responses via queries to the server.\n",
     "\n",
@@ -25,13 +25,7 @@
    "id": "5b73135c",
    "metadata": {},
    "source": [
-    "🐧🐧🐧\n",
-    "In this notebook the researcher is a penguin researcher named Dr. Antarctica. She aims to do a grounbdbreaking research on various penguins dimensions.\n",
-    "\n",
-    "Therefore, the powerful queen Icerbegina 👑 had the data collected. But in order to get the penguins to agree to participate she promised them that no one would be able to look at the data and that no one would be able to guess the bill width of any specific penguin (which is very sensitive information) from the data. Nobody! Not even the researchers. The queen hence stored the data on the Secure Data Disclosure Server and only gave a small budget to Dr. Antarctica.\n",
-    "\n",
-    "This is not a problem for Dr. Antarctica as she does not need to see the data to make statistics thanks to the Secure Data Disclosure Client library ofs_dpserial. \n",
-    "🐧🐧🐧"
+    "In this notebook the researcher is a penguin researcher named Dr. Antarctica. She aims to do groundbreaking research on various penguin data."
    ]
   },
   {
@@ -40,11 +34,29 @@
    "metadata": {},
    "source": [
     "## Step 1: Install the library\n",
-    "To interact with the secure server on which the data is stored, Dr.Antartica first needs to install the library `fso_dpserial` on her local developping environment. \n",
+    "To interact with the secure server on which the data is stored, Dr. Antartica first needs to install the library `lomas-client` in her local development environment.\n",
     "\n",
     "It can be installed via the pip command:"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f5d749c-0f39-4f78-8157-528bc39764b2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# !pip install lomas_client"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "53cf3204-18a8-423c-9de2-c2966fdf84fb",
+   "metadata": {},
+   "source": [
+    "Or use a local version of the client:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -54,8 +66,7 @@
    "source": [
     "import sys\n",
     "import os\n",
-    "sys.path.append(os.path.abspath(os.path.join('..')))\n",
-    "# !pip install lomas_client"
+    "sys.path.append(os.path.abspath(os.path.join('..')))"
    ]
   },
   {
@@ -83,7 +94,7 @@
     "- user_name: her name as registered in the database (Dr. Alice Antartica)\n",
     "- dataset_name: the name of the dataset that she wants to query (PENGUIN)\n",
     "\n",
-    "She will only be able to query on the real dataset if the queen Icergina has previously made her an account in the database, given her access to the PENGUIN dataset and has given her some epsilon and delta credit. (As is done in the Secure Data Disclosure Notebook: Server side)."
+    "She will only be able to query the real dataset if the administrator has previously created an account for her in the database, given her access to the PENGUIN dataset, and granted her some $\\epsilon$, $\\delta$ privacy loss budget."
    ]
   },
   {
@@ -104,7 +115,7 @@
    "id": "0ec400c8",
    "metadata": {},
    "source": [
-    "And that's it for the preparation. She is now ready to use the various functionnalities offered by `fso_dpserial`."
+    "And that's it for the preparation. She is now ready to use the various functionalities offered by `lomas-client`."
    ]
   },
   {
@@ -112,24 +123,12 @@
    "id": "9b9a5f13",
    "metadata": {},
    "source": [
-    "## Step 3: Understand the functionnalities of the library"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c7cb5531",
-   "metadata": {},
-   "source": [
-    "### Getting dataset metadata\n",
-    "\n",
-    "Dr. Antartica has never seen the data and as a first step to understand what is available to her, she would like to check the metadata of the dataset. Therefore, she just needs to call the `get_dataset_metadata()` function of the client. As this is public information, this does not cost any budget.\n",
-    "\n",
-    "This function returns metadata information in the same format as [SmartnoiseSQL dictionary format](https://docs.smartnoise.org/sql/metadata.html#dictionary-format), where among other, there is information about all the available columns, their type, bound values (see Smartnoise page for more details)."
+    "## Step 3: Getting dataset metadata"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 6,
    "id": "d15cbe39",
    "metadata": {},
    "outputs": [
@@ -155,7 +154,7 @@
        " 'rows': 344}"
       ]
      },
-     "execution_count": 4,
+     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -167,294 +166,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d338ed96",
-   "metadata": {},
-   "source": [
-    "Based on this Dr. Antartica knows that there are 7 columns, 3 of string type (species, island, sex) and 4 of float type (bill length, bill depth, flipper length and body mass) with their associated bounds. She also knows based on the field `max_ids: 1` that each penguin can only be once in the dataset and on the field `row_privacy: True` that each row represents a single penguin. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5a3c899d",
+   "id": "5bf5b471-1495-4046-bec1-ddf96c98642f",
    "metadata": {},
    "source": [
-    "### Get a dummy dataset\n",
-    "\n",
-    "Now, that she has seen and understood the metadata, she wants to get an even better understanding of the dataset (but is still not able to see it). A solution to have an idea of what the dataset looks like it to create a dummy dataset. \n",
-    "\n",
-    "Based on the public metadata of the dataset, a random dataframe can be created created. By default, there will be 100 rows and the seed is set to 42 to ensure reproducibility, but these 2 variables can be changed to obtain different dummy datasets.\n",
-    "Getting a dummy dataset does not affect the budget as there is no differential privacy here, it is not a synthetic dataset and all that could be learn here is already present in the public metadata.\n",
-    "\n",
-    "Dr. Antartica first create a dummy dataset with the default options."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "be07091f",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(100, 7)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>species</th>\n",
-       "      <th>island</th>\n",
-       "      <th>bill_length_mm</th>\n",
-       "      <th>bill_depth_mm</th>\n",
-       "      <th>flipper_length_mm</th>\n",
-       "      <th>body_mass_g</th>\n",
-       "      <th>sex</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>Chinstrap</td>\n",
-       "      <td>Torgersen</td>\n",
-       "      <td>43.108904</td>\n",
-       "      <td>13.314292</td>\n",
-       "      <td>214.203165</td>\n",
-       "      <td>2258.408606</td>\n",
-       "      <td>FEMALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>Adelie</td>\n",
-       "      <td>Dream</td>\n",
-       "      <td>63.275001</td>\n",
-       "      <td>19.364104</td>\n",
-       "      <td>158.413996</td>\n",
-       "      <td>4656.773158</td>\n",
-       "      <td>FEMALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>Adelie</td>\n",
-       "      <td>Dream</td>\n",
-       "      <td>55.619788</td>\n",
-       "      <td>16.143560</td>\n",
-       "      <td>166.162871</td>\n",
-       "      <td>4703.175608</td>\n",
-       "      <td>FEMALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>Adelie</td>\n",
-       "      <td>Biscoe</td>\n",
-       "      <td>50.953047</td>\n",
-       "      <td>18.085707</td>\n",
-       "      <td>239.855419</td>\n",
-       "      <td>5187.149507</td>\n",
-       "      <td>MALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>Gentoo</td>\n",
-       "      <td>Torgersen</td>\n",
-       "      <td>35.460652</td>\n",
-       "      <td>22.075665</td>\n",
-       "      <td>210.642906</td>\n",
-       "      <td>5630.456669</td>\n",
-       "      <td>MALE</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "     species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
-       "0  Chinstrap  Torgersen       43.108904      13.314292         214.203165   \n",
-       "1     Adelie      Dream       63.275001      19.364104         158.413996   \n",
-       "2     Adelie      Dream       55.619788      16.143560         166.162871   \n",
-       "3     Adelie     Biscoe       50.953047      18.085707         239.855419   \n",
-       "4     Gentoo  Torgersen       35.460652      22.075665         210.642906   \n",
-       "\n",
-       "   body_mass_g     sex  \n",
-       "0  2258.408606  FEMALE  \n",
-       "1  4656.773158  FEMALE  \n",
-       "2  4703.175608  FEMALE  \n",
-       "3  5187.149507    MALE  \n",
-       "4  5630.456669    MALE  "
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df_dummy = client.get_dummy_dataset()\n",
-    "print(df_dummy.shape)\n",
-    "df_dummy.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4f85e950",
-   "metadata": {},
-   "source": [
-    "However, she would prefer to have a dataset with 200 rows and chooses a seed of 0, hence:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "id": "01f4365a",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "NB_ROWS = 200\n",
-    "SEED = 0"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "id": "3f553b29",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(200, 7)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>species</th>\n",
-       "      <th>island</th>\n",
-       "      <th>bill_length_mm</th>\n",
-       "      <th>bill_depth_mm</th>\n",
-       "      <th>flipper_length_mm</th>\n",
-       "      <th>body_mass_g</th>\n",
-       "      <th>sex</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>Gentoo</td>\n",
-       "      <td>Biscoe</td>\n",
-       "      <td>49.208473</td>\n",
-       "      <td>16.117959</td>\n",
-       "      <td>190.125950</td>\n",
-       "      <td>2873.291927</td>\n",
-       "      <td>FEMALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>Gentoo</td>\n",
-       "      <td>Torgersen</td>\n",
-       "      <td>55.031628</td>\n",
-       "      <td>19.963435</td>\n",
-       "      <td>242.929142</td>\n",
-       "      <td>3639.940005</td>\n",
-       "      <td>FEMALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>Chinstrap</td>\n",
-       "      <td>Torgersen</td>\n",
-       "      <td>51.096718</td>\n",
-       "      <td>16.777518</td>\n",
-       "      <td>159.961493</td>\n",
-       "      <td>5401.743330</td>\n",
-       "      <td>MALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>Adelie</td>\n",
-       "      <td>Biscoe</td>\n",
-       "      <td>49.070911</td>\n",
-       "      <td>14.796037</td>\n",
-       "      <td>244.530153</td>\n",
-       "      <td>2316.038092</td>\n",
-       "      <td>MALE</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>Chinstrap</td>\n",
-       "      <td>Biscoe</td>\n",
-       "      <td>44.827918</td>\n",
-       "      <td>13.246787</td>\n",
-       "      <td>236.948853</td>\n",
-       "      <td>5036.246870</td>\n",
-       "      <td>FEMALE</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "     species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
-       "0     Gentoo     Biscoe       49.208473      16.117959         190.125950   \n",
-       "1     Gentoo  Torgersen       55.031628      19.963435         242.929142   \n",
-       "2  Chinstrap  Torgersen       51.096718      16.777518         159.961493   \n",
-       "3     Adelie     Biscoe       49.070911      14.796037         244.530153   \n",
-       "4  Chinstrap     Biscoe       44.827918      13.246787         236.948853   \n",
-       "\n",
-       "   body_mass_g     sex  \n",
-       "0  2873.291927  FEMALE  \n",
-       "1  3639.940005  FEMALE  \n",
-       "2  5401.743330    MALE  \n",
-       "3  2316.038092    MALE  \n",
-       "4  5036.246870  FEMALE  "
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df_dummy = client.get_dummy_dataset(nb_rows = NB_ROWS, seed = SEED)\n",
-    "print(df_dummy.shape)\n",
-    "df_dummy.head()"
+    "## Step 4: Average bill length with Smartnoise-SQL"
    ]
   },
   {
@@ -466,51 +181,12 @@
     "\n",
     "Now that she has an idea of what the data looks like, she wants to start querying the real dataset to for her research. However, before this other tools are at her disposal to reduce potential error risks and avoid spending budget on irrelevant queries. Of course, this does not have any impact on the budget.\n",
     "\n",
-    "It is possible to specify the flag `dummy=True` in the various queries to perform the query on the dummy dataset instead of the real dataset and ensure that the queries are doing what is expected of them. \n",
-    "\n",
-    "Therefore Dr. Antartica computes the results that she gets on the dummy dataframe that she created locally and on the same dummy dataframe in the server via a query and compare them to ensure that the query is well defined and works within the server.\n",
-    "\n",
-    "She tests with an example on the average bill length on the dataframe."
+    "It is possible to specify the flag `dummy=True` in the various queries to perform the query on the dummy dataset instead of the real dataset and ensure that the queries are doing what is expected of them. "
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 8,
-   "id": "b6caee55",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "47.51532"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# On the local dummy dataframe\n",
-    "result_local_dummy = round(df_dummy['bill_length_mm'].mean(), 5)\n",
-    "result_local_dummy"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c3a37d8d",
-   "metadata": {},
-   "source": [
-    "As the query on the server goes through the same workflow for dummies and real data, she still has to set values for theoratical budget to spend on the dummy query. Of course, this theoretical budget will NOT affect her real budget as this is on dummy data. \n",
-    "\n",
-    "It is recommended to use very high values on the budget parameters here to have little noise and small difference between the exact local result and the 'little noisy' server result. \n",
-    "\n",
-    "Also, make sure to use the same values of number of rows and seed to have the same dummy datasets."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
    "id": "3946425d",
    "metadata": {},
    "outputs": [],
@@ -521,255 +197,126 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
-   "id": "90cf2a6d",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "47.51229381350249"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# On the remote server dummy dataframe\n",
-    "res = client.smartnoise_sql_query(\n",
-    "    query = QUERY,  \n",
-    "    epsilon = 100.0, # make sure to select high values of epsilon and delta to have small differences\n",
-    "    delta = 2.0,    # make sure to select high values of epsilon and delta to have small differences\n",
-    "    dummy = True, \n",
-    "    nb_rows = NB_ROWS,\n",
-    "    seed = SEED\n",
-    ")\n",
-    "res_server_dummy = res['query_response'][\"avg_bill_length_mm\"][0]\n",
-    "res_server_dummy"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "bb3fa8eb",
+   "execution_count": 14,
+   "id": "99494f15-727d-4d03-a099-5cfe5a0c8a27",
    "metadata": {},
+   "outputs": [],
    "source": [
-    "She then checks that the responses on the dummy locally and the dummy on the server are close enough (difference would be only due to small noise addition)."
+    "EPSILON = 1.0\n",
+    "DELTA = 0.00001"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
-   "id": "0f2fff82",
+   "execution_count": 15,
+   "id": "90cf2a6d",
    "metadata": {},
    "outputs": [],
    "source": [
-    "np.testing.assert_almost_equal(\n",
-    "    result_local_dummy, \n",
-    "    res_server_dummy,\n",
-    "    decimal=2, \n",
-    "    err_msg=\"Responses are different, either try with a bigger budget or query is not doing what is intended.\"\n",
+    "# On the remote server dummy dataframe\n",
+    "dummy_res = client.smartnoise_sql_query(\n",
+    "    query = QUERY,  \n",
+    "    epsilon = EPSILON,\n",
+    "    delta = DELTA,\n",
+    "    dummy = True,\n",
     ")"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "5a82abcd",
-   "metadata": {},
-   "source": [
-    "As you can see res_local and res_server are close. We can accept that the small difference is due to the small noise added due to the large values of $\\epsilon$ and $\\delta$."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "324454ed",
-   "metadata": {},
-   "source": [
-    "### Get current budget\n",
-    "\n",
-    "It is the first time that Dr. Antartica connects to the server and she wants to know how much buget the queen assigned her.\n",
-    "Therefore, she calls the fonction `get_initial_budget`."
-   ]
-  },
   {
    "cell_type": "code",
-   "execution_count": 12,
-   "id": "61a467f3",
+   "execution_count": 16,
+   "id": "f3a736f7-be77-4214-8f77-6abc7db34793",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "{'initial_epsilon': 10.0, 'initial_delta': 0.005}"
+       "'Average bill length on dummy: 46.84mm.'"
       ]
      },
-     "execution_count": 12,
+     "execution_count": 16,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "client.get_initial_budget()"
+    "avg_bl_dummy = np.round(dummy_res['query_response'][\"avg_bill_length_mm\"][0], 2)\n",
+    "f\"Average bill length on dummy: {avg_bl_dummy}mm.\""
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "bc8f7a74",
+   "id": "b746374c",
    "metadata": {},
    "source": [
-    "She sees that she has 10.0 epsilon and 0.0004 epsilon at her disposal.\n",
-    "\n",
-    "Then she checks her total spent budget `get_total_spent_budget`. As she only did queries on metadata on dummy dataframes, this should still be 0."
+    "### Estimate cost of a query\n",
+    "Dr. Antartica checks the budget that computing the average bill length will really cost her if she asks the query with an `epsilon` and a `delta`."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
-   "id": "afd22f84",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'total_spent_epsilon': 2.714285714286655, 'total_spent_delta': 0.0}"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.get_total_spent_budget()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "05daf5a4",
+   "execution_count": 20,
+   "id": "133020c6",
    "metadata": {},
+   "outputs": [],
    "source": [
-    "It will also be useful to know what the remaining budget is. Therefore, she calls the function `get_remaining_budget`. It just substarcts the total spent budget from the initial budget."
+    "cost = client.estimate_smartnoise_sql_cost(\n",
+    "    query = QUERY, \n",
+    "    epsilon = EPSILON, \n",
+    "    delta = DELTA\n",
+    ")"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
-   "id": "6260cf54",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'remaining_epsilon': 7.285714285713345, 'remaining_delta': 0.005}"
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
+   "execution_count": 21,
+   "id": "ff19802d-cb39-48ee-9874-340a4bf2cc31",
+   "metadata": {
+    "jupyter": {
+     "source_hidden": true
     }
-   ],
-   "source": [
-    "client.get_remaining_budget()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "20298e00",
-   "metadata": {},
-   "source": [
-    "As expected, for now the remaining budget is equal to the inital budget."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b746374c",
-   "metadata": {},
-   "source": [
-    "### Estimate cost of a query\n",
-    "Another safeguard is the functionnality to estimate the cost of a query. As in OpenDP and SmartnoiseSQL, the budget that will by used by a query might be slightly different than what is asked by the user. The `estimate cost` function returns the estimated real cost of any query.\n",
-    "\n",
-    "Again, of course, this will not impact the user's budget.\n",
-    "\n",
-    "Dr. Antartica checks the budget that computing the average bill length will really cost her if she asks the query with an `epsilon` and a `delta`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "id": "133020c6",
-   "metadata": {},
+   },
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "{'epsilon_cost': 2.0, 'delta_cost': 4.999999999999449e-05}"
+       "'This query would actually cost her 2.0 epsilon and 5.000000000032756e-06 delta.'"
       ]
      },
-     "execution_count": 15,
+     "execution_count": 21,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "client.estimate_smartnoise_sql_cost(\n",
-    "    query = QUERY, \n",
-    "    epsilon = 1.0, \n",
-    "    delta = 1e-4\n",
-    ")"
+    "f'This query would actually cost her {cost[\"epsilon_cost\"]} epsilon and {cost[\"delta_cost\"]} delta.'"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "71580822",
+   "id": "4aeba2ee-0512-46d8-a8ad-a02ee99cd3a3",
    "metadata": {},
    "source": [
-    "So this query would actually cost her 3.0 epsilon and a little 1.499e-4 delta. As she does not want to spend to much budget here she tries other values of budget."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "id": "df487c62",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'epsilon_cost': 0.4, 'delta_cost': 5.000000000032756e-06}"
-      ]
-     },
-     "execution_count": 16,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.estimate_smartnoise_sql_cost(\n",
-    "    query = QUERY, \n",
-    "    epsilon = 0.2, \n",
-    "    delta = 1e-5\n",
-    ")"
+    "### (Advanced) Improve budget spent by rewriting query"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "3c6a3a8c",
+   "id": "c255d210-7ba1-4152-8a30-97c7289dd361",
    "metadata": {},
    "source": [
-    "This query would actually cost her 0.6 epsilon and a similar delta. She decides that it is good enough."
+    "This is actually the double than what she put in input. There are ways to avoid this by understanding the underlying functioning of Smartnoise-SQL library.\n",
+    "\n",
+    "In the background, Smartnoise-SQL decomposes the DP query in multiple other queries and the budget given in input is spent on each of these sub-queries. Here for the average, we need a sum divided by a count, hence `EPSILON` is spent once for the sum and then once more for the count."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
-   "id": "c9c8d3e7",
+   "execution_count": null,
+   "id": "6e79e35d-9bda-4585-9802-ac260a450a17",
    "metadata": {},
    "outputs": [],
-   "source": [
-    "EPSILON = 0.2\n",
-    "DELTA = 1e-5"
-   ]
+   "source": []
   },
   {
    "cell_type": "markdown",

From 6965c1d057592b3a6b759a156220924a70fd4ec3 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Thu, 5 Sep 2024 06:44:43 +0000
Subject: [PATCH 02/12] examples for cost

---
 .../Demo_Client_Notebook_Smartnoise-SQL.ipynb | 594 +++++++-----------
 1 file changed, 235 insertions(+), 359 deletions(-)
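
Reviewer note (not part of the notebook): the markdown cell added in this
patch says that an average is computed as a sum divided by a count, so the
input epsilon is spent once per sub-query. The snippet below is a minimal
sketch of that accounting, assuming plain sequential composition where each
sub-query spends the full input epsilon. `composed_epsilon` is a hypothetical
helper for illustration only; it is not part of `lomas-client` or
Smartnoise-SQL, and delta accounting is deliberately not modeled.

```python
def composed_epsilon(epsilon_per_subquery: float, n_subqueries: int) -> float:
    """Total epsilon spent when each sub-query uses the full input epsilon.

    Hypothetical helper: assumes basic sequential composition, where the
    costs of the individual sub-queries simply add up.
    """
    if epsilon_per_subquery <= 0 or n_subqueries < 1:
        raise ValueError("epsilon must be positive and n_subqueries >= 1")
    return epsilon_per_subquery * n_subqueries


# AVG(x) is answered as SUM(x) / COUNT(x): two noisy sub-queries.
avg_cost = composed_epsilon(1.0, n_subqueries=2)

# A plain COUNT(*) would involve a single noisy sub-query.
count_cost = composed_epsilon(1.0, n_subqueries=1)

print(f"AVG query epsilon cost:   {avg_cost}")
print(f"COUNT query epsilon cost: {count_cost}")
```

Under this assumption, an input epsilon of 1.0 gives an average cost of 2.0,
which matches the `epsilon_cost` shown in the notebook output. The real cost
should still be read from `estimate_smartnoise_sql_cost`, since the library
decides how a query is decomposed.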

diff --git a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
index 43159ef6..a5d9ae94 100644
--- a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
+++ b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
@@ -164,6 +164,47 @@
     "metadata"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "id": "ba329ffc-3eaa-4fdd-b526-c1b59c71ed3f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Number of penguins: 344.\n"
+     ]
+    }
+   ],
+   "source": [
+    "nb_penguin = metadata['rows']\n",
+    "print(f\"Number of penguins: {nb_penguin}.\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 50,
+   "id": "90e3edb2-54b1-476f-b362-a83e20084a74",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "dict_keys(['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex'])"
+      ]
+     },
+     "execution_count": 50,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "columns = metadata[\"columns\"].keys()\n",
+    "columns"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "5bf5b471-1495-4046-bec1-ddf96c98642f",
@@ -203,7 +244,7 @@
    "outputs": [],
    "source": [
     "EPSILON = 1.0\n",
-    "DELTA = 0.00001"
+    "DELTA = 1e-5"
    ]
   },
   {
@@ -271,11 +312,7 @@
    "cell_type": "code",
    "execution_count": 21,
    "id": "ff19802d-cb39-48ee-9874-340a4bf2cc31",
-   "metadata": {
-    "jupyter": {
-     "source_hidden": true
-    }
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -292,67 +329,28 @@
     "f'This query would actually cost her {cost[\"epsilon_cost\"]} epsilon and {cost[\"delta_cost\"]} delta.'"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "4aeba2ee-0512-46d8-a8ad-a02ee99cd3a3",
-   "metadata": {},
-   "source": [
-    "### (Advanced) Improve budget spent by rewriting query"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "c255d210-7ba1-4152-8a30-97c7289dd361",
    "metadata": {},
    "source": [
-    "This is actually the double than what she put in input. There are ways to avoid this by understanding the underlying functioning of Smartnoise-SQL library.\n",
-    "\n",
-    "In the background, Smartnoise-SQL decomposes the DP query in multiple other queries and the budget given in input is spent on each of these sub-queries. Here for the average, we need a sum divided by a count, hence `EPSILON` is spent once for the sum and then once more for the count."
+    "This is actually double what she requested. In the background, Smartnoise-SQL decomposes the DP query into multiple sub-queries, and the input budget is spent on each of them. For the average, it needs a sum divided by a count, hence `EPSILON` is spent once for the sum and once more for the count (see the NOTE below for tips and explanation)."
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6e79e35d-9bda-4585-9802-ac260a450a17",
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  },
   {
    "cell_type": "markdown",
    "id": "e5379edf",
    "metadata": {},
    "source": [
     "### Query real dataset\n",
-    "Now that all the safeguard functions were tested, Dr. Antartica is ready to query on the real dataset and get a differentially private response of the average bill length. By default, the flag `dummy` is False so setting it is optional. She uses the values of `epsilon` and `delta` that she selected just before.\n",
+    "Dr. Antartica is now ready to query the real dataset and get a differentially private estimate of the average bill length. By default, the flag `dummy` is `False`, so setting it is optional. She uses the values of `epsilon` and `delta` that she selected earlier.\n",
     "\n",
     "Careful: This command DOES spend the budget of the user and the remaining budget is updated for every query."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
-   "id": "19e60263",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'remaining_epsilon': 7.285714285713345, 'remaining_delta': 0.005}"
-      ]
-     },
-     "execution_count": 18,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.get_remaining_budget()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 28,
    "id": "69767fac",
    "metadata": {},
    "outputs": [],
@@ -367,7 +365,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 30,
    "id": "6dbbdf93",
    "metadata": {},
    "outputs": [
@@ -375,13 +373,13 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Average bill length: 44.18mm.\n"
+      "Average bill length on private data: 44.11mm.\n"
      ]
     }
    ],
    "source": [
     "avg_bill_length = avg_bill_length_response['query_response']['avg_bill_length_mm'].iloc[0]\n",
-    "print(f\"Average bill length: {np.round(avg_bill_length, 2)}mm.\")"
+    "print(f\"Average bill length on private data: {np.round(avg_bill_length, 2)}mm.\")"
    ]
   },
   {
@@ -392,66 +390,6 @@
     "After each query on the real dataset, the budget informations are also returned to the researcher. It is possible possible to check the remaining budget again afterwards:"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "id": "39701fe5",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'remaining_epsilon': 6.885714285713345,\n",
-       " 'remaining_delta': 0.004994999999999967}"
-      ]
-     },
-     "execution_count": 21,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.get_remaining_budget()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e37c587f",
-   "metadata": {},
-   "source": [
-    "As can be seen in `get_total_spent_budget()`, it is the budget estimated with `estimate_cost()` that was spent."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "id": "487f835f",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'total_spent_epsilon': 3.114285714286655,\n",
-       " 'total_spent_delta': 5.000000000032756e-06}"
-      ]
-     },
-     "execution_count": 22,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.get_total_spent_budget()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "eef4afcd",
-   "metadata": {},
-   "source": [
-    "Dr. Antartica has now a differentially private estimation of the bill length of all birds and is confident to use the library for the rest of her analyses."
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "04929993",
@@ -473,17 +411,17 @@
    "id": "9d41bd58",
    "metadata": {},
    "source": [
-    "She is first interested to have a better idea of the distribution of flipper length of all species. She already has the mean from step 3, so she just need to compute the standard deviation and know the number of penguins in the dataset. As it is just an exploration step, she uses very little budget values."
+    "She first wants a better idea of the distribution of bill length across all species. She already has the number of penguins (= number of rows, as `max_ids=1`) from the metadata and the average bill length from step 3, so she just needs to compute the standard deviation. As it is just an exploration step, she uses very small budget values."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 39,
    "id": "04b376ef",
    "metadata": {},
    "outputs": [],
    "source": [
-    "QUERY = \"SELECT COUNT(bill_length_mm) AS nb_penguin, STD(bill_length_mm) AS std_bill_length_mm FROM df\""
+    "QUERY = \"SELECT STD(bill_length_mm) AS std_bill_length_mm FROM df\""
    ]
   },
   {
@@ -496,63 +434,39 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 32,
    "id": "5aa9c304",
    "metadata": {},
+   "outputs": [],
+   "source": [
+    "dummy_res = client.smartnoise_sql_query(\n",
+    "    query = QUERY, \n",
+    "    epsilon = 1.0, \n",
+    "    delta = 1e-5, \n",
+    "    dummy = True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "49e4ba47-adf3-471b-a35b-c44346ed12a8",
+   "metadata": {},
    "outputs": [
     {
      "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>nb_penguin</th>\n",
-       "      <th>std_bill_length_mm</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>100</td>\n",
-       "      <td>10.332225</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
       "text/plain": [
-       "   nb_penguin  std_bill_length_mm\n",
-       "0         100           10.332225"
+       "'The dummy standard variation is 2.78.'"
       ]
      },
-     "execution_count": 24,
+     "execution_count": 38,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "dummy_res = client.smartnoise_sql_query(\n",
-    "    query = QUERY, \n",
-    "    epsilon = 100.0, \n",
-    "    delta = 10.0, \n",
-    "    dummy = True\n",
-    ")\n",
-    "dummy_res['query_response']"
+    "dummy_std = np.round(dummy_res['query_response']['std_bill_length_mm'].iloc[0], 2)\n",
+    "f\"The dummy standard variation is {dummy_std}.\""
    ]
   },
   {
@@ -565,130 +479,64 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": 41,
    "id": "a8fa2c49",
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'epsilon_cost': 1.5, 'delta_cost': 5.000000000032756e-06}"
-      ]
-     },
-     "execution_count": 25,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
-    "client.estimate_smartnoise_sql_cost(\n",
+    "cost = client.estimate_smartnoise_sql_cost(\n",
     "    query = QUERY, \n",
-    "    epsilon = 0.5, \n",
+    "    epsilon = 1.0, \n",
     "    delta = 1e-5\n",
     ")"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "bed840d3",
-   "metadata": {},
-   "source": [
-    "It is a bit too much, she decides to test for less:"
-   ]
-  },
   {
    "cell_type": "code",
-   "execution_count": 26,
-   "id": "edc97e73",
+   "execution_count": 42,
+   "id": "b3aa05ca-3243-4415-a8ec-fb5ad47d244d",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "{'epsilon_cost': 0.75, 'delta_cost': 5.000000000032756e-06}"
+       "'This query would actually cost her 3.0 epsilon and 5.000000000032756e-06 delta.'"
       ]
      },
-     "execution_count": 26,
+     "execution_count": 42,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "client.estimate_smartnoise_sql_cost(\n",
-    "    query = QUERY, \n",
-    "    epsilon = 0.25, \n",
-    "    delta = 1e-5\n",
-    ")"
+    "f'This query would actually cost her {cost[\"epsilon_cost\"]} epsilon and {cost[\"delta_cost\"]} delta.'"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "da9f81c4",
+   "id": "884f0337-a960-460e-8797-84ddd77974a3",
    "metadata": {},
    "source": [
-    "That's fine, she is ready to query:"
+    "This time it costs three times the input budget, as the standard deviation needs the average, then a difference and a count again."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 43,
    "id": "534979fb",
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>nb_penguin</th>\n",
-       "      <th>std_bill_length_mm</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>343</td>\n",
-       "      <td>13.064982</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   nb_penguin  std_bill_length_mm\n",
-       "0         343           13.064982"
-      ]
-     },
-     "execution_count": 27,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
-    "response = client.smartnoise_sql_query(query = QUERY, epsilon = 0.25, delta = 1e-5)\n",
-    "response = response['query_response']\n",
-    "response"
+    "response = client.smartnoise_sql_query(\n",
+    "    query = QUERY,\n",
+    "    epsilon = 1.0,\n",
+    "    delta = 1e-5\n",
+    ")"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 51,
    "id": "674332e7",
    "metadata": {},
    "outputs": [
@@ -696,15 +544,11 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Number of penguins: 343.\n",
-      "Standard deviation of bill length: 13.06.\n"
+      "Standard deviation of bill length: 6.89.\n"
      ]
     }
    ],
    "source": [
-    "nb_penguin = response['nb_penguin'].iloc[0]\n",
-    "print(f\"Number of penguins: {nb_penguin}.\")\n",
-    "\n",
     "std_bill_length = response['std_bill_length_mm'].iloc[0]\n",
     "print(f\"Standard deviation of bill length: {np.round(std_bill_length, 2)}.\")"
    ]
@@ -719,7 +563,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 52,
    "id": "f72b19d0",
    "metadata": {},
    "outputs": [
@@ -727,7 +571,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Standard error of bill length: 0.71.\n"
+      "Standard error of bill length: 0.37.\n"
      ]
     }
    ],
@@ -739,7 +583,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 53,
    "id": "62630a03",
    "metadata": {},
    "outputs": [
@@ -747,7 +591,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "The 95% confidence interval of the bill length of all penguins is [42.8, 45.57].\n"
+      "The 95% confidence interval of the bill length of all penguins is [43.38, 44.84].\n"
      ]
     }
    ],
@@ -758,6 +602,117 @@
     "print(f\"The 95% confidence interval of the bill length of all penguins is [{np.round(lower_bound, 2)}, {np.round(upper_bound, 2)}].\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "26d04824-ff41-4d25-8a4e-1506a416dd0b",
+   "metadata": {},
+   "source": [
+    "## Note on budget with Smartnoise-SQL"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "611df7d2-86eb-4710-a6eb-a3de214ece37",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "epsilon = 1.0\n",
+    "delta = 1e-5"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "id": "32b76d26-edce-4cf9-bab9-bf1ea936d288",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
+      ]
+     },
+     "execution_count": 57,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "QUERY = \"SELECT STD(bill_length_mm) AS std_bill_length_mm FROM df\"\n",
+    "cost = client.estimate_smartnoise_sql_cost(query = QUERY, epsilon = epsilon, delta = delta)\n",
+    "cost"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "id": "f84411ed-dab5-4acc-ab49-bfec9ebc3530",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
+      ]
+     },
+     "execution_count": 62,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "QUERY = \"SELECT AVG(bill_length_mm) AS avg_bl, STD(bill_length_mm) as std_bl FROM df\"\n",
+    "cost = client.estimate_smartnoise_sql_cost(query = QUERY, epsilon = epsilon, delta = delta)\n",
+    "cost"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 61,
+   "id": "2454db71-4074-46dd-a863-c690c0160c51",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
+      ]
+     },
+     "execution_count": 61,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "QUERY = \"SELECT COUNT(bill_length_mm) AS count_bl, AVG(bill_length_mm) AS avg_bl, STD(bill_length_mm) as std_bl FROM df\"\n",
+    "cost = client.estimate_smartnoise_sql_cost(query = QUERY, epsilon = epsilon, delta = delta)\n",
+    "cost"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 72,
+   "id": "5b69f3f2-07dd-48b8-9cd5-64eee53331f7",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
+      ]
+     },
+     "execution_count": 72,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "QUERY = \"SELECT COUNT(bill_length_mm) AS count_bl, AVG(bill_length_mm) AS avg_bl, STD(bill_length_mm) as std_bl FROM df GROUP BY species\"\n",
+    "cost = client.estimate_smartnoise_sql_cost(query = QUERY, epsilon = epsilon, delta = delta)\n",
+    "cost"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "b5ee7ad2",
@@ -783,7 +738,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 63,
    "id": "7d9ae766-4c0d-4dc5-9c9a-5f7eb99718f9",
    "metadata": {},
    "outputs": [],
@@ -793,7 +748,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 64,
    "id": "5006201d",
    "metadata": {},
    "outputs": [],
@@ -803,7 +758,7 @@
     "        COUNT(bill_length_mm) AS nb_penguin,  \\\n",
     "        AVG(bill_length_mm) AS avg_bill_length_mm, \\\n",
     "        STD(bill_length_mm) AS std_bill_length_mm \\\n",
-    "        FROM df  GROUP BY species\""
+    "        FROM df GROUP BY species\""
    ]
   },
   {
@@ -816,7 +771,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 65,
    "id": "0255550b-7fd2-4244-a8eb-da809ddc6a5b",
    "metadata": {},
    "outputs": [
@@ -826,7 +781,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 4.999999999999449e-05}"
       ]
      },
-     "execution_count": 33,
+     "execution_count": 65,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -845,22 +800,16 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 66,
    "id": "80d9933b",
    "metadata": {},
    "outputs": [
     {
-     "data": {
-      "text/plain": [
-       "{'query_response':      species  nb_penguin  avg_bill_length_mm  std_bill_length_mm\n",
-       " 0     Adelie          39           45.659944           10.695675\n",
-       " 1  Chinstrap          33           45.690454           14.067739\n",
-       " 2     Gentoo          31           38.472887           14.542186}"
-      ]
-     },
-     "execution_count": 34,
-     "metadata": {},
-     "output_type": "execute_result"
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Server error status 400: {\"InvalidQueryException\":\"SQL Reader generated NAN results. Epsilon: 1.0 and Delta: 1.0 are too small to generate output.\"}\n"
+     ]
     }
    ],
    "source": [
@@ -878,58 +827,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 35,
-   "id": "6b014db4-acbd-4ae1-a3b6-397035851583",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'remaining_epsilon': 6.135714285713345,\n",
-       " 'remaining_delta': 0.004989999999999935}"
-      ]
-     },
-     "execution_count": 35,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.get_remaining_budget()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "43d3488d-3987-4fec-a840-78385e956832",
-   "metadata": {},
-   "source": [
-    "The maximum she can do with all her remaining budget of 7.4 is around 7.4/4 = 1.85. Let's check:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 42,
-   "id": "99d7998d-daa1-4d5e-aa42-abc5aabdf2e3",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'epsilon_cost': 5.550000000000001, 'delta_cost': 4.999999999999449e-05}"
-      ]
-     },
-     "execution_count": 42,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.estimate_smartnoise_sql_cost(query = QUERY, epsilon = 7.4/4, delta = 1e-4)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 70,
    "id": "0e07fde9-9430-4a12-8337-0503ac162c26",
    "metadata": {},
    "outputs": [
@@ -937,18 +835,18 @@
      "data": {
       "text/plain": [
        "{'query_response':      species  nb_penguin  avg_bill_length_mm  std_bill_length_mm\n",
-       " 0     Adelie          37           48.755816            3.634415\n",
-       " 1  Chinstrap          33           46.912863            4.552931\n",
-       " 2     Gentoo          29           41.803438           17.566451}"
+       " 0     Adelie          38           46.293151           13.104888\n",
+       " 1  Chinstrap          33           48.744280           14.518132\n",
+       " 2     Gentoo          28           43.582573           11.503930}"
       ]
      },
-     "execution_count": 43,
+     "execution_count": 70,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "dummy_res = client.smartnoise_sql_query(query = QUERY, epsilon = 7.4/4, delta = 1e-4, dummy = True)\n",
+    "dummy_res = client.smartnoise_sql_query(query = QUERY, epsilon = 2.0, delta = 1e-4, dummy = True)\n",
     "dummy_res"
    ]
   },
@@ -962,12 +860,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 44,
+   "execution_count": 71,
    "id": "59f2d665",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Server error status 400: {\"InvalidQueryException\":\"Not enough budget for this query epsilon remaining 5.0, delta remaining 0.004989999999999935.\"}\n"
+     ]
+    }
+   ],
    "source": [
-    "flipper_length_response = client.smartnoise_sql_query(query = QUERY, epsilon = 7.4/4, delta = 1e-4)"
+    "flipper_length_response = client.smartnoise_sql_query(query = QUERY, epsilon = 2.0, delta = 1e-4)"
    ]
   },
   {
@@ -978,28 +884,6 @@
     "And now she should not have any remaining budget:"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 45,
-   "id": "6eb20cfb-fa53-496f-940d-9b17b05fa074",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'remaining_epsilon': 0.5857142857133439,\n",
-       " 'remaining_delta': 0.00493999999999994}"
-      ]
-     },
-     "execution_count": 45,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "client.get_remaining_budget()"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "cb96f406-d409-4531-ac86-05f1c9296705",
@@ -1254,14 +1138,6 @@
     "df_flipper"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "f79e8333-1f06-4019-af3c-94ff2362d036",
-   "metadata": {},
-   "source": [
-    "She can now go and present her findings to queen Icebergina."
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,

From 18990c94f2dd3d4c51944222f09b39b8ba7a9953 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Tue, 10 Sep 2024 11:29:50 +0000
Subject: [PATCH 03/12] add explanation sor subqueries and add explanation for
 overriding mechanism

---
 .../Demo_Client_Notebook_Smartnoise-SQL.ipynb | 492 +++++++++++++-----
 1 file changed, 372 insertions(+), 120 deletions(-)

diff --git a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
index a5d9ae94..54f15a60 100644
--- a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
+++ b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
@@ -41,7 +41,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "id": "6f5d749c-0f39-4f78-8157-528bc39764b2",
    "metadata": {},
    "outputs": [],
@@ -54,12 +54,12 @@
    "id": "53cf3204-18a8-423c-9de2-c2966fdf84fb",
    "metadata": {},
    "source": [
-    "Or using a local version of the clien"
+    "Or using a local version of the client"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
    "id": "98b4013c-ea93-4e4d-8885-15aac0039c12",
    "metadata": {},
    "outputs": [],
@@ -71,7 +71,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 3,
    "id": "9d96dcd7",
    "metadata": {},
    "outputs": [],
@@ -99,7 +99,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 4,
    "id": "941991f7",
    "metadata": {},
    "outputs": [],
@@ -128,7 +128,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 170,
    "id": "d15cbe39",
    "metadata": {},
    "outputs": [
@@ -154,7 +154,7 @@
        " 'rows': 344}"
       ]
      },
-     "execution_count": 6,
+     "execution_count": 170,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -166,7 +166,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 48,
+   "execution_count": 6,
    "id": "ba329ffc-3eaa-4fdd-b526-c1b59c71ed3f",
    "metadata": {},
    "outputs": [
@@ -185,7 +185,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 50,
+   "execution_count": 7,
    "id": "90e3edb2-54b1-476f-b362-a83e20084a74",
    "metadata": {},
    "outputs": [
@@ -195,7 +195,7 @@
        "dict_keys(['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex'])"
       ]
      },
-     "execution_count": 50,
+     "execution_count": 7,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -227,7 +227,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 239,
    "id": "3946425d",
    "metadata": {},
    "outputs": [],
@@ -238,18 +238,18 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 240,
    "id": "99494f15-727d-4d03-a099-5cfe5a0c8a27",
    "metadata": {},
    "outputs": [],
    "source": [
-    "EPSILON = 1.0\n",
+    "EPSILON = 0.5\n",
     "DELTA = 1e-5"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 241,
    "id": "90cf2a6d",
    "metadata": {},
    "outputs": [],
@@ -265,17 +265,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 242,
    "id": "f3a736f7-be77-4214-8f77-6abc7db34793",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "'Average bill length on dummy: 46.84mm.'"
+       "'Average bill length on dummy: 46.06mm.'"
       ]
      },
-     "execution_count": 16,
+     "execution_count": 242,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -296,7 +296,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 243,
    "id": "133020c6",
    "metadata": {},
    "outputs": [],
@@ -304,23 +304,23 @@
     "cost = client.estimate_smartnoise_sql_cost(\n",
     "    query = QUERY, \n",
     "    epsilon = EPSILON, \n",
-    "    delta = DELTA\n",
+    "    delta = DELTA,\n",
     ")"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 244,
    "id": "ff19802d-cb39-48ee-9874-340a4bf2cc31",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "'This query would actually cost her 2.0 epsilon and 5.000000000032756e-06 delta.'"
+       "'This query would actually cost her 1.0 epsilon and 5.000000000032756e-06 delta.'"
       ]
      },
-     "execution_count": 21,
+     "execution_count": 244,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -337,6 +337,88 @@
     "This is actually the double than what she put in input. In the background, Smartnoise-SQL decomposes the DP query in multiple other queries and the budget given in input is spent on each of these sub-queries. Here for the average, we need a sum divided by a count, hence `EPSILON` is spent once for the sum and then once more for the count. (see NOTE below for tips and explanation)."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4ec31515-39fe-426b-8339-fc2ac9c1e09e",
+   "metadata": {},
+   "source": [
+    "### Override DP mechanism"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24b060d0-3c6f-4d35-824f-347ec5103723",
+   "metadata": {},
+   "source": [
+    "She wants to use another DP mechanism for this query. She can change it via the `mechanisms` argument. See the Smartnoise-SQL documentation on [overriding mechanisms](https://docs.smartnoise.org/sql/advanced.html#overriding-mechanisms)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 248,
+   "id": "1f726ce8-2e3d-462a-bbd8-598198935bc9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# On the remote server dummy dataframe\n",
+    "dummy_res = client.smartnoise_sql_query(\n",
+    "    query = QUERY,  \n",
+    "    epsilon = EPSILON,\n",
+    "    delta = DELTA,\n",
+    "    mechanisms = {\"count\": \"gaussian\", \"sum_float\": \"laplace\"},\n",
+    "    dummy = True,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 249,
+   "id": "46e064f0-f1e2-49af-8f14-fde44f981813",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Average bill length on dummy: 48.36mm.'"
+      ]
+     },
+     "execution_count": 249,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "avg_bl_dummy = np.round(dummy_res['query_response'][\"avg_bill_length_mm\"][0], 2)\n",
+    "f\"Average bill length on dummy: {avg_bl_dummy}mm.\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 250,
+   "id": "7e20014d-ad82-4a2d-88d9-ec981150e7db",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'epsilon_cost': 1.0, 'delta_cost': 1.4999949999983109e-05}"
+      ]
+     },
+     "execution_count": 250,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "cost = client.estimate_smartnoise_sql_cost(\n",
+    "    query = QUERY, \n",
+    "    epsilon = EPSILON, \n",
+    "    delta = DELTA,\n",
+    "    mechanisms = {\"count\": \"gaussian\", \"sum_float\": \"laplace\"}\n",
+    ")\n",
+    "cost"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "e5379edf",
@@ -350,7 +432,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 14,
    "id": "69767fac",
    "metadata": {},
    "outputs": [],
@@ -365,7 +447,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 15,
    "id": "6dbbdf93",
    "metadata": {},
    "outputs": [
@@ -373,7 +455,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Average bill length on private data: 44.11mm.\n"
+      "Average bill length on private data: 43.3mm.\n"
      ]
     }
    ],
@@ -416,7 +498,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 39,
+   "execution_count": 16,
    "id": "04b376ef",
    "metadata": {},
    "outputs": [],
@@ -434,14 +516,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 17,
    "id": "5aa9c304",
    "metadata": {},
    "outputs": [],
    "source": [
     "dummy_res = client.smartnoise_sql_query(\n",
     "    query = QUERY, \n",
-    "    epsilon = 1.0, \n",
+    "    epsilon = 0.5, \n",
     "    delta = 1e-5, \n",
     "    dummy = True\n",
     ")"
@@ -449,17 +531,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 38,
+   "execution_count": 18,
    "id": "49e4ba47-adf3-471b-a35b-c44346ed12a8",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "'The dummy standard variation is 2.78.'"
+       "'The dummy standard variation is 8.53.'"
       ]
      },
-     "execution_count": 38,
+     "execution_count": 18,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -479,31 +561,31 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 41,
+   "execution_count": 19,
    "id": "a8fa2c49",
    "metadata": {},
    "outputs": [],
    "source": [
     "cost = client.estimate_smartnoise_sql_cost(\n",
     "    query = QUERY, \n",
-    "    epsilon = 1.0, \n",
+    "    epsilon = 0.5, \n",
     "    delta = 1e-5\n",
     ")"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 20,
    "id": "b3aa05ca-3243-4415-a8ec-fb5ad47d244d",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "'This query would actually cost her 3.0 epsilon and 5.000000000032756e-06 delta.'"
+       "'This query would actually cost her 1.5 epsilon and 5.000000000032756e-06 delta.'"
       ]
      },
-     "execution_count": 42,
+     "execution_count": 20,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -522,21 +604,21 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 21,
    "id": "534979fb",
    "metadata": {},
    "outputs": [],
    "source": [
     "response = client.smartnoise_sql_query(\n",
     "    query = QUERY,\n",
-    "    epsilon = 1.0,\n",
+    "    epsilon = 0.5,\n",
     "    delta = 1e-5\n",
     ")"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 51,
+   "execution_count": 22,
    "id": "674332e7",
    "metadata": {},
    "outputs": [
@@ -544,12 +626,12 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Standard deviation of bill length: 6.89.\n"
+      "Standard deviation of bill length: 3.0.\n"
      ]
     }
    ],
    "source": [
-    "std_bill_length = response['std_bill_length_mm'].iloc[0]\n",
+    "std_bill_length = response['query_response']['std_bill_length_mm'].iloc[0]\n",
     "print(f\"Standard deviation of bill length: {np.round(std_bill_length, 2)}.\")"
    ]
   },
@@ -563,7 +645,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 52,
+   "execution_count": 23,
    "id": "f72b19d0",
    "metadata": {},
    "outputs": [
@@ -571,7 +653,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Standard error of bill length: 0.37.\n"
+      "Standard error of bill length: 0.16.\n"
      ]
     }
    ],
@@ -583,7 +665,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 53,
+   "execution_count": 24,
    "id": "62630a03",
    "metadata": {},
    "outputs": [
@@ -591,7 +673,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "The 95% confidence interval of the bill length of all penguins is [43.38, 44.84].\n"
+      "The 95% confidence interval of the bill length of all penguins is [42.98, 43.61].\n"
      ]
     }
    ],
@@ -607,12 +689,20 @@
    "id": "26d04824-ff41-4d25-8a4e-1506a416dd0b",
    "metadata": {},
    "source": [
-    "## Note on budget with Smartnoise-SQL"
+    "## Note on budget with Smartnoise-SQL (Advanced)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c9aa0b56-bda3-405e-9f33-ae7135dbfeba",
+   "metadata": {},
+   "source": [
+    "All of these queries cost the same budget in Smartnoise-SQL. The reason is that Smartnoise-SQL translates the input query into sub-queries, answers each sub-query with the budget given as input, and then assembles the results. For instance, the first 'standard deviation' query requires a count and an average before the standard deviation itself can be computed. Hence, to save budget, it is better to make one general query directly and retrieve all the 'sub-answers'."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 55,
+   "execution_count": 25,
    "id": "611df7d2-86eb-4710-a6eb-a3de214ece37",
    "metadata": {},
    "outputs": [],
@@ -623,7 +713,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 57,
+   "execution_count": 26,
    "id": "32b76d26-edce-4cf9-bab9-bf1ea936d288",
    "metadata": {},
    "outputs": [
@@ -633,7 +723,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 57,
+     "execution_count": 26,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -646,7 +736,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 62,
+   "execution_count": 27,
    "id": "f84411ed-dab5-4acc-ab49-bfec9ebc3530",
    "metadata": {},
    "outputs": [
@@ -656,7 +746,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 62,
+     "execution_count": 27,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -669,7 +759,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 61,
+   "execution_count": 28,
    "id": "2454db71-4074-46dd-a863-c690c0160c51",
    "metadata": {},
    "outputs": [
@@ -679,7 +769,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 61,
+     "execution_count": 28,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -690,9 +780,142 @@
     "cost"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "73bd85ca-eed0-488f-807e-6f03f99898cb",
+   "metadata": {},
+   "source": [
+    "One way to see the sub-queries of a query is to use the following Smartnoise-SQL code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 171,
+   "id": "5b51cf35-68db-4b11-acbe-8df15b826d10",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Convert metadata to Smartnoise-SQL compliant metadata\n",
+    "metadata = dict(metadata)\n",
+    "metadata.update(metadata[\"columns\"])\n",
+    "del metadata[\"columns\"]\n",
+    "snsql_metadata = {\"\": {\"\": {\"df\": metadata}}}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 172,
+   "id": "7ab04c8f-8d79-4871-bc16-1f0368fbd403",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Write the query to inspect\n",
+    "QUERY = \"SELECT STD(bill_length_mm) as std_bl FROM df\"\n",
+    "#QUERY = \"SELECT COUNT(*) as nb_row FROM df\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 173,
+   "id": "a78d7d86-ab95-4521-b84d-49ac795316c3",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<snsql._ast.ast.Query at 0x7b569ddbd6d0>"
+      ]
+     },
+     "execution_count": 173,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from snsql.sql.private_rewriter import Rewriter\n",
+    "rewriter = Rewriter(snsql_metadata)\n",
+    "rewriter.options.row_privacy = metadata[\"row_privacy\"]\n",
+    "rewriter.options.max_contrib = metadata[\"max_ids\"]\n",
+    "dp_query = rewriter.query(QUERY)\n",
+    "dp_query"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2df6bf8c-d06e-4b5c-9509-b2ba01fef581",
+   "metadata": {},
+   "source": [
+    "The original DP query is represented as a single query:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 185,
+   "id": "251b773a-864c-4852-ae89-1472ac768975",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'std_bl': <snsql._ast.tokens.Symbol at 0x7b569ddbf090>}"
+      ]
+     },
+     "execution_count": 185,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dp_query._named_symbols"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c5830777-95fc-432e-b4c5-6bd59aac514f",
+   "metadata": {},
+   "source": [
+    "But it has 4 named symbols inside: 2 aliases for the 2 SQL sub-queries\n",
+    "- 'keycount' for 'count_bill_length_mm',\n",
+    "- 'sum_alias_0xxxx' for 'sum_bill_length_mm'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 213,
+   "id": "f4ac4261-e870-4f07-8264-9a2041a35abc",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'keycount': <snsql._ast.tokens.Symbol at 0x7b569de9c850>,\n",
+       " 'sum_alias_0xd0e4': <snsql._ast.tokens.Symbol at 0x7b569df1d690>,\n",
+       " 'count_bill_length_mm': <snsql._ast.tokens.Symbol at 0x7b569debbd10>,\n",
+       " 'sum_bill_length_mm': <snsql._ast.tokens.Symbol at 0x7b569dde6350>}"
+      ]
+     },
+     "execution_count": 213,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "subquery = dp_query.source.relations[0].primary.query\n",
+    "syms = subquery._named_symbols\n",
+    "syms"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cc07d8c4-153f-4ad3-a977-a971b94d75aa",
+   "metadata": {},
+   "source": [
+    "This last query with `group_by` costs the same because `max_ids=1` (each penguin appears at most once in the dataset), so the `group_by` is applied to disjoint partitions of the population."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 72,
+   "execution_count": 124,
    "id": "5b69f3f2-07dd-48b8-9cd5-64eee53331f7",
    "metadata": {},
    "outputs": [
@@ -702,7 +925,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 72,
+     "execution_count": 124,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -713,6 +936,14 @@
     "cost"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "e20c4673-2c7b-44d5-bd7f-be88d6432a70",
+   "metadata": {},
+   "source": [
+    "NOTE: the current code of Smartnoise-SQL has no odometer, meaning that all queries are independent. If someone first queries the private dataset for a count, then a second time for the average, and then a third time for the standard deviation, the costs add up: 3 count + 2 average + 1 std. That is why it is better to do everything in one query."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "b5ee7ad2",
@@ -738,7 +969,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 63,
+   "execution_count": 30,
    "id": "7d9ae766-4c0d-4dc5-9c9a-5f7eb99718f9",
    "metadata": {},
    "outputs": [],
@@ -748,7 +979,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 64,
+   "execution_count": 31,
    "id": "5006201d",
    "metadata": {},
    "outputs": [],
@@ -761,6 +992,35 @@
     "        FROM df GROUP BY species\""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "37ce4596-7843-48dd-86cb-fb34b227db0e",
+   "metadata": {},
+   "source": [
+    "She checks the remaining budget:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "814883fa-a45a-43f2-852d-d5380beff8c0",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'remaining_epsilon': 7.5, 'remaining_delta': 0.004989999999999935}"
+      ]
+     },
+     "execution_count": 33,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "client.get_remaining_budget()"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "e725eb3f-d12f-4f62-8f57-06fb00639f91",
@@ -771,7 +1031,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 65,
+   "execution_count": 34,
    "id": "0255550b-7fd2-4244-a8eb-da809ddc6a5b",
    "metadata": {},
    "outputs": [
@@ -781,13 +1041,13 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 4.999999999999449e-05}"
       ]
      },
-     "execution_count": 65,
+     "execution_count": 34,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "client.estimate_smartnoise_sql_cost(query = QUERY, epsilon = 1, delta = 1e-4)"
+    "client.estimate_smartnoise_sql_cost(query = QUERY, epsilon = 1.0, delta = 1e-4)"
    ]
   },
   {
@@ -800,7 +1060,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 66,
+   "execution_count": 37,
    "id": "80d9933b",
    "metadata": {},
    "outputs": [
@@ -808,12 +1068,12 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Server error status 400: {\"InvalidQueryException\":\"SQL Reader generated NAN results. Epsilon: 1.0 and Delta: 1.0 are too small to generate output.\"}\n"
+      "Server error status 400: {\"InvalidQueryException\":\"SQL Reader generated NAN results. Epsilon: 0.1 and Delta: 1e-08 are too small to generate output.\"}\n"
      ]
     }
    ],
    "source": [
-    "dummy_res = client.smartnoise_sql_query(query = QUERY, epsilon = 1, delta = 1.0, dummy = True)\n",
+    "dummy_res = client.smartnoise_sql_query(query = QUERY, epsilon = 0.1, delta = 1e-8, dummy = True)\n",
     "dummy_res"
    ]
   },
@@ -827,7 +1087,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 70,
+   "execution_count": 38,
    "id": "0e07fde9-9430-4a12-8337-0503ac162c26",
    "metadata": {},
    "outputs": [
@@ -835,18 +1095,18 @@
      "data": {
       "text/plain": [
        "{'query_response':      species  nb_penguin  avg_bill_length_mm  std_bill_length_mm\n",
-       " 0     Adelie          38           46.293151           13.104888\n",
-       " 1  Chinstrap          33           48.744280           14.518132\n",
-       " 2     Gentoo          28           43.582573           11.503930}"
+       " 0     Adelie          37           47.178744           14.518340\n",
+       " 1  Chinstrap          33           44.463978           21.150978\n",
+       " 2     Gentoo          30           43.079722            9.972813}"
       ]
      },
-     "execution_count": 70,
+     "execution_count": 38,
      "metadata": {},
      "output_type": "execute_result"
     }
    ],
    "source": [
-    "dummy_res = client.smartnoise_sql_query(query = QUERY, epsilon = 2.0, delta = 1e-4, dummy = True)\n",
+    "dummy_res = client.smartnoise_sql_query(query = QUERY, epsilon = 7.5/3, delta = 1e-4, dummy = True)\n",
     "dummy_res"
    ]
   },
@@ -860,20 +1120,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 71,
+   "execution_count": 40,
    "id": "59f2d665",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Server error status 400: {\"InvalidQueryException\":\"Not enough budget for this query epsilon remaining 5.0, delta remaining 0.004989999999999935.\"}\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
-    "flipper_length_response = client.smartnoise_sql_query(query = QUERY, epsilon = 2.0, delta = 1e-4)"
+    "flipper_length_response = client.smartnoise_sql_query(query = QUERY, epsilon = 7.5/3, delta = 1e-4)"
    ]
   },
   {
@@ -894,7 +1146,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 46,
+   "execution_count": 41,
    "id": "748f125f",
    "metadata": {},
    "outputs": [
@@ -930,22 +1182,22 @@
        "      <th>0</th>\n",
        "      <td>Adelie</td>\n",
        "      <td>150</td>\n",
-       "      <td>38.649887</td>\n",
-       "      <td>3.997587</td>\n",
+       "      <td>38.863115</td>\n",
+       "      <td>1.902194</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>Chinstrap</td>\n",
        "      <td>67</td>\n",
-       "      <td>49.285002</td>\n",
-       "      <td>5.859511</td>\n",
+       "      <td>48.859990</td>\n",
+       "      <td>2.766702</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td>Gentoo</td>\n",
        "      <td>122</td>\n",
-       "      <td>47.557167</td>\n",
-       "      <td>4.643492</td>\n",
+       "      <td>47.668297</td>\n",
+       "      <td>3.224660</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
@@ -953,12 +1205,12 @@
       ],
       "text/plain": [
        "     species  nb_penguin  avg_bill_length_mm  std_bill_length_mm\n",
-       "0     Adelie         150           38.649887            3.997587\n",
-       "1  Chinstrap          67           49.285002            5.859511\n",
-       "2     Gentoo         122           47.557167            4.643492"
+       "0     Adelie         150           38.863115            1.902194\n",
+       "1  Chinstrap          67           48.859990            2.766702\n",
+       "2     Gentoo         122           47.668297            3.224660"
       ]
      },
-     "execution_count": 46,
+     "execution_count": 41,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -978,7 +1230,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 47,
+   "execution_count": 42,
    "id": "0a7d7d4d",
    "metadata": {},
    "outputs": [],
@@ -990,7 +1242,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 48,
+   "execution_count": 43,
    "id": "bc3ee48a",
    "metadata": {},
    "outputs": [],
@@ -1002,7 +1254,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 49,
+   "execution_count": 44,
    "id": "1717f9ea",
    "metadata": {},
    "outputs": [
@@ -1010,9 +1262,9 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "T test between specie 0 and specie 1: -15.84.  Reject null hypothesis: True.\n",
-      "T test between specie 0 and specie 2: -17.04. Reject null hypothesis: True.\n",
-      "T test between specie 1 and specie 2: 2.24.   Reject null hypothesis: True.\n"
+      "T test between specie 0 and specie 1: -31.39.  Reject null hypothesis: True.\n",
+      "T test between specie 0 and specie 2: -28.95. Reject null hypothesis: True.\n",
+      "T test between specie 1 and specie 2: 2.56.   Reject null hypothesis: True.\n"
      ]
     }
    ],
@@ -1044,7 +1296,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 50,
+   "execution_count": 45,
    "id": "9289bc26",
    "metadata": {},
    "outputs": [
@@ -1083,31 +1335,31 @@
        "      <th>0</th>\n",
        "      <td>Adelie</td>\n",
        "      <td>150</td>\n",
-       "      <td>38.649887</td>\n",
-       "      <td>3.997587</td>\n",
-       "      <td>0.326402</td>\n",
-       "      <td>38.010140</td>\n",
-       "      <td>39.289634</td>\n",
+       "      <td>38.863115</td>\n",
+       "      <td>1.902194</td>\n",
+       "      <td>0.155314</td>\n",
+       "      <td>38.558700</td>\n",
+       "      <td>39.167529</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>Chinstrap</td>\n",
        "      <td>67</td>\n",
-       "      <td>49.285002</td>\n",
-       "      <td>5.859511</td>\n",
-       "      <td>0.715853</td>\n",
-       "      <td>47.881930</td>\n",
-       "      <td>50.688074</td>\n",
+       "      <td>48.859990</td>\n",
+       "      <td>2.766702</td>\n",
+       "      <td>0.338006</td>\n",
+       "      <td>48.197497</td>\n",
+       "      <td>49.522483</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td>Gentoo</td>\n",
        "      <td>122</td>\n",
-       "      <td>47.557167</td>\n",
-       "      <td>4.643492</td>\n",
-       "      <td>0.420402</td>\n",
-       "      <td>46.733179</td>\n",
-       "      <td>48.381155</td>\n",
+       "      <td>47.668297</td>\n",
+       "      <td>3.224660</td>\n",
+       "      <td>0.291947</td>\n",
+       "      <td>47.096081</td>\n",
+       "      <td>48.240513</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
@@ -1115,17 +1367,17 @@
       ],
       "text/plain": [
        "     species  nb_penguin  avg_bill_length_mm  std_bill_length_mm  \\\n",
-       "0     Adelie         150           38.649887            3.997587   \n",
-       "1  Chinstrap          67           49.285002            5.859511   \n",
-       "2     Gentoo         122           47.557167            4.643492   \n",
+       "0     Adelie         150           38.863115            1.902194   \n",
+       "1  Chinstrap          67           48.859990            2.766702   \n",
+       "2     Gentoo         122           47.668297            3.224660   \n",
        "\n",
        "   standard_error  ci_95_lower_bound  ci_95_upper_bound  \n",
-       "0        0.326402          38.010140          39.289634  \n",
-       "1        0.715853          47.881930          50.688074  \n",
-       "2        0.420402          46.733179          48.381155  "
+       "0        0.155314          38.558700          39.167529  \n",
+       "1        0.338006          48.197497          49.522483  \n",
+       "2        0.291947          47.096081          48.240513  "
       ]
      },
-     "execution_count": 50,
+     "execution_count": 45,
      "metadata": {},
      "output_type": "execute_result"
     }

From b079a550197c42f7fe7d23d2b8afd8c7e3ce5497 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Tue, 10 Sep 2024 11:37:00 +0000
Subject: [PATCH 04/12] add example for postprocess

---
 .../Demo_Client_Notebook_Smartnoise-SQL.ipynb | 59 ++++++++++++-------
 1 file changed, 39 insertions(+), 20 deletions(-)

diff --git a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
index 54f15a60..7f80285f 100644
--- a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
+++ b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
@@ -128,7 +128,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 170,
+   "execution_count": 5,
    "id": "d15cbe39",
    "metadata": {},
    "outputs": [
@@ -154,7 +154,7 @@
        " 'rows': 344}"
       ]
      },
-     "execution_count": 170,
+     "execution_count": 5,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -227,7 +227,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 239,
+   "execution_count": 8,
    "id": "3946425d",
    "metadata": {},
    "outputs": [],
@@ -238,7 +238,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 240,
+   "execution_count": 9,
    "id": "99494f15-727d-4d03-a099-5cfe5a0c8a27",
    "metadata": {},
    "outputs": [],
@@ -249,7 +249,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 241,
+   "execution_count": 10,
    "id": "90cf2a6d",
    "metadata": {},
    "outputs": [],
@@ -265,17 +265,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 242,
+   "execution_count": 11,
    "id": "f3a736f7-be77-4214-8f77-6abc7db34793",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "'Average bill length on dummy: 46.06mm.'"
+       "'Average bill length on dummy: 45.79mm.'"
       ]
      },
-     "execution_count": 242,
+     "execution_count": 11,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -296,7 +296,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 243,
+   "execution_count": 12,
    "id": "133020c6",
    "metadata": {},
    "outputs": [],
@@ -310,7 +310,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 244,
+   "execution_count": 13,
    "id": "ff19802d-cb39-48ee-9874-340a4bf2cc31",
    "metadata": {},
    "outputs": [
@@ -320,7 +320,7 @@
        "'This query would actually cost her 1.0 epsilon and 5.000000000032756e-06 delta.'"
       ]
      },
-     "execution_count": 244,
+     "execution_count": 13,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -355,7 +355,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 248,
+   "execution_count": 14,
    "id": "1f726ce8-2e3d-462a-bbd8-598198935bc9",
    "metadata": {},
    "outputs": [],
@@ -372,17 +372,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 249,
+   "execution_count": 15,
    "id": "46e064f0-f1e2-49af-8f14-fde44f981813",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "'Average bill length on dummy: 48.36mm.'"
+       "'Average bill length on dummy: 48.29mm.'"
       ]
      },
-     "execution_count": 249,
+     "execution_count": 15,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -394,7 +394,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 250,
+   "execution_count": 16,
    "id": "7e20014d-ad82-4a2d-88d9-ec981150e7db",
    "metadata": {},
    "outputs": [
@@ -404,7 +404,7 @@
        "{'epsilon_cost': 1.0, 'delta_cost': 1.4999949999983109e-05}"
       ]
      },
-     "execution_count": 250,
+     "execution_count": 16,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -432,7 +432,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 17,
    "id": "69767fac",
    "metadata": {},
    "outputs": [],
@@ -441,13 +441,14 @@
     "    query = QUERY,  \n",
     "    epsilon = EPSILON, \n",
     "    delta = DELTA,\n",
+    "    mechanisms = {\"count\": \"gaussian\", \"sum_float\": \"laplace\"},\n",
     "    dummy = False\n",
     ")"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 18,
    "id": "6dbbdf93",
    "metadata": {},
    "outputs": [
@@ -455,7 +456,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Average bill length on private data: 43.3mm.\n"
+      "Average bill length on private data: 44.88mm.\n"
      ]
     }
    ],
@@ -472,6 +473,24 @@
     "After each query on the real dataset, the budget informations are also returned to the researcher. It is possible possible to check the remaining budget again afterwards:"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1472b825-bcea-458f-930e-41ff0f5d5f93",
+   "metadata": {},
+   "source": [
+    "### Postprocess "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1e7effb0-b3b8-427d-8772-cfab8d4b272d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "It is also possible to use the 'postprocess' argument from Smartnoise-SQL (see it)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "04929993",

From 06556c90da1daef38cf1d45f6a8a0db4f2b9a3bb Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Tue, 10 Sep 2024 11:55:42 +0000
Subject: [PATCH 05/12] fix smartnoise-sql bug for postprocess and adapt
 notebook

---
 .../Demo_Client_Notebook_Smartnoise-SQL.ipynb | 70 +++++++++++++++++--
 .../dp_queries/dp_libraries/smartnoise_sql.py |  4 +-
 2 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
index 7f80285f..19f2c9d0 100644
--- a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
+++ b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
@@ -272,7 +272,7 @@
     {
      "data": {
       "text/plain": [
-       "'Average bill length on dummy: 45.79mm.'"
+       "'Average bill length on dummy: 45.62mm.'"
       ]
      },
      "execution_count": 11,
@@ -379,7 +379,7 @@
     {
      "data": {
       "text/plain": [
-       "'Average bill length on dummy: 48.29mm.'"
+       "'Average bill length on dummy: 42.39mm.'"
       ]
      },
      "execution_count": 15,
@@ -456,7 +456,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Average bill length on private data: 44.88mm.\n"
+      "Average bill length on private data: 43.36mm.\n"
      ]
     }
    ],
@@ -481,14 +481,70 @@
     "### Postprocess "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ab34449e-7456-4e5e-b5bb-4231204c4d7e",
+   "metadata": {},
+   "source": [
+    "It is also possible to use the 'postprocess' argument from Smartnoise-SQL ([see its documentation here](https://docs.smartnoise.org/sql/advanced.html#postprocess)) by specifying it in the query."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "id": "1e7effb0-b3b8-427d-8772-cfab8d4b272d",
+   "execution_count": 19,
+   "id": "50c38d09-32ea-4269-9ca7-eacfd1d9ad96",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'query_response':    avg_bill_length_mm\n",
+       " 0           46.534449}"
+      ]
+     },
+     "execution_count": 19,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
-    "It is also possible to use the 'postprocess' argument from Smartnoise-SQL (see it)"
+    "dummy_res = client.smartnoise_sql_query(\n",
+    "    query = QUERY,  \n",
+    "    epsilon = EPSILON,\n",
+    "    delta = DELTA,\n",
+    "    postprocess = True,\n",
+    "    dummy = True,\n",
+    ")\n",
+    "dummy_res"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "df6f2526-612e-4f00-b15a-c0433573e652",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'query_response':          res_0       res_1\n",
+       " 0  4664.287202  100.922812}"
+      ]
+     },
+     "execution_count": 20,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "dummy_res = client.smartnoise_sql_query(\n",
+    "    query = QUERY,  \n",
+    "    epsilon = EPSILON,\n",
+    "    delta = DELTA,\n",
+    "    postprocess = False,\n",
+    "    dummy = True,\n",
+    ")\n",
+    "dummy_res"
    ]
   },
   {
diff --git a/server/lomas_server/dp_queries/dp_libraries/smartnoise_sql.py b/server/lomas_server/dp_queries/dp_libraries/smartnoise_sql.py
index dd75773e..41114c4f 100644
--- a/server/lomas_server/dp_queries/dp_libraries/smartnoise_sql.py
+++ b/server/lomas_server/dp_queries/dp_libraries/smartnoise_sql.py
@@ -94,10 +94,10 @@ def query(self, query_json: SmartnoiseSQLModel, nb_iter: int = 0) -> dict:
                 DPLibraries.SMARTNOISE_SQL,
                 "Error executing query:" + str(e),
             ) from e
-
         if not query_json.postprocess:
-            result = list(result)
+            result = list(result)[0]
             cols = [f"res_{i}" for i in range(len(result))]
+            result = [result]
         else:
             cols = result.pop(0)
         if result == []:

From be4a2e475a84cf46ad9cc6295949a3f007c94485 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 11 Sep 2024 07:14:57 +0000
Subject: [PATCH 06/12] fix typos in notebook

---
 .../Demo_Client_Notebook_Smartnoise-SQL.ipynb | 248 +++++++++---------
 1 file changed, 127 insertions(+), 121 deletions(-)

diff --git a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
index 19f2c9d0..b4068ff4 100644
--- a/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
+++ b/client/notebooks/Demo_Client_Notebook_Smartnoise-SQL.ipynb
@@ -94,7 +94,7 @@
     "- user_name: her name as registered in the database (Dr. Alice Antartica)\n",
     "- dataset_name: the name of the dataset that she wants to query (PENGUIN)\n",
     "\n",
-    "She will only be able to query on the real dataset if the administratir has previously made her an account in the database, given her access to the PENGUIN dataset and has given her some $\\epsilon$, $\\delta$ privacy loss budget."
+    "She will only be able to query on the real dataset if the administrator has previously made her an account in the database, given her access to the PENGUIN dataset and has given her some $\\epsilon$, $\\delta$ privacy loss budget."
    ]
   },
   {
@@ -220,7 +220,7 @@
    "source": [
     "### Query dummy dataset\n",
     "\n",
-    "Now that she has an idea of what the data looks like, she wants to start querying the real dataset to for her research. However, before this other tools are at her disposal to reduce potential error risks and avoid spending budget on irrelevant queries. Of course, this does not have any impact on the budget.\n",
+    "Now that she has an idea of what the data looks like, she wants to start querying the real dataset for her research. However, before this, other tools are at her disposal to reduce potential error risks and avoid spending budget on irrelevant queries. Of course, this does not have any impact on the budget.\n",
     "\n",
     "It is possible to specify the flag `dummy=True` in the various queries to perform the query on the dummy dataset instead of the real dataset and ensure that the queries are doing what is expected of them. "
    ]
@@ -272,7 +272,7 @@
     {
      "data": {
       "text/plain": [
-       "'Average bill length on dummy: 45.62mm.'"
+       "'Average bill length on dummy: 46.68mm.'"
       ]
      },
      "execution_count": 11,
@@ -334,7 +334,7 @@
    "id": "c255d210-7ba1-4152-8a30-97c7289dd361",
    "metadata": {},
    "source": [
-    "This is actually the double than what she put in input. In the background, Smartnoise-SQL decomposes the DP query in multiple other queries and the budget given in input is spent on each of these sub-queries. Here for the average, we need a sum divided by a count, hence `EPSILON` is spent once for the sum and then once more for the count. (see NOTE below for tips and explanation)."
+    "This is actually twice as much as what she initially put in. In the background, Smartnoise-SQL decomposes the DP query into multiple other queries and the budget given as input is spent on each of these sub-queries. Here for the average, we need a sum divided by a count, hence `EPSILON` is spent once for the sum and then once more for the count (see the NOTE below for tips and explanation)."
    ]
   },
   {
@@ -379,7 +379,7 @@
     {
      "data": {
       "text/plain": [
-       "'Average bill length on dummy: 42.39mm.'"
+       "'Average bill length on dummy: 50.83mm.'"
       ]
      },
      "execution_count": 15,
@@ -425,7 +425,7 @@
    "metadata": {},
    "source": [
     "### Query real dataset\n",
-    "Dr. Antartica is ready to query on the real dataset and get a differentially private response of the average bill length. By default, the flag `dummy` is False so setting it is optional. She uses the values of `epsilon` and `delta` that she selected just before.\n",
+    "Dr. Antartica is ready to query the real dataset and get a differentially private response for the average bill length. The `dummy` flag is False by default, so setting it is optional. She uses the values of `epsilon` and `delta` that she selected just before.\n",
     "\n",
     "Careful: This command DOES spend the budget of the user and the remaining budget is updated for every query."
    ]
@@ -456,7 +456,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Average bill length on private data: 43.36mm.\n"
+      "Average bill length on private data: 45.19mm.\n"
      ]
     }
    ],
@@ -499,7 +499,7 @@
      "data": {
       "text/plain": [
        "{'query_response':    avg_bill_length_mm\n",
-       " 0           46.534449}"
+       " 0           46.850983}"
       ]
      },
      "execution_count": 19,
@@ -527,8 +527,8 @@
     {
      "data": {
       "text/plain": [
-       "{'query_response':          res_0       res_1\n",
-       " 0  4664.287202  100.922812}"
+       "{'query_response':          res_0      res_1\n",
+       " 0  4659.909203  96.041455}"
       ]
      },
      "execution_count": 20,
@@ -568,12 +568,12 @@
    "id": "9d41bd58",
    "metadata": {},
    "source": [
-    "She is first interested to have a better idea of the distribution of bill length of all species. She already has the number of penguins (=number of rows as `max_ids=1`) from the metadata and the average bill length from step 3, so she just need to compute the standard deviation. As it is just an exploration step, she uses very little budget values."
+    "She first wants a better idea of the distribution of bill length across all species. She already has the number of penguins (= number of rows, as `max_ids=1`) from the metadata and the average bill length from step 3, so she just needs to compute the standard deviation. As it is just an exploration step, she uses very small budget values."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 21,
    "id": "04b376ef",
    "metadata": {},
    "outputs": [],
@@ -591,7 +591,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 22,
    "id": "5aa9c304",
    "metadata": {},
    "outputs": [],
@@ -606,17 +606,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": 23,
    "id": "49e4ba47-adf3-471b-a35b-c44346ed12a8",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "'The dummy standard variation is 8.53.'"
+       "'The dummy standard variation is 16.64.'"
       ]
      },
-     "execution_count": 18,
+     "execution_count": 23,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -636,7 +636,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 24,
    "id": "a8fa2c49",
    "metadata": {},
    "outputs": [],
@@ -650,7 +650,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 25,
    "id": "b3aa05ca-3243-4415-a8ec-fb5ad47d244d",
    "metadata": {},
    "outputs": [
@@ -660,7 +660,7 @@
        "'This query would actually cost her 1.5 epsilon and 5.000000000032756e-06 delta.'"
       ]
      },
-     "execution_count": 20,
+     "execution_count": 25,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -674,12 +674,12 @@
    "id": "884f0337-a960-460e-8797-84ddd77974a3",
    "metadata": {},
    "source": [
-    "This times it is three times the budget as the standard deviation needs the average, then a difference and a count again. "
+    "This time it is three times the budget because the standard deviation needs the average, then a difference and a count again. "
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 26,
    "id": "534979fb",
    "metadata": {},
    "outputs": [],
@@ -693,7 +693,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 27,
    "id": "674332e7",
    "metadata": {},
    "outputs": [
@@ -701,7 +701,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Standard deviation of bill length: 3.0.\n"
+      "Standard deviation of bill length: 8.83.\n"
      ]
     }
    ],
@@ -715,12 +715,12 @@
    "id": "367081be-1159-45d8-9129-88fba20fb697",
    "metadata": {},
    "source": [
-    "She can now do all the postprocessing that she wants with the returned data without adding any privacy risk. "
+    "She can now do all the postprocessing that she wants with the returned data without increasing the privacy risk. "
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 28,
    "id": "f72b19d0",
    "metadata": {},
    "outputs": [
@@ -728,7 +728,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Standard error of bill length: 0.16.\n"
+      "Standard error of bill length: 0.48.\n"
      ]
     }
    ],
@@ -740,7 +740,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 29,
    "id": "62630a03",
    "metadata": {},
    "outputs": [
@@ -748,7 +748,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "The 95% confidence interval of the bill length of all penguins is [42.98, 43.61].\n"
+      "The 95% confidence interval of the bill length of all penguins is [44.25, 46.12].\n"
      ]
     }
    ],
@@ -772,12 +772,12 @@
    "id": "c9aa0b56-bda3-405e-9f33-ae7135dbfeba",
    "metadata": {},
    "source": [
-    "All of these queries will cost the same budget in Smartnoise-SQL. The reason is that the smartnoise-sql translates the input query in sub queries, find the answer for each sub query for the budget in input and then assemble the results. For the first 'standard deviation' query, it requires a count, an average, and only then the computation for the standard deviation. Hence, to save budget it is better to make a general query directly and retrieve all the 'sub-answers'."
+    "All of these queries will cost the same budget in Smartnoise-SQL. The reason is that Smartnoise-SQL translates the input query into sub-queries, answers each sub-query with the input budget and then assembles the results. The first 'standard deviation' query requires a count, an average, and only then the computation of the standard deviation. Hence, to save budget, it is better to make one general query directly and retrieve all the 'sub-answers'."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": 30,
    "id": "611df7d2-86eb-4710-a6eb-a3de214ece37",
    "metadata": {},
    "outputs": [],
@@ -788,7 +788,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 31,
    "id": "32b76d26-edce-4cf9-bab9-bf1ea936d288",
    "metadata": {},
    "outputs": [
@@ -798,7 +798,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 26,
+     "execution_count": 31,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -811,7 +811,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 32,
    "id": "f84411ed-dab5-4acc-ab49-bfec9ebc3530",
    "metadata": {},
    "outputs": [
@@ -821,7 +821,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 27,
+     "execution_count": 32,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -834,7 +834,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 33,
    "id": "2454db71-4074-46dd-a863-c690c0160c51",
    "metadata": {},
    "outputs": [
@@ -844,7 +844,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 28,
+     "execution_count": 33,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -860,12 +860,12 @@
    "id": "73bd85ca-eed0-488f-807e-6f03f99898cb",
    "metadata": {},
    "source": [
-    "A way to know the sub queries of a query is to use the following Smartnoise-SQL code:"
+    "A way to know the sub-queries of a query is to use the following Smartnoise-SQL code:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 171,
+   "execution_count": 34,
    "id": "5b51cf35-68db-4b11-acbe-8df15b826d10",
    "metadata": {},
    "outputs": [],
@@ -879,7 +879,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 172,
+   "execution_count": 35,
    "id": "7ab04c8f-8d79-4871-bc16-1f0368fbd403",
    "metadata": {},
    "outputs": [],
@@ -891,17 +891,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 173,
+   "execution_count": 36,
    "id": "a78d7d86-ab95-4521-b84d-49ac795316c3",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "<snsql._ast.ast.Query at 0x7b569ddbd6d0>"
+       "<snsql._ast.ast.Query at 0x70772e4e5310>"
       ]
      },
-     "execution_count": 173,
+     "execution_count": 36,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -925,17 +925,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 185,
+   "execution_count": 37,
    "id": "251b773a-864c-4852-ae89-1472ac768975",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "{'std_bl': <snsql._ast.tokens.Symbol at 0x7b569ddbf090>}"
+       "{'std_bl': <snsql._ast.tokens.Symbol at 0x70772e4f5a90>}"
       ]
      },
-     "execution_count": 185,
+     "execution_count": 37,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -956,20 +956,20 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 213,
+   "execution_count": 38,
    "id": "f4ac4261-e870-4f07-8264-9a2041a35abc",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "{'keycount': <snsql._ast.tokens.Symbol at 0x7b569de9c850>,\n",
-       " 'sum_alias_0xd0e4': <snsql._ast.tokens.Symbol at 0x7b569df1d690>,\n",
-       " 'count_bill_length_mm': <snsql._ast.tokens.Symbol at 0x7b569debbd10>,\n",
-       " 'sum_bill_length_mm': <snsql._ast.tokens.Symbol at 0x7b569dde6350>}"
+       "{'keycount': <snsql._ast.tokens.Symbol at 0x70772f81b410>,\n",
+       " 'sum_alias_0xde09': <snsql._ast.tokens.Symbol at 0x70772e4e7990>,\n",
+       " 'count_bill_length_mm': <snsql._ast.tokens.Symbol at 0x70772e4e7a10>,\n",
+       " 'sum_bill_length_mm': <snsql._ast.tokens.Symbol at 0x70772e4779d0>}"
       ]
      },
-     "execution_count": 213,
+     "execution_count": 38,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -985,12 +985,12 @@
    "id": "cc07d8c4-153f-4ad3-a977-a971b94d75aa",
    "metadata": {},
    "source": [
-    "This last query with `group_by` will cost the same because `max_ids=1` (at most a penguin is once in the dataset) and so the `group_by` is applied on different partitions of the population."
+    "This last query with `group_by` will cost the same because `max_ids=1` (a penguin appears in the dataset at most once), so the `group_by` splits the population into disjoint partitions."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 124,
+   "execution_count": 39,
    "id": "5b69f3f2-07dd-48b8-9cd5-64eee53331f7",
    "metadata": {},
    "outputs": [
@@ -1000,7 +1000,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 5.000000000032756e-06}"
       ]
      },
-     "execution_count": 124,
+     "execution_count": 39,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1016,7 +1016,7 @@
    "id": "e20c4673-2c7b-44d5-bd7f-be88d6432a70",
    "metadata": {},
    "source": [
-    "NOTE: in the current code of Smartnoise-SQL, there is no odometer. Meaning all queries are independant. If someone first query the private dataset for a count, then a second time for the average and then for the standard deviation then the total cost will be added: 3 count + 2 average + 1 std. That's why it is better to do all in one query."
+    "NOTE: in the current code of Smartnoise-SQL, there is no odometer, meaning all queries are independent. If someone first queries the private dataset for a count, then a second time for the average and then for the standard deviation, the costs add up: 3 count + 2 average + 1 std. That's why it is better to do everything in one query."
    ]
   },
   {
@@ -1044,7 +1044,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 40,
    "id": "7d9ae766-4c0d-4dc5-9c9a-5f7eb99718f9",
    "metadata": {},
    "outputs": [],
@@ -1054,7 +1054,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 41,
    "id": "5006201d",
    "metadata": {},
    "outputs": [],
@@ -1077,17 +1077,17 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 42,
    "id": "814883fa-a45a-43f2-852d-d5380beff8c0",
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-       "{'remaining_epsilon': 7.5, 'remaining_delta': 0.004989999999999935}"
+       "{'remaining_epsilon': 7.5, 'remaining_delta': 0.004980000049999984}"
       ]
      },
-     "execution_count": 33,
+     "execution_count": 42,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1106,7 +1106,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 43,
    "id": "0255550b-7fd2-4244-a8eb-da809ddc6a5b",
    "metadata": {},
    "outputs": [
@@ -1116,7 +1116,7 @@
        "{'epsilon_cost': 3.0, 'delta_cost': 4.999999999999449e-05}"
       ]
      },
-     "execution_count": 34,
+     "execution_count": 43,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1135,16 +1135,22 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 37,
+   "execution_count": 44,
    "id": "80d9933b",
    "metadata": {},
    "outputs": [
     {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Server error status 400: {\"InvalidQueryException\":\"SQL Reader generated NAN results. Epsilon: 0.1 and Delta: 1e-08 are too small to generate output.\"}\n"
-     ]
+     "data": {
+      "text/plain": [
+       "{'query_response':      species  nb_penguin  avg_bill_length_mm  std_bill_length_mm\n",
+       " 0     Adelie          25           59.830486           27.444144\n",
+       " 1  Chinstrap          47           13.527649           28.501660\n",
+       " 2     Gentoo          28           42.375624           33.967001}"
+      ]
+     },
+     "execution_count": 44,
+     "metadata": {},
+     "output_type": "execute_result"
     }
    ],
    "source": [
@@ -1162,7 +1168,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 38,
+   "execution_count": 45,
    "id": "0e07fde9-9430-4a12-8337-0503ac162c26",
    "metadata": {},
    "outputs": [
@@ -1170,12 +1176,12 @@
      "data": {
       "text/plain": [
        "{'query_response':      species  nb_penguin  avg_bill_length_mm  std_bill_length_mm\n",
-       " 0     Adelie          37           47.178744           14.518340\n",
-       " 1  Chinstrap          33           44.463978           21.150978\n",
-       " 2     Gentoo          30           43.079722            9.972813}"
+       " 0     Adelie          36           49.021514            3.944748\n",
+       " 1  Chinstrap          31           49.048848            3.801831\n",
+       " 2     Gentoo          30           41.176308            7.628134}"
       ]
      },
-     "execution_count": 38,
+     "execution_count": 45,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1195,7 +1201,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 40,
+   "execution_count": 46,
    "id": "59f2d665",
    "metadata": {},
    "outputs": [],
@@ -1221,7 +1227,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 41,
+   "execution_count": 47,
    "id": "748f125f",
    "metadata": {},
    "outputs": [
@@ -1256,23 +1262,23 @@
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>Adelie</td>\n",
-       "      <td>150</td>\n",
-       "      <td>38.863115</td>\n",
-       "      <td>1.902194</td>\n",
+       "      <td>151</td>\n",
+       "      <td>38.362705</td>\n",
+       "      <td>5.465330</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>Chinstrap</td>\n",
        "      <td>67</td>\n",
-       "      <td>48.859990</td>\n",
-       "      <td>2.766702</td>\n",
+       "      <td>48.867188</td>\n",
+       "      <td>3.828321</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td>Gentoo</td>\n",
-       "      <td>122</td>\n",
-       "      <td>47.668297</td>\n",
-       "      <td>3.224660</td>\n",
+       "      <td>123</td>\n",
+       "      <td>47.257728</td>\n",
+       "      <td>5.387484</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
@@ -1280,12 +1286,12 @@
       ],
       "text/plain": [
        "     species  nb_penguin  avg_bill_length_mm  std_bill_length_mm\n",
-       "0     Adelie         150           38.863115            1.902194\n",
-       "1  Chinstrap          67           48.859990            2.766702\n",
-       "2     Gentoo         122           47.668297            3.224660"
+       "0     Adelie         151           38.362705            5.465330\n",
+       "1  Chinstrap          67           48.867188            3.828321\n",
+       "2     Gentoo         123           47.257728            5.387484"
       ]
      },
-     "execution_count": 41,
+     "execution_count": 47,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -1305,7 +1311,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 48,
    "id": "0a7d7d4d",
    "metadata": {},
    "outputs": [],
@@ -1317,7 +1323,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 49,
    "id": "bc3ee48a",
    "metadata": {},
    "outputs": [],
@@ -1329,7 +1335,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 44,
+   "execution_count": 50,
    "id": "1717f9ea",
    "metadata": {},
    "outputs": [
@@ -1337,9 +1343,9 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "T test between specie 0 and specie 1: -31.39.  Reject null hypothesis: True.\n",
-      "T test between specie 0 and specie 2: -28.95. Reject null hypothesis: True.\n",
-      "T test between specie 1 and specie 2: 2.56.   Reject null hypothesis: True.\n"
+      "T test between specie 0 and specie 1: -14.41.  Reject null hypothesis: True.\n",
+      "T test between specie 0 and specie 2: -13.49. Reject null hypothesis: True.\n",
+      "T test between specie 1 and specie 2: 2.19.   Reject null hypothesis: True.\n"
      ]
     }
    ],
@@ -1348,9 +1354,9 @@
     "t_02 = t_test(avg_0, avg_2, std_0, std_2, nb_0, nb_2)\n",
     "t_12 = t_test(avg_1, avg_2, std_1, std_2, nb_1, nb_2)\n",
     "\n",
-    "print(f\"T test between specie 0 and specie 1: {np.round(t_01, 2)}.  Reject null hypothesis: {abs(t_01) > CRITICAL_VALUE}.\")\n",
-    "print(f\"T test between specie 0 and specie 2: {np.round(t_02, 2)}. Reject null hypothesis: {abs(t_02) > CRITICAL_VALUE}.\")\n",
-    "print(f\"T test between specie 1 and specie 2: {np.round(t_12, 2)}.   Reject null hypothesis: {abs(t_12) > CRITICAL_VALUE}.\")"
+    "print(f\"T test between species 0 and species 1: {np.round(t_01, 2)}.  Reject null hypothesis: {abs(t_01) > CRITICAL_VALUE}.\")\n",
+    "print(f\"T test between species 0 and species 2: {np.round(t_02, 2)}. Reject null hypothesis: {abs(t_02) > CRITICAL_VALUE}.\")\n",
+    "print(f\"T test between species 1 and species 2: {np.round(t_12, 2)}.   Reject null hypothesis: {abs(t_12) > CRITICAL_VALUE}.\")"
    ]
   },
   {
@@ -1371,7 +1377,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 45,
+   "execution_count": 51,
    "id": "9289bc26",
    "metadata": {},
    "outputs": [
@@ -1409,32 +1415,32 @@
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>Adelie</td>\n",
-       "      <td>150</td>\n",
-       "      <td>38.863115</td>\n",
-       "      <td>1.902194</td>\n",
-       "      <td>0.155314</td>\n",
-       "      <td>38.558700</td>\n",
-       "      <td>39.167529</td>\n",
+       "      <td>151</td>\n",
+       "      <td>38.362705</td>\n",
+       "      <td>5.465330</td>\n",
+       "      <td>0.444762</td>\n",
+       "      <td>37.490971</td>\n",
+       "      <td>39.234439</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>Chinstrap</td>\n",
        "      <td>67</td>\n",
-       "      <td>48.859990</td>\n",
-       "      <td>2.766702</td>\n",
-       "      <td>0.338006</td>\n",
-       "      <td>48.197497</td>\n",
-       "      <td>49.522483</td>\n",
+       "      <td>48.867188</td>\n",
+       "      <td>3.828321</td>\n",
+       "      <td>0.467704</td>\n",
+       "      <td>47.950489</td>\n",
+       "      <td>49.783888</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td>Gentoo</td>\n",
-       "      <td>122</td>\n",
-       "      <td>47.668297</td>\n",
-       "      <td>3.224660</td>\n",
-       "      <td>0.291947</td>\n",
-       "      <td>47.096081</td>\n",
-       "      <td>48.240513</td>\n",
+       "      <td>123</td>\n",
+       "      <td>47.257728</td>\n",
+       "      <td>5.387484</td>\n",
+       "      <td>0.485773</td>\n",
+       "      <td>46.305613</td>\n",
+       "      <td>48.209843</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
@@ -1442,17 +1448,17 @@
       ],
       "text/plain": [
        "     species  nb_penguin  avg_bill_length_mm  std_bill_length_mm  \\\n",
-       "0     Adelie         150           38.863115            1.902194   \n",
-       "1  Chinstrap          67           48.859990            2.766702   \n",
-       "2     Gentoo         122           47.668297            3.224660   \n",
+       "0     Adelie         151           38.362705            5.465330   \n",
+       "1  Chinstrap          67           48.867188            3.828321   \n",
+       "2     Gentoo         123           47.257728            5.387484   \n",
        "\n",
        "   standard_error  ci_95_lower_bound  ci_95_upper_bound  \n",
-       "0        0.155314          38.558700          39.167529  \n",
-       "1        0.338006          48.197497          49.522483  \n",
-       "2        0.291947          47.096081          48.240513  "
+       "0        0.444762          37.490971          39.234439  \n",
+       "1        0.467704          47.950489          49.783888  \n",
+       "2        0.485773          46.305613          48.209843  "
       ]
      },
-     "execution_count": 45,
+     "execution_count": 51,
      "metadata": {},
      "output_type": "execute_result"
     }
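
Note on the budget accounting described in the notebook text of this patch: the prose says an average spends `EPSILON` twice (sum and count) and a standard deviation three times. The sketch below is only an illustration of that arithmetic, assuming each sub-query spends the full input epsilon. The helper `epsilon_cost`, the `SUBQUERIES` table and the single-sub-query cost for a count are assumptions made for this note, not part of lomas or Smartnoise-SQL. `EPSILON = 0.5` is picked so that the standard deviation comes out at the 1.5 shown in the notebook output.

```python
# Hypothetical helper, not part of lomas or Smartnoise-SQL.
# Assumption: every noisy sub-query spends the full input epsilon.
# "avg": 2 and "std": 3 follow the notebook prose; "count": 1 is assumed.
SUBQUERIES = {"count": 1, "avg": 2, "std": 3}


def epsilon_cost(statistic: str, epsilon: float) -> float:
    """Estimated epsilon spent by one query asking for `statistic`."""
    return SUBQUERIES[statistic] * epsilon


EPSILON = 0.5

# Three separate queries: each one pays for its own sub-queries.
separate = sum(epsilon_cost(s, EPSILON) for s in ("count", "avg", "std"))

# One query returning count, avg and std pays for the std sub-queries only.
combined = epsilon_cost("std", EPSILON)

print(f"separate queries: {separate} epsilon, single query: {combined} epsilon")
```

Delta is left out on purpose: the delta costs shown in the notebook outputs are not a simple multiple of the input, so this sketch does not try to model them.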

From 8a5ffdbf6b6b28e21ace48b3070a10c6923e49a3 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 11 Sep 2024 09:56:36 +0000
Subject: [PATCH 07/12] add test

---
 server/lomas_server/tests/test_api.py | 62 ++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)
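
Note on how the new test builds its request bodies: `body = dict(example_smartnoise_sql)` makes a shallow copy, so the overrides (`query_str`, then `mechanisms`, then `postprocess`) accumulate in `body` while the shared example stays untouched. A minimal sketch of that behaviour follows. `example_body` is a hypothetical stand-in; the real `example_smartnoise_sql` fixture has its own keys and values.

```python
# Hypothetical stand-in for example_smartnoise_sql (real fixture differs).
example_body = {
    "query_str": "SELECT COUNT(*) AS nb_penguins FROM df",
    "epsilon": 0.1,
    "postprocess": True,
}

# Shallow copy: a new top-level dict that shares the same values.
body = dict(example_body)

# Same sequence of overrides as in the test.
body["query_str"] = "SELECT AVG(bill_length_mm) AS avg_bill_length_mm FROM df"
body["mechanisms"] = {"count": "gaussian", "sum_float": "laplace"}
body["postprocess"] = False

# The shared example is unchanged; the overrides all live in `body`.
print(example_body["postprocess"], "mechanisms" in example_body)
print(sorted(body))
```

One consequence worth keeping in mind when reading the test: the third request (`postprocess` set to `False`) is still sent with the custom `mechanisms` from the second step, since nothing resets them.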

diff --git a/server/lomas_server/tests/test_api.py b/server/lomas_server/tests/test_api.py
index a18ad686..3cf94b19 100644
--- a/server/lomas_server/tests/test_api.py
+++ b/server/lomas_server/tests/test_api.py
@@ -473,7 +473,67 @@ def test_smartnoise_sql_query(self) -> None:
                 + "Please, verify the client object initialisation."
             }
 
-    def test_smartnoise_query_datetime(self) -> None:
+    def test_smartnoise_sql_query_parameters(self) -> None:
+        """Test smartnoise-sql query parameters"""
+        with TestClient(app, headers=self.headers) as client:
+            # Change the Query
+            body = dict(example_smartnoise_sql)
+            body["query_str"] = (
+                "SELECT AVG(bill_length_mm) AS avg_bill_length_mm FROM df"
+            )
+            response = client.post(
+                "/smartnoise_sql_query",
+                json=body,
+                headers=self.headers,
+            )
+            assert response.status_code == status.HTTP_200_OK
+            response_dict = json.loads(response.content.decode("utf8"))
+            assert response_dict["query_response"]["columns"] == [
+                "avg_bill_length_mm"
+            ]
+            df_response = pd.DataFrame.from_dict(
+                response_dict["query_response"], orient="tight"
+            )
+            assert df_response["avg_bill_length_mm"] > 0.0
+
+            response_dict = json.loads(response.content.decode("utf8"))
+            assert response_dict["requested_by"] == self.user_name
+
+            # Change the mechaism
+            body["mechanisms"] = {"count": "gaussian", "sum_float": "laplace"}
+            response = client.post(
+                "/smartnoise_sql_query",
+                json=body,
+                headers=self.headers,
+            )
+            assert response.status_code == status.HTTP_200_OK
+            response_dict = json.loads(response.content.decode("utf8"))
+            assert response_dict["query_response"]["columns"] == [
+                "avg_bill_length_mm"
+            ]
+            df_response = pd.DataFrame.from_dict(
+                response_dict["query_response"], orient="tight"
+            )
+            assert df_response["avg_bill_length_mm"] > 0.0
+
+            # Try postprocess False
+            body["postprocess"] = False
+            response = client.post(
+                "/smartnoise_sql_query",
+                json=body,
+                headers=self.headers,
+            )
+            assert response.status_code == status.HTTP_200_OK
+            response_dict = json.loads(response.content.decode("utf8"))
+            assert response_dict["query_response"]["columns"] == [
+                "avg_bill_length_mm"
+            ]
+            df_response = pd.DataFrame.from_dict(
+                response_dict["query_response"], orient="tight"
+            )
+            assert df_response.shape[1] == 2
+
+    def test_smartnoise_sql_query_datetime(self) -> None:
         """Test smartnoise-sql query on datetime"""
         with TestClient(app, headers=self.headers) as client:
             # Expect to work: query with datetimes and another user

From d991396cf9cc69e4190917442004a1bd708b42f9 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 11 Sep 2024 10:00:09 +0000
Subject: [PATCH 08/12] add requirements to server

---
 client/setup.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/client/setup.py b/client/setup.py
index bed412a5..364ebeeb 100644
--- a/client/setup.py
+++ b/client/setup.py
@@ -50,5 +50,7 @@
         "pandas>=2.2.2",
         "requests>=2.32.0",
         "scikit-learn==1.4.0",
+        "smartnoise-synth==1.0.4",
+        "smartnoise_synth_logger==0.0.3",
     ],
 )

From 701b0c75fb858f1ce390ed13b8267aa755599b19 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 11 Sep 2024 10:06:59 +0000
Subject: [PATCH 09/12] fix test with iloc

---
 server/lomas_server/tests/test_api.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
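
Note on why `.iloc[0]` is needed: comparing a pandas column with a number gives a boolean Series, and pandas refuses to turn a Series into a single truth value, even when it has only one row. The sketch below reproduces this with a hypothetical one-row frame that stands in for the server's `query_response`; the value 45.19 is only a placeholder.

```python
import pandas as pd

# Hypothetical one-row result in "tight" format, standing in for the
# server's "query_response" payload.
payload = pd.DataFrame({"avg_bill_length_mm": [45.19]}).to_dict(orient="tight")
df_response = pd.DataFrame.from_dict(payload, orient="tight")

# Comparing a column yields a boolean Series, not a plain bool.
comparison = df_response["avg_bill_length_mm"] > 0.0

try:
    bool(comparison)  # what a bare `assert comparison` would evaluate
    ambiguous = False
except ValueError:
    ambiguous = True  # pandas: "The truth value of a Series is ambiguous"

# Picking the first row gives a scalar that can be asserted on directly.
first_value = df_response["avg_bill_length_mm"].iloc[0]
assert first_value > 0.0

print(ambiguous, first_value)
```

For results with several rows, `(df_response["avg_bill_length_mm"] > 0.0).all()` would be the equivalent check.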

diff --git a/server/lomas_server/tests/test_api.py b/server/lomas_server/tests/test_api.py
index 3cf94b19..b4d29120 100644
--- a/server/lomas_server/tests/test_api.py
+++ b/server/lomas_server/tests/test_api.py
@@ -494,7 +494,7 @@ def test_smartnoise_sql_query_parameters(self) -> None:
             df_response = pd.DataFrame.from_dict(
                 response_dict["query_response"], orient="tight"
             )
-            assert df_response["avg_bill_length_mm"] > 0.0
+            assert df_response["avg_bill_length_mm"].iloc[0] > 0.0
 
             response_dict = json.loads(response.content.decode("utf8"))
             assert response_dict["requested_by"] == self.user_name
@@ -514,7 +514,7 @@ def test_smartnoise_sql_query_parameters(self) -> None:
             df_response = pd.DataFrame.from_dict(
                 response_dict["query_response"], orient="tight"
             )
-            assert df_response["avg_bill_length_mm"] > 0.0
+            assert df_response["avg_bill_length_mm"].iloc[0] > 0.0
 
             # Try postprocess False
             body["postprocess"] = False

From 9cb605203ca4f12163731631a4f77b33d8ca54ff Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 11 Sep 2024 11:40:03 +0000
Subject: [PATCH 10/12] fix tests

---
 server/lomas_server/tests/test_api.py | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/server/lomas_server/tests/test_api.py b/server/lomas_server/tests/test_api.py
index b4d29120..b784a1fc 100644
--- a/server/lomas_server/tests/test_api.py
+++ b/server/lomas_server/tests/test_api.py
@@ -488,18 +488,12 @@ def test_smartnoise_sql_query_parameters(self) -> None:
             )
             assert response.status_code == status.HTTP_200_OK
             response_dict = json.loads(response.content.decode("utf8"))
-            assert response_dict["query_response"]["columns"] == [
-                "avg_bill_length_mm"
-            ]
             df_response = pd.DataFrame.from_dict(
                 response_dict["query_response"], orient="tight"
             )
             assert df_response["avg_bill_length_mm"].iloc[0] > 0.0
 
-            response_dict = json.loads(response.content.decode("utf8"))
-            assert response_dict["requested_by"] == self.user_name
-
-            # Change the mechaism
+            # Change the mechanism
             body["mechanisms"] = {"count": "gaussian", "sum_float": "laplace"}
             response = client.post(
                 "/smartnoise_sql_query",
@@ -508,9 +502,6 @@ def test_smartnoise_sql_query_parameters(self) -> None:
             )
             assert response.status_code == status.HTTP_200_OK
             response_dict = json.loads(response.content.decode("utf8"))
-            assert response_dict["query_response"]["columns"] == [
-                "avg_bill_length_mm"
-            ]
             df_response = pd.DataFrame.from_dict(
                 response_dict["query_response"], orient="tight"
             )
@@ -525,9 +516,6 @@ def test_smartnoise_sql_query_parameters(self) -> None:
             )
             assert response.status_code == status.HTTP_200_OK
             response_dict = json.loads(response.content.decode("utf8"))
-            assert response_dict["query_response"]["columns"] == [
-                "avg_bill_length_mm"
-            ]
             df_response = pd.DataFrame.from_dict(
                 response_dict["query_response"], orient="tight"
             )

From b2141acdfe924a04b792dd85600c157983c39e79 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 11 Sep 2024 11:48:03 +0000
Subject: [PATCH 11/12] update contributing for client install require

---
 CONTRIBUTING.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
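
Note on keeping `install_requires` in sync with `requirements.txt`: the check can be done by hand, or with a few lines of Python such as the sketch below. The function `parse_requirements` and both sample lists are hypothetical; in practice the inputs would be read from `client/setup.py` and the client's requirements file.

```python
def parse_requirements(text: str) -> set[str]:
    """Return the requirement lines, ignoring blanks and `#` comments."""
    lines = (line.split("#", 1)[0].strip() for line in text.splitlines())
    return {line for line in lines if line}


# Hypothetical contents, for illustration only.
install_requires = ["pandas>=2.2.2", "requests>=2.32.0", "scikit-learn==1.4.0"]
requirements_txt = """
pandas>=2.2.2
requests>=2.32.0
scikit-learn==1.4.0
smartnoise-synth==1.0.4  # not yet listed in install_requires
"""

required = parse_requirements(requirements_txt)
missing = required - set(install_requires)  # in requirements.txt only
extra = set(install_requires) - required    # in install_requires only

print("missing from install_requires:", sorted(missing))
print("not in requirements.txt:", sorted(extra))
```

The comparison is a plain string match, so `pkg==1.0` and `pkg == 1.0` would count as different entries, and requirement lines containing a `#` inside a URL would be cut short.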

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 63989f0c..c7f19ad8 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -70,7 +70,7 @@ The table below gives an overview of which workflows are triggered by what event
 
 Of these workflows, three of them need manual intervention to adjust the version number:
 
-* **Client library push**: The version must be set in `client/setup.py`
+* **Client library push**: The `version` and the `install_requires` list must be set in `client/setup.py` (`install_requires` should match the list of libraries in `requirements.txt`).
 * **Helm chart push**: The chart version (`version`) and app version (`AppVersion`) of the server and the client must be updated in `server/deploy/helm/charts/lomas_server/Chart.yml`and `client/deploy/helm/charts/lomas_client/Chart.yaml`.
 * **Documentation push**: If a new version is released, it must be added to the `docs/versions.yaml` file. For more details on the generation of the documentation, please refer to `docs` and the `docs/build_docs.py` script.
 

From 9cf9e4cd318e591c9107d6595d10850e7958f0d1 Mon Sep 17 00:00:00 2001
From: PaulineMauryL <pauline.maury-laribiere@bfs.admin.ch>
Date: Wed, 11 Sep 2024 13:12:45 +0000
Subject: [PATCH 12/12] version 0.3.2

---
 client/deploy/helm/charts/lomas_client/Chart.yaml | 4 ++--
 client/setup.py                                   | 2 +-
 docs/versions.yaml                                | 4 ++--
 server/deploy/helm/charts/lomas_server/Chart.yaml | 4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/client/deploy/helm/charts/lomas_client/Chart.yaml b/client/deploy/helm/charts/lomas_client/Chart.yaml
index 2fa137fb..c3a9eebf 100644
--- a/client/deploy/helm/charts/lomas_client/Chart.yaml
+++ b/client/deploy/helm/charts/lomas_client/Chart.yaml
@@ -4,6 +4,6 @@ description: Lomas's Secure Data Disclosure deployment chart for the client envi
 
 type: application
 
-version: 0.3.1
+version: 0.3.2
 
-appVersion: "0.3.1"
+appVersion: "0.3.2"
diff --git a/client/setup.py b/client/setup.py
index 364ebeeb..bdf7fdc3 100644
--- a/client/setup.py
+++ b/client/setup.py
@@ -10,7 +10,7 @@
 setup(
     name="lomas_client",
     packages=find_packages(),
-    version="0.3.1",
+    version="0.3.2",
     description="A client to interact with the Lomas server.",
     long_description=long_description,
     long_description_content_type="text/markdown",
diff --git a/docs/versions.yaml b/docs/versions.yaml
index b4dae4f0..efe80cd2 100644
--- a/docs/versions.yaml
+++ b/docs/versions.yaml
@@ -26,7 +26,7 @@
    tag: "v0.3.0"
    languages:
     - "en"
-"v0.3.1":
-   tag: "v0.3.1"
+"v0.3.2":
+   tag: "v0.3.2"
    languages:
     - "en"
\ No newline at end of file
diff --git a/server/deploy/helm/charts/lomas_server/Chart.yaml b/server/deploy/helm/charts/lomas_server/Chart.yaml
index 17432f82..71aa5ffa 100644
--- a/server/deploy/helm/charts/lomas_server/Chart.yaml
+++ b/server/deploy/helm/charts/lomas_server/Chart.yaml
@@ -4,9 +4,9 @@ description: Lomas deployment chart
 
 type: application
 
-version: 0.3.1
+version: 0.3.2
 
-appVersion: "0.3.1"
+appVersion: "0.3.2"
 
 dependencies:
   - name: mongodb