From 4ff1f358ee9eaffeed7c8f3cfdf9fd88c01f10a4 Mon Sep 17 00:00:00 2001 From: Roman Bredehoft Date: Mon, 2 Oct 2023 17:37:49 +0200 Subject: [PATCH 1/2] docs: fix and improve credit scoring use case example --- .../credit_scoring/CreditScoring.ipynb | 698 ++++++++++-------- .../credit_scoring/requirements.txt | 6 +- 2 files changed, 378 insertions(+), 326 deletions(-) diff --git a/use_case_examples/credit_scoring/CreditScoring.ipynb b/use_case_examples/credit_scoring/CreditScoring.ipynb index c4ce77f6c..a9eb6cab9 100644 --- a/use_case_examples/credit_scoring/CreditScoring.ipynb +++ b/use_case_examples/credit_scoring/CreditScoring.ipynb @@ -4,50 +4,62 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Credit Scoring Model\n", + "# Credit Scoring in FHE\n", "\n", - "We develop and evaluate a model that predicts the chance of a given loan applicant defaulting on loan repayment while keeping the user's data private. Using a dataset from Kaggle (https://www.kaggle.com/code/ajay1735/my-credit-scoring-model/input), and borrowing some ideas from an existing notebook (https://www.kaggle.com/code/ajay1735/my-credit-scoring-model), we compare Scikit Learn models and Concrete ML models. " + "In this notebook, we build and evaluate a model that predicts the chance that a given loan applicant defaults on loan repayment, while keeping the user's data private using Fully Homomorphic Encryption (FHE). It is strongly inspired by an [existing notebook](https://www.kaggle.com/code/ajay1735/my-credit-scoring-model) found on Kaggle, which uses the [Home Equity (HMEQ) dataset](https://www.kaggle.com/code/ajay1735/my-credit-scoring-model/input). In addition, we compare the performance of the original scikit-learn models and their Concrete ML equivalents. " ] }, { - "cell_type": "code", - "execution_count": 1, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "# Importing necessary libraries\n", - "import time\n", - "from functools import partial\n", - "\n", - "import numpy as np\n", - "import pandas as pd" + "### Import libraries" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ - "# Importing the models, from both scikit-learn and Concrete ML\n", + "import time\n", + "\n", + "import pandas as pd\n", "from sklearn.ensemble import RandomForestClassifier as SklearnRandomForestClassifier\n", "from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression\n", "from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "\n", + "# Import models from scikit-learn and XGBoost\n", "from sklearn.tree import DecisionTreeClassifier as SklearnDecisionTreeClassifier\n", "from xgboost import XGBClassifier as SklearnXGBoostClassifier\n", "\n", + "# Import models from Concrete ML\n", "from concrete.ml.sklearn import DecisionTreeClassifier as ConcreteDecisionTreeClassifier\n", "from concrete.ml.sklearn import LogisticRegression as ConcreteLogisticRegression\n", "from concrete.ml.sklearn import RandomForestClassifier as ConcreteRandomForestClassifier\n", - "from concrete.ml.sklearn import XGBClassifier as ConcreteXGBoostClassifier" + "from concrete.ml.sklearn import XGBClassifier as ConcreteXGBoostClassifier\n", + "\n", + "CONCRETE_ML_MODELS = [\n", + " ConcreteDecisionTreeClassifier,\n", + " ConcreteLogisticRegression,\n", +
" ConcreteRandomForestClassifier,\n", + " ConcreteXGBoostClassifier,\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load the HMEQ dataset" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -59,32 +71,28 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Cleaning the dataset\n", - "\n", - "Details on data science aspects can be found in the original notebook https://www.kaggle.com/code/ajay1735/my-credit-scoring-model. We start with the best setting described in the linked notebook and focus on converting the model to FHE with Concrete ML." + "### Clean the dataset\n", + "Further details about dataset cleaning can be found the [original notebook](https://www.kaggle.com/code/ajay1735/my-credit-scoring-model)." ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ - "# Replacement of NaN variables\n", + "# Replace missing values\n", "df[\"REASON\"].fillna(value=\"DebtCon\", inplace=True)\n", "df[\"JOB\"].fillna(value=\"Other\", inplace=True)\n", "df[\"DEROG\"].fillna(value=0, inplace=True)\n", "df[\"DELINQ\"].fillna(value=0, inplace=True)\n", "\n", - "df.fillna(value=df.mean(), inplace=True)\n", - "\n", - "# Checking if there is anything left out\n", - "assert np.array_equal(df.isnull().sum(), [0] * len(df.isnull().sum()))" + "df.fillna(value=df.mean(numeric_only=True), inplace=True)" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -224,23 +232,22 @@ "4 0.0 93.333333 0.000000 14.000000 33.779915 " ] }, - "execution_count": 5, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# How the dataset is\n", "df.head()" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ - "# Removing the features BAD, JOB, REASON from the input features set\n", + "# Remove features BAD, JOB and REASON from the input feature set\n", "x_basic = df.drop(columns=[\"BAD\", \"JOB\", \"REASON\"])\n", "y = df[\"BAD\"]" ] @@ -249,235 +256,280 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Credit scoring task with Concrete ML" + "### Credit scoring with Concrete ML\n", + "In the following step, we first define the scikit-learn models found in the original notebook and build their FHE equivalent model using Concrete ML. Then, we evaluate and compare them side by side using several metrics (accuracy, F1 score, recall, precision). For Concrete ML models, their inference's execution time is also provided when done in FHE." ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ - "# pylint: disable=too-many-locals\n", - "\n", - "\n", "def evaluate(\n", - " model_class, name, x, y, test_size=0.33, show_circuit=False, predict_in_fhe=True, n_bits=None\n", + " model, x, y, test_size=0.33, show_circuit=False, predict_in_fhe=True, fhe_samples=None\n", "):\n", - " \"\"\"Function to evaluate a model class on a given (x, y). 
This returns different metrics, notably\n", " in simulate and FHE for Concrete ML models, as well as execution times.\"\"\"\n", + " \"\"\"Evaluate the given model using several metrics.\n", "\n", + " The model is evaluated using the following metrics: accuracy, F1 score, precision, recall.\n", + " For Concrete ML models, the inference execution time is also provided when run in FHE.\n", + "\n", + " Args:\n", + " model: The initialized model to consider.\n", + " x: The input data to consider.\n", + " y: The target data to consider.\n", + " test_size: The proportion to use for the test data. Defaults to 0.33.\n", + " show_circuit: Whether the FHE circuit should be printed for Concrete ML models. Defaults\n", + " to False.\n", + " predict_in_fhe: Whether the inference should be executed in FHE for Concrete ML models.\n", + " Else, it will only be simulated. Defaults to True.\n", + " fhe_samples: The number of samples to consider for evaluating the inference of Concrete ML\n", + " models if predict_in_fhe is set to True. If None, the complete test set is used. Defaults\n", + " to None.\n", + " \"\"\"\n", + " evaluation_result = {}\n", + "\n", + " is_concrete_ml = model.__class__ in CONCRETE_ML_MODELS\n", + "\n", + " name = model.__class__.__name__ + (\" (Concrete ML)\" if is_concrete_ml else \" (sklearn)\")\n", + "\n", + " evaluation_result[\"name\"] = name\n", "\n", - " print(f\"Evaluating {name}\")\n", + " print(f\"Evaluating model {name}\")\n", "\n", - " # Splitting the data into test and train sets. Remark the use of stratify, to make sure that\n", - " # the testset contains some representative class distribution in our targets\n", - " x_local_tr, x_local_te, y_local_tr, y_local_te = train_test_split(\n", + " # Split the data into test and train sets. Stratify is used to make sure that the test set\n", + " # contains some representative class distribution for targets\n", + " x_train, x_test, y_train, y_test = train_test_split(\n", " x, y, stratify=y, test_size=test_size, random_state=1\n", " )\n", - " len_x_local_te = len(x_local_te)\n", + " test_length = len(x_test)\n", "\n", - " # With a normalization\n", + " evaluation_result[\"Test samples\"] = test_length\n", + "\n", + " evaluation_result[\"n_bits\"] = model.n_bits if is_concrete_ml else None\n", + "\n", + " # Normalization pipeline\n", " model = Pipeline(\n", " [\n", " (\"preprocessor\", StandardScaler()),\n", - " (\"model\", model_class()),\n", + " (\"model\", model),\n", " ]\n", " )\n", "\n", - " # Training\n", - " model.fit(x_local_tr, y_local_tr)\n", + " # Train the model\n", + " model.fit(x_train, y_train)\n", + "\n", + " # Run the prediction\n", + " y_pred = model.predict(x_test)\n", "\n", - " # Predicting\n", - " before_time = time.time()\n", - " y_local_pre = model.predict(x_local_te)\n", - " local_t = (time.time() - before_time) / len_x_local_te\n", + " # Evaluate the model\n", + " # For Concrete ML models, this will execute the (quantized) inference in the clear\n", + " evaluation_result[\"Accuracy (clear)\"] = accuracy_score(y_test, y_pred)\n", + " evaluation_result[\"F1 (clear)\"] = f1_score(y_test, y_pred, average=\"macro\")\n", + " evaluation_result[\"Precision (clear)\"] = precision_score(y_test, y_pred, average=\"macro\")\n", + " evaluation_result[\"Recall (clear)\"] = recall_score(y_test, y_pred, average=\"macro\")\n", "\n", - " local_a = accuracy_score(y_local_te, y_local_pre)\n", - " local_f = f1_score(y_local_te, y_local_pre, average=\"macro\")\n", - " local_p = precision_score(y_local_te, y_local_pre,
average=\"macro\")\n", - " local_r = recall_score(y_local_te, y_local_pre, average=\"macro\")\n", + " # If the model is from Concrete ML\n", + " if is_concrete_ml:\n", "\n", - " max_bit_width = None\n", - " local_a_simulate = None\n", - " local_a_fhe = None\n", - " local_t_simulate = None\n", - " local_t_fhe = None\n", + " print(\"Compile the model\")\n", "\n", - " # For Concrete ML models\n", - " if getattr(model_class(), \"_is_a_public_cml_model\", False):\n", - " circuit = model[\"model\"].compile(x) # pylint: disable=no-member\n", + " # Compile the model using the training data\n", + " circuit = model[\"model\"].compile(x_train) # pylint: disable=no-member\n", "\n", - " # To see the circuit\n", + " # Print the FHE circuit if needed\n", " if show_circuit:\n", " print(circuit)\n", "\n", - " # Max bitwidth of the circuit\n", - " max_bit_width = circuit.graph.maximum_integer_bit_width()\n", + " # Retrieve the circuit's max bit-width\n", + " evaluation_result[\"max bit-width\"] = circuit.graph.maximum_integer_bit_width()\n", "\n", - " # Prediction in simulation\n", - " before_time = time.time()\n", - " y_local_pre_simulate = model.predict(x_local_te, fhe=\"simulate\")\n", - " local_t_simulate = (time.time() - before_time) / len_x_local_te\n", + " print(\"Predict (simulated)\")\n", "\n", - " local_a_simulate = accuracy_score(y_local_te, y_local_pre_simulate)\n", + " # Run the prediction in the clear using FHE simulation, store its execution time and\n", + " # evaluate the accuracy score\n", + " y_pred_simulate = model.predict(x_test, fhe=\"simulate\")\n", "\n", - " # Prediction in FHE\n", + " evaluation_result[\"Accuracy (simulated)\"] = accuracy_score(y_test, y_pred_simulate)\n", + "\n", + " # Run the prediction in FHE, store its execution time and evaluate the accuracy score\n", " if predict_in_fhe:\n", + " if fhe_samples is not None:\n", + " x_test = x_test[0:fhe_samples]\n", + " y_test = y_test[0:fhe_samples]\n", + " test_length = fhe_samples\n", + "\n", + " evaluation_result[\"FHE samples\"] = test_length\n", + "\n", + " print(\"Predict (FHE)\")\n", + "\n", " before_time = time.time()\n", - " y_local_pre_fhe = model.predict(x_local_te, fhe=\"execute\")\n", - " local_t_fhe = (time.time() - before_time) / len_x_local_te\n", - "\n", - " local_a_fhe = accuracy_score(y_local_te, y_local_pre_fhe)\n", - "\n", - " ans = (\n", - " name,\n", - " local_a,\n", - " local_a_simulate,\n", - " local_a_fhe,\n", - " local_f,\n", - " local_p,\n", - " local_r,\n", - " max_bit_width,\n", - " local_t,\n", - " local_t_simulate,\n", - " local_t_fhe,\n", - " len_x_local_te,\n", - " n_bits,\n", - " )\n", + " y_pred_fhe = model.predict(x_test, fhe=\"execute\")\n", + " evaluation_result[\"FHE execution time (second per sample)\"] = (\n", + " time.time() - before_time\n", + " ) / test_length\n", + "\n", + " evaluation_result[\"Accuracy (FHE)\"] = accuracy_score(y_test, y_pred_fhe)\n", "\n", - " return ans" + " print(\"Done !\\n\")\n", + "\n", + " return evaluation_result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run the evaluation\n", + "In the following, we evaluate several types of classifiers : logistic regression, decision tree, random forest and XGBoost." 
] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Evaluating SklearnLogisticRegression\n", - "Evaluating ConcreteLogisticRegression\n", - "Evaluating SklearnXGBoostClassifier\n", - "Evaluating ConcreteXGBoostClassifier\n", - "Evaluating ConcreteXGBoostClassifier\n" + "Evaluating model LogisticRegression (sklearn)\n", + "Done !\n", + "\n", + "Evaluating model LogisticRegression (Concrete ML)\n", + "Compile the model\n", + "Predict (simulated)\n", + "Predict (FHE)\n", + "Done !\n", + "\n", + "Evaluating model DecisionTreeClassifier (sklearn)\n", + "Done !\n", + "\n", + "Evaluating model DecisionTreeClassifier (Concrete ML)\n", + "Compile the model\n", + "Predict (simulated)\n", + "Predict (FHE)\n", + "Done !\n", + "\n", + "Evaluating model RandomForestClassifier (sklearn)\n", + "Done !\n", + "\n", + "Evaluating model RandomForestClassifier (Concrete ML)\n", + "Compile the model\n", + "Predict (simulated)\n", + "Predict (FHE)\n", + "Done !\n", + "\n", + "Evaluating model XGBClassifier (sklearn)\n", + "Done !\n", + "\n", + "Evaluating model XGBClassifier (Concrete ML)\n", + "Compile the model\n", + "Predict (simulated)\n", + "Predict (FHE)\n", + "Done !\n", + "\n" ] } ], "source": [ - "list_of_results = []\n", + "results = []\n", + "\n", + "# Define the test size proportion\n", + "test_size = 0.2\n", "\n", - "# For fast models, take a large test_size\n", - "test_size = 0.33\n", + "# For testing FHE execution locally, define the number of inference to run. If None, the complete\n", + "# test set is used\n", + "fhe_samples = None\n", "\n", "# Logistic regression\n", - "list_of_results += evaluate(\n", - " SklearnLogisticRegression, \"SklearnLogisticRegression\", x_basic, y, test_size=test_size\n", - ")\n", - "list_of_results += evaluate(\n", - " ConcreteLogisticRegression, \"ConcreteLogisticRegression\", x_basic, y, test_size=test_size\n", - ")\n", - "\n", - "# If you want, make it smaller, to avoid to have too long execution\n", - "test_size_short = test_size\n", - "\n", - "# Options to tree-based models\n", - "n_bits = 3\n", - "\n", - "extra_flags_dt = {\"max_depth\": 10}\n", - "extra_flags_rf = {\"max_depth\": 7, \"n_estimators\": 5}\n", - "extra_flags_xgb = {\"max_depth\": 7, \"n_estimators\": 5}\n", - "extra_flags_cml = {\"n_bits\": n_bits}\n", - "\n", - "# Options\n", - "use_dt = False\n", - "use_rf = False\n", + "results.append(evaluate(SklearnLogisticRegression(), x_basic, y, test_size=test_size))\n", + "results.append(evaluate(ConcreteLogisticRegression(), x_basic, y, test_size=test_size))\n", + "\n", + "# Define the initialization parameters for tree-based models\n", + "init_params_dt = {\"max_depth\": 10}\n", + "init_params_rf = {\"max_depth\": 7, \"n_estimators\": 5}\n", + "init_params_xgb = {\"max_depth\": 7, \"n_estimators\": 5}\n", + "init_params_cml = {\"n_bits\": 3}\n", + "\n", + "# Determine the type of models to evaluate\n", + "use_dt = True\n", + "use_rf = True\n", "use_xgb = True\n", - "use_full_dataset_for_cml_models = True\n", + "predict_in_fhe = True\n", "\n", - "# Decision tree\n", + "# Decision tree models\n", "if use_dt:\n", - " list_of_results += evaluate(\n", - " partial(SklearnDecisionTreeClassifier, **extra_flags_dt),\n", - " \"SklearnDecisionTreeClassifier\",\n", - " x_basic,\n", - " y,\n", - " test_size=test_size,\n", + "\n", + " # Scikit-Learn model\n", + " results.append(\n", + " evaluate(\n", + " SklearnDecisionTreeClassifier(**init_params_dt),\n", 
+ " x_basic,\n", + " y,\n", + " test_size=test_size,\n", + " )\n", " )\n", - " if use_full_dataset_for_cml_models:\n", - " list_of_results += evaluate(\n", - " partial(ConcreteDecisionTreeClassifier, **extra_flags_dt, **extra_flags_cml),\n", - " \"ConcreteDecisionTreeClassifier\",\n", + "\n", + " # Concrete ML model\n", + " results.append(\n", + " evaluate(\n", + " ConcreteDecisionTreeClassifier(**init_params_dt, **init_params_cml),\n", " x_basic,\n", " y,\n", " test_size=test_size,\n", - " n_bits=n_bits,\n", - " predict_in_fhe=False,\n", + " predict_in_fhe=predict_in_fhe,\n", + " fhe_samples=fhe_samples,\n", " )\n", - " list_of_results += evaluate(\n", - " partial(ConcreteDecisionTreeClassifier, **extra_flags_dt, **extra_flags_cml),\n", - " \"ConcreteDecisionTreeClassifier\",\n", - " x_basic,\n", - " y,\n", - " test_size=test_size_short,\n", - " n_bits=n_bits,\n", " )\n", "\n", "# Random Forest\n", "if use_rf:\n", - " list_of_results += evaluate(\n", - " partial(SklearnRandomForestClassifier, **extra_flags_rf),\n", - " \"SklearnRandomForestClassifier\",\n", - " x_basic,\n", - " y,\n", - " test_size=test_size,\n", + "\n", + " # Scikit-Learn model\n", + " results.append(\n", + " evaluate(\n", + " SklearnRandomForestClassifier(**init_params_rf),\n", + " x_basic,\n", + " y,\n", + " test_size=test_size,\n", + " )\n", " )\n", - " if use_full_dataset_for_cml_models:\n", - " list_of_results += evaluate(\n", - " partial(ConcreteRandomForestClassifier, **extra_flags_rf, **extra_flags_cml),\n", - " \"ConcreteRandomForestClassifier\",\n", + "\n", + " # Concrete ML model\n", + " results.append(\n", + " evaluate(\n", + " ConcreteRandomForestClassifier(**init_params_rf, **init_params_cml),\n", " x_basic,\n", " y,\n", " test_size=test_size,\n", - " n_bits=n_bits,\n", - " predict_in_fhe=False,\n", + " predict_in_fhe=predict_in_fhe,\n", + " fhe_samples=fhe_samples,\n", " )\n", - " list_of_results += evaluate(\n", - " partial(ConcreteRandomForestClassifier, **extra_flags_rf, **extra_flags_cml),\n", - " \"ConcreteRandomForestClassifier\",\n", - " x_basic,\n", - " y,\n", - " test_size=test_size_short,\n", - " n_bits=n_bits,\n", " )\n", "\n", "# XGBoost\n", "if use_xgb:\n", - " list_of_results += evaluate(\n", - " partial(SklearnXGBoostClassifier, **extra_flags_xgb),\n", - " \"SklearnXGBoostClassifier\",\n", - " x_basic,\n", - " y,\n", - " test_size=test_size,\n", + "\n", + " # Scikit-Learn model\n", + " results.append(\n", + " evaluate(\n", + " SklearnXGBoostClassifier(**init_params_xgb),\n", + " x_basic,\n", + " y,\n", + " test_size=test_size,\n", + " )\n", " )\n", - " if use_full_dataset_for_cml_models:\n", - " list_of_results += evaluate(\n", - " partial(ConcreteXGBoostClassifier, **extra_flags_xgb, **extra_flags_cml),\n", - " \"ConcreteXGBoostClassifier\",\n", + "\n", + " # Concrete ML model\n", + " results.append(\n", + " evaluate(\n", + " ConcreteXGBoostClassifier(**init_params_xgb, **init_params_cml),\n", " x_basic,\n", " y,\n", " test_size=test_size,\n", - " n_bits=n_bits,\n", - " predict_in_fhe=False,\n", + " predict_in_fhe=predict_in_fhe,\n", + " fhe_samples=fhe_samples,\n", " )\n", - " list_of_results += evaluate(\n", - " partial(ConcreteXGBoostClassifier, **extra_flags_xgb, **extra_flags_cml),\n", - " \"ConcreteXGBoostClassifier\",\n", - " x_basic,\n", - " y,\n", - " test_size=test_size_short,\n", - " n_bits=n_bits,\n", " )" ] }, @@ -485,59 +537,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Comparing all the models" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - 
"metadata": {}, - "outputs": [], - "source": [ - "# Extract information from list_of_results\n", - "size_of_info = 13\n", - "\n", - "model_names = list_of_results[0::size_of_info]\n", - "\n", - "accuracies = list_of_results[1::size_of_info]\n", - "accuracies_simulate = list_of_results[2::size_of_info]\n", - "accuracies_fhe = list_of_results[3::size_of_info]\n", - "\n", - "recalls = list_of_results[4::size_of_info]\n", - "f1s = list_of_results[5::size_of_info]\n", - "precisions = list_of_results[6::size_of_info]\n", - "max_bit_widths = list_of_results[7::size_of_info]\n", - "\n", - "t = list_of_results[8::size_of_info]\n", - "t_simulate = list_of_results[9::size_of_info]\n", - "t_fhe = list_of_results[10::size_of_info]\n", - "\n", - "length_dataset = list_of_results[11::size_of_info]\n", - "n_bits = list_of_results[12::size_of_info]\n", - "\n", - "# And make a nice table\n", - "results_dataframe = pd.DataFrame(\n", - " {\n", - " \"Model name\": model_names,\n", - " \"Quantization (bits)\": n_bits,\n", - " \"Len of the dataset\": length_dataset,\n", - " \"Accuracy Score (original)\": accuracies,\n", - " \"Accuracy Score (simulate)\": accuracies_simulate,\n", - " \"Accuracy Score (FHE)\": accuracies_fhe,\n", - " \"Recall Score\": recalls,\n", - " \"F1 Score\": f1s,\n", - " \"Precision Score\": precisions,\n", - " \"Max bitwidth\": max_bit_widths,\n", - " \"Execution time (original, in seconds)\": t,\n", - " \"Execution time (simulate, in seconds)\": t_simulate,\n", - " \"Execution time (FHE, in seconds)\": t_fhe,\n", - " }\n", - ")" + "### Compare the models\n", + "\n", + "Let's compare the models' performance in a pandas Dataframe. We can see that with only a few bits of quantization, the Concrete models perform as well as their scikit-learn equivalent. More precisely, the small differences that can be observed are only the result of quantization: running the inference in FHE does not impact the accuracy score." 
] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 14, "metadata": {}, "outputs": [ { @@ -561,152 +568,197 @@ " \n", " \n", " \n", - " Model name\n", - " Quantization (bits)\n", - " Len of the dataset\n", - " Accuracy Score (original)\n", - " Accuracy Score (simulate)\n", - " Accuracy Score (FHE)\n", - " Recall Score\n", - " F1 Score\n", - " Precision Score\n", - " Max bitwidth\n", - " Execution time (original, in seconds)\n", - " Execution time (simulate, in seconds)\n", - " Execution time (FHE, in seconds)\n", + " name\n", + " Test samples\n", + " n_bits\n", + " Accuracy (clear)\n", + " F1 (clear)\n", + " Precision (clear)\n", + " Recall (clear)\n", + " max bit-width\n", + " Accuracy (simulated)\n", + " FHE samples\n", + " FHE execution time (second per sample)\n", + " Accuracy (FHE)\n", " \n", " \n", " \n", " \n", " 0\n", - " SklearnLogisticRegression\n", + " LogisticRegression (sklearn)\n", + " 1192\n", " \n", - " 1967\n", - " 0.830\n", + " 0.824\n", + " 0.627\n", + " 0.748\n", + " 0.606\n", " \n", " \n", - " 0.645\n", - " 0.762\n", - " 0.621\n", " \n", - " 1.018e-06\n", " \n", " \n", " \n", " \n", " 1\n", - " ConcreteLogisticRegression\n", - " \n", - " 1967\n", - " 0.830\n", - " 0.83\n", - " 0.83\n", - " 0.645\n", - " 0.765\n", - " 0.620\n", + " LogisticRegression (Concrete ML)\n", + " 1192\n", + " 8.0\n", + " 0.824\n", + " 0.627\n", + " 0.748\n", + " 0.606\n", " 18.0\n", - " 1.469e-06\n", + " 0.824\n", + " 1192.0\n", " 0.0\n", - " 0.001\n", + " 0.824\n", " \n", " \n", " 2\n", - " SklearnXGBoostClassifier\n", + " DecisionTreeClassifier (sklearn)\n", + " 1192\n", " \n", - " 1967\n", - " 0.883\n", + " 0.881\n", + " 0.788\n", + " 0.842\n", + " 0.757\n", " \n", " \n", - " 0.805\n", - " 0.828\n", - " 0.788\n", " \n", - " 2.694e-06\n", " \n", " \n", " \n", " \n", " 3\n", - " ConcreteXGBoostClassifier\n", + " DecisionTreeClassifier (Concrete ML)\n", + " 1192\n", " 3.0\n", - " 1967\n", - " 0.840\n", - " 0.84\n", + " 0.852\n", + " 0.705\n", + " 0.818\n", + " 0.670\n", + " 4.0\n", + " 0.848\n", + " 5.0\n", + " 0.91\n", + " 0.8\n", + " \n", + " \n", + " 4\n", + " RandomForestClassifier (sklearn)\n", + " 1192\n", " \n", - " 0.649\n", - " 0.825\n", - " 0.621\n", + " 0.874\n", + " 0.757\n", + " 0.863\n", + " 0.715\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 5\n", + " RandomForestClassifier (Concrete ML)\n", + " 1192\n", + " 3.0\n", + " 0.841\n", + " 0.629\n", + " 0.890\n", + " 0.606\n", " 4.0\n", - " 2.003e-05\n", - " 0.001\n", + " 0.841\n", + " 5.0\n", + " 1.557\n", + " 0.6\n", + " \n", + " \n", + " 6\n", + " XGBClassifier (sklearn)\n", + " 1192\n", + " \n", + " 0.888\n", + " 0.806\n", + " 0.846\n", + " 0.780\n", + " \n", + " \n", + " \n", + " \n", " \n", " \n", " \n", - " 4\n", - " ConcreteXGBoostClassifier\n", + " 7\n", + " XGBClassifier (Concrete ML)\n", + " 1192\n", " 3.0\n", - " 1967\n", - " 0.840\n", - " 0.84\n", - " 0.84\n", - " 0.649\n", - " 0.825\n", - " 0.621\n", + " 0.841\n", + " 0.647\n", + " 0.848\n", + " 0.619\n", " 4.0\n", - " 1.638e-05\n", - " 0.001\n", - " 0.225\n", + " 0.841\n", + " 5.0\n", + " 1.414\n", + " 0.8\n", " \n", " \n", "\n", "" ], "text/plain": [ - " Model name Quantization (bits) Len of the dataset \\\n", - "0 SklearnLogisticRegression 1967 \n", - "1 ConcreteLogisticRegression 1967 \n", - "2 SklearnXGBoostClassifier 1967 \n", - "3 ConcreteXGBoostClassifier 3.0 1967 \n", - "4 ConcreteXGBoostClassifier 3.0 1967 \n", - "\n", - " Accuracy Score (original) Accuracy Score (simulate) Accuracy Score (FHE) \\\n", - "0 0.830 
\n", - "1 0.830 0.83 0.83 \n", - "2 0.883 \n", - "3 0.840 0.84 \n", - "4 0.840 0.84 0.84 \n", + " name Test samples n_bits \\\n", + "0 LogisticRegression (sklearn) 1192 \n", + "1 LogisticRegression (Concrete ML) 1192 8.0 \n", + "2 DecisionTreeClassifier (sklearn) 1192 \n", + "3 DecisionTreeClassifier (Concrete ML) 1192 3.0 \n", + "4 RandomForestClassifier (sklearn) 1192 \n", + "5 RandomForestClassifier (Concrete ML) 1192 3.0 \n", + "6 XGBClassifier (sklearn) 1192 \n", + "7 XGBClassifier (Concrete ML) 1192 3.0 \n", "\n", - " Recall Score F1 Score Precision Score Max bitwidth \\\n", - "0 0.645 0.762 0.621 \n", - "1 0.645 0.765 0.620 18.0 \n", - "2 0.805 0.828 0.788 \n", - "3 0.649 0.825 0.621 4.0 \n", - "4 0.649 0.825 0.621 4.0 \n", + " Accuracy (clear) F1 (clear) Precision (clear) Recall (clear) \\\n", + "0 0.824 0.627 0.748 0.606 \n", + "1 0.824 0.627 0.748 0.606 \n", + "2 0.881 0.788 0.842 0.757 \n", + "3 0.852 0.705 0.818 0.670 \n", + "4 0.874 0.757 0.863 0.715 \n", + "5 0.841 0.629 0.890 0.606 \n", + "6 0.888 0.806 0.846 0.780 \n", + "7 0.841 0.647 0.848 0.619 \n", "\n", - " Execution time (original, in seconds) \\\n", - "0 1.018e-06 \n", - "1 1.469e-06 \n", - "2 2.694e-06 \n", - "3 2.003e-05 \n", - "4 1.638e-05 \n", + " max bit-width Accuracy (simulated) FHE samples \\\n", + "0 \n", + "1 18.0 0.824 1192.0 \n", + "2 \n", + "3 4.0 0.848 5.0 \n", + "4 \n", + "5 4.0 0.841 5.0 \n", + "6 \n", + "7 4.0 0.841 5.0 \n", "\n", - " Execution time (simulate, in seconds) Execution time (FHE, in seconds) \n", - "0 \n", - "1 0.0 0.001 \n", - "2 \n", - "3 0.001 \n", - "4 0.001 0.225 " + " FHE execution time (second per sample) Accuracy (FHE) \n", + "0 \n", + "1 0.0 0.824 \n", + "2 \n", + "3 0.91 0.8 \n", + "4 \n", + "5 1.557 0.6 \n", + "6 \n", + "7 1.414 0.8 " ] }, - "execution_count": 10, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option(\"display.precision\", 3)\n", - "results_dataframe = results_dataframe.fillna(\"\")\n", - "results_dataframe # pylint: disable=W0104" + "\n", + "results_dataframe = pd.DataFrame(results)\n", + "results_dataframe.fillna(\"\")" ] } ], diff --git a/use_case_examples/credit_scoring/requirements.txt b/use_case_examples/credit_scoring/requirements.txt index 0a4506874..56b78ae9f 100644 --- a/use_case_examples/credit_scoring/requirements.txt +++ b/use_case_examples/credit_scoring/requirements.txt @@ -1,3 +1,3 @@ -concrete-ml==1.0.0 -jupyter==1.0.0 -graphviz==0.20.1 \ No newline at end of file +concrete-ml +jupyter +pandas From e31d503ee03e0e1cdcf07a149e2e3eac5f9c96ce Mon Sep 17 00:00:00 2001 From: RomanBredehoft Date: Tue, 3 Oct 2023 11:12:38 +0200 Subject: [PATCH 2/2] chore: refresh CreditScoring notebook --- .../credit_scoring/CreditScoring.ipynb | 74 +++++++++---------- 1 file changed, 37 insertions(+), 37 deletions(-) diff --git a/use_case_examples/credit_scoring/CreditScoring.ipynb b/use_case_examples/credit_scoring/CreditScoring.ipynb index a9eb6cab9..ef9ee1d5f 100644 --- a/use_case_examples/credit_scoring/CreditScoring.ipynb +++ b/use_case_examples/credit_scoring/CreditScoring.ipynb @@ -262,7 +262,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -385,7 +385,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -544,7 +544,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -610,7 +610,7 @@ " 18.0\n", " 0.824\n", " 
1192.0\n", - " 0.0\n", + " 0.001\n", " 0.824\n", " \n", " \n", @@ -618,10 +618,10 @@ " DecisionTreeClassifier (sklearn)\n", " 1192\n", " \n", - " 0.881\n", - " 0.788\n", - " 0.842\n", - " 0.757\n", + " 0.879\n", + " 0.783\n", + " 0.843\n", + " 0.750\n", " \n", " \n", " \n", @@ -639,19 +639,19 @@ " 0.670\n", " 4.0\n", " 0.848\n", - " 5.0\n", - " 0.91\n", - " 0.8\n", + " 1192.0\n", + " 0.194\n", + " 0.848\n", " \n", " \n", " 4\n", " RandomForestClassifier (sklearn)\n", " 1192\n", " \n", - " 0.874\n", - " 0.757\n", - " 0.863\n", - " 0.715\n", + " 0.872\n", + " 0.761\n", + " 0.839\n", + " 0.724\n", " \n", " \n", " \n", @@ -663,15 +663,15 @@ " RandomForestClassifier (Concrete ML)\n", " 1192\n", " 3.0\n", - " 0.841\n", - " 0.629\n", - " 0.890\n", - " 0.606\n", + " 0.840\n", + " 0.645\n", + " 0.836\n", + " 0.618\n", " 4.0\n", - " 0.841\n", - " 5.0\n", - " 1.557\n", - " 0.6\n", + " 0.84\n", + " 1192.0\n", + " 0.295\n", + " 0.84\n", " \n", " \n", " 6\n", @@ -699,9 +699,9 @@ " 0.619\n", " 4.0\n", " 0.841\n", - " 5.0\n", - " 1.414\n", - " 0.8\n", + " 1192.0\n", + " 0.226\n", + " 0.841\n", " \n", " \n", "\n", @@ -721,10 +721,10 @@ " Accuracy (clear) F1 (clear) Precision (clear) Recall (clear) \\\n", "0 0.824 0.627 0.748 0.606 \n", "1 0.824 0.627 0.748 0.606 \n", - "2 0.881 0.788 0.842 0.757 \n", + "2 0.879 0.783 0.843 0.750 \n", "3 0.852 0.705 0.818 0.670 \n", - "4 0.874 0.757 0.863 0.715 \n", - "5 0.841 0.629 0.890 0.606 \n", + "4 0.872 0.761 0.839 0.724 \n", + "5 0.840 0.645 0.836 0.618 \n", "6 0.888 0.806 0.846 0.780 \n", "7 0.841 0.647 0.848 0.619 \n", "\n", @@ -732,24 +732,24 @@ "0 \n", "1 18.0 0.824 1192.0 \n", "2 \n", - "3 4.0 0.848 5.0 \n", + "3 4.0 0.848 1192.0 \n", "4 \n", - "5 4.0 0.841 5.0 \n", + "5 4.0 0.84 1192.0 \n", "6 \n", - "7 4.0 0.841 5.0 \n", + "7 4.0 0.841 1192.0 \n", "\n", " FHE execution time (second per sample) Accuracy (FHE) \n", "0 \n", - "1 0.0 0.824 \n", + "1 0.001 0.824 \n", "2 \n", - "3 0.91 0.8 \n", + "3 0.194 0.848 \n", "4 \n", - "5 1.557 0.6 \n", + "5 0.295 0.84 \n", "6 \n", - "7 1.414 0.8 " + "7 0.226 0.841 " ] }, - "execution_count": 14, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" }