diff --git a/jupyter/examples/en-horizontal-logistic-regression-task.ipynb b/jupyter/examples/en-horizontal-logistic-regression-task.ipynb
index 1c6dc3b..f38bd56 100644
--- a/jupyter/examples/en-horizontal-logistic-regression-task.ipynb
+++ b/jupyter/examples/en-horizontal-logistic-regression-task.ipynb
@@ -50,6 +50,11 @@
  "There are several parts of the PPC Task that need to be programmed by the developer:\n",
  "\n",
  "* ***Task Config***: We can set some basic task configuration in the ```super().__init__()``` method. The configuration includes the task name (```name```), the minimum client count (```min_clients```), the maximum client count (```max_clients```), the waiting timeout for calculation (```wait_timeout```), and the connection timeout for each step in the procedure (```connection_timeout```).\n",
+ "\n",
+ " We can also enable a zero-knowledge proof step that runs after the task finishes and verifies the convergence of the result and the consistency of the data. To enable it, set `enable_verify` to True in the `super().__init__()` method. The timeout of this step is controlled by the `verify_timeout` parameter. The zero-knowledge proof step currently takes a long time, and the default value of `verify_timeout` is 300 seconds. If a timeout error occurs during the zero-knowledge proof step, set `verify_timeout` to a larger value.\n",
+ "\n",
+ " ***The zero-knowledge proof step is disabled on the online demo system due to resource restrictions. Set `enable_verify` to False when using the online demo system.***\n",
+ "\n",
  "* ***Dataset***: In the ```dataset``` method, you can specify the dataset for the task. The return value is a dict whose keys are dataset names and whose values are instances of ```delta.dataset.DataFrame```; the keys of the dict should correspond to the parameters of the execute method. 
For a detailed explanation of the dataset format, please refer to [this document](https://docs.deltampc.com/network-deployment/prepare-data).\n",
  "* ***Preprocess***: In the ```preprocess``` method, you need to preprocess the dataset and return the x and y for the task. The input parameters should match the keys of the dict returned by the ```dataset``` method. The returned x and y can be ```pandas.DataFrame``` or ```numpy.ndarray```, and y should be a 1-D array of data labels.\n",
  "* ***Options***: This method is optional. In the ```options``` method, you can specify some options for the logistic regression. The general options are ```method``` (the fitting method for logistic regression; only `newton` is available for now) and `maxiter` (the maximum number of fitting iterations). The Newton method has some specific options, including `ord` (the order of the norm of the gradient), `tol` (the stopping tolerance) and `ridge_factor` (the ridge regression factor for the Hessian matrices). All of these options have default values. 
You don't need to implement this method unless you have special needs.\n"
@@ -73,7 +78,7 @@
  "            wait_timeout=5,  # Timeout for calculation.\n",
  "            connection_timeout=5,  # Wait timeout for each step.\n",
  "            verify_timeout=500,  # Timeout for the final zero-knowledge verification step\n",
- "            enable_verify=True  # Whether to enable the final zero-knowledge verification step\n",
+ "            enable_verify=False  # Whether to enable the final zero-knowledge verification step\n",
  "        )\n",
  "\n",
  "    def dataset(self):\n",
diff --git a/jupyter/examples/zh-horizontal-logistic-regression-task.ipynb b/jupyter/examples/zh-horizontal-logistic-regression-task.ipynb
index a79f802..a290c38 100644
--- a/jupyter/examples/zh-horizontal-logistic-regression-task.ipynb
+++ b/jupyter/examples/zh-horizontal-logistic-regression-task.ipynb
@@ -49,6 +49,11 @@
  "在定义横向联邦统计任务时,有几部分内容是需要用户自己定义的:\n",
  "\n",
  "* ***任务配置***: 我们需要在 ```super().__init__()``` 方法中对任务进行配置。 这些配置项包括任务名称(```name```),所需的最少客户端数(```min_clients```),最大客户端数(```max_clients```),等待超时时间(```wait_timeout```,用来控制一轮计算的超时时间),以及连接超时时间(```connection_timeout```,用来控制流程中每个阶段的超时时间)。\n",
+ "\n",
+ " 另外,逻辑回归任务还可以在任务完成后,开启零知识证明阶段,用于验证最终结果的收敛性,以及各个节点计算过程中数据的一致性。如果要开启零知识证明,需要将`super().__init__()`中的`enable_verify`参数设置为True。同时,可以通过`verify_timeout`参数来控制零知识证明阶段的超时时间。目前,零知识证明阶段耗时较长,`verify_timeout`的默认值为300秒,如果在零知识证明阶段发生超时,建议适当加大`verify_timeout`。\n",
+ "\n",
+ " ***目前线上演示系统由于资源限制,暂不支持开启零知识证明阶段。请将`enable_verify`参数设置为False***\n",
+ "\n",
  "* ***数据集***: 我们需要在```dataset```方法中定义任务所需要的数据集。 该方法返回一个字典,键是数据集的名称,需要与execute方法的参数名对应;对应的值是```delta.dataset.DataFrame```实例, 其参数```dataset```代表所需数据集的名称。关于数据集格式的具体细节,请参考[这篇文章](https://docs.deltampc.com/network-deployment/prepare-data)。\n",
  "* ***预处理***: 在预处理函数中,我们需要对数据集进行处理,最后返回x和y。 输入需要与```dataset```方法的返回值对应,即一个输入形参,对应```dataset```返回的字典中的一项。输出的x和y可以是`pandas.DataFrame`或`numpy.ndarray`,y必须是一个1维的向量,表示类别标签。\n",
  "* ***选项配置***: 这个方法是可选的. 
在`options`方法中,我们可以配置逻辑回归训练的一些参数。通用的参数包括 ```method```(逻辑回归的训练方法,目前只有`newton`可选,即牛顿法)以及`maxiter`(训练的最大迭代次数)。还有一些牛顿法特有的参数, 包括`ord`(梯度范数的阶),`tol`(停止训练的容忍值)以及`ridge_factor`(对黑塞矩阵的脊回归系数)。上述所有的配置项,都有默认值。如果你没有特殊的需求,可以不实现这个方法。\n"
@@ -72,7 +77,7 @@
  "            wait_timeout=5,  # 等待超时时间,用来控制一轮计算的超时时间\n",
  "            connection_timeout=5,  # 连接超时时间,用来控制流程中每个阶段的超时时间\n",
  "            verify_timeout=500,  # 零知识证明步骤的超时时间\n",
- "            enable_verify=True  # 是否开启零知识证明\n",
+ "            enable_verify=False  # 是否开启零知识证明\n",
  "        )\n",
  "\n",
  "    def dataset(self):\n",
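
Reviewer note: the guidance this diff adds to both notebooks (turn verification off on the online demo system, and raise `verify_timeout` after a timeout error) can be sketched as a small helper. This is a minimal illustration under stated assumptions, not part of the Delta API: `verify_kwargs`, `after_timeout`, the `online_demo` flag, and the doubling factor are hypothetical names and choices. Only `enable_verify`, `verify_timeout`, and the 300-second default come from the notebook text.

```python
# Default stated in the notebook text, in seconds.
DEFAULT_VERIFY_TIMEOUT = 300


def verify_kwargs(online_demo, verify_timeout=DEFAULT_VERIFY_TIMEOUT):
    """Build the two verification-related keyword arguments (hypothetical helper).

    The online demo system does not support the zero-knowledge proof step,
    so enable_verify is forced to False there.
    """
    if verify_timeout <= 0:
        raise ValueError("verify_timeout must be a positive number of seconds")
    return {"enable_verify": not online_demo, "verify_timeout": verify_timeout}


def after_timeout(kwargs, factor=2):
    """Return a copy of kwargs with a larger verify_timeout for a retry.

    The factor of 2 is an arbitrary illustrative choice.
    """
    return {**kwargs, "verify_timeout": kwargs["verify_timeout"] * factor}


if __name__ == "__main__":
    demo = verify_kwargs(online_demo=True)
    print(demo)
    local = verify_kwargs(online_demo=False, verify_timeout=500)
    print(after_timeout(local))
```

The returned dict could then be unpacked into the `super().__init__()` call alongside `name`, `min_clients`, and the other settings. Whether doubling is a suitable increase is a guess and depends on the deployment.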