update lr examples for online demo
mh739025250 committed Oct 21, 2022
1 parent 32e8dd6 commit e4c84d7
Showing 2 changed files with 12 additions and 2 deletions.
@@ -50,6 +50,11 @@
"There are several parts of the PPC Task that the developer needs to program:\n",
"\n",
"* ***Task Config***: We can set up the basic task configuration in the ```super().__init__()``` method. The configuration includes the task name (```name```), the minimum client count (```min_clients```), the maximum client count (```max_clients```), the timeout for one round of calculation (```wait_timeout```), and the connection timeout for each step in the procedure (```connection_timeout```).\n",
+"\n",
+" After the task finishes, we can also run a zero-knowledge proof step to verify the convergence of the result and the consistency of the data. To enable it, set `enable_verify` to True in the `super().__init__()` method. The timeout of this step is controlled by the `verify_timeout` parameter. The zero-knowledge proof step currently takes quite a long time, and the default value of `verify_timeout` is 300 seconds. If a timeout error occurs during this step, set `verify_timeout` to a larger value.\n",
+"\n",
+" ***Due to resource restrictions, the zero-knowledge proof step is disabled on the online demo system. Set `enable_verify` to False when running on the online demo system.***\n",
+"\n",
"* ***Dataset***: In the ```dataset``` method, you specify the datasets for the task. The return value is a dict whose keys are dataset names and whose values are instances of ```delta.dataset.DataFrame```; the keys must correspond to the parameter names of the execute method. For a detailed explanation of the dataset format, please refer to [this document](https://docs.deltampc.com/network-deployment/prepare-data).\n",
"* ***Preprocess***: In the ```preprocess``` method, you preprocess the datasets and finally return the x and y for the task. The input parameters must match the keys of the dict returned by the ```dataset``` method. The returned x and y can be ```pandas.DataFrame``` or ```numpy.ndarray```, and y must be a 1-D array of data labels.\n",
"* ***Options***: This method is optional. In the ```options``` method, you can specify some options for the logistic regression. The general options are ```method``` (the fit method for logistic regression; only `newton` is available now) and `maxiter` (the maximum number of iterations for the fit). The Newton method has some specific options, including `ord` (the order of the gradient norm), `tol` (the stopping tolerance) and `ridge_factor` (the ridge regression factor for the Hessian matrix). All these options have default values, so you don't need to implement this method unless you have special needs.\n"
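The Options bullet lists the knobs of the Newton fit. To show where each one would act, here is a minimal single-machine Newton's method for logistic regression. It is not Delta's federated implementation, which splits this computation across clients; the function name `fit_logreg_newton`, the toy data, and every default value are assumptions for illustration rather than the library's defaults, and `ridge_factor` is read here as a multiple of the identity added to the Hessian.

```python
import numpy as np


def fit_logreg_newton(x, y, method="newton", maxiter=50, ord=np.inf, tol=1e-6, ridge_factor=1e-5):
    """Single-machine Newton's method for logistic regression (illustrative sketch).

    Option names follow the notebook text; the default values are assumptions.
    """
    if method != "newton":
        raise ValueError("only the 'newton' method is sketched here")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if y.ndim != 1:
        raise ValueError("y must be a 1-D array of labels")
    x = np.hstack([np.ones((x.shape[0], 1)), x])   # prepend an intercept column
    w = np.zeros(x.shape[1])
    for _ in range(maxiter):                       # `maxiter`: cap on Newton steps
        p = 1.0 / (1.0 + np.exp(-(x @ w)))         # predicted probabilities
        grad = x.T @ (p - y)                       # gradient of the negative log-likelihood
        if np.linalg.norm(grad, ord=ord) < tol:    # `ord` and `tol`: stop on a small gradient norm
            break
        hess = (x.T * (p * (1.0 - p))) @ x         # Hessian X^T diag(p(1-p)) X
        hess += ridge_factor * np.eye(x.shape[1])  # `ridge_factor`: ridge term on the Hessian
        w -= np.linalg.solve(hess, grad)           # Newton update
    return w


# Toy data made up for this sketch: one feature, overlapping labels so the MLE exists.
x = np.array([[-2.0], [-1.0], [-0.5], [0.0], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 1, 0, 1, 1, 1])
options = {"method": "newton", "maxiter": 20, "ord": np.inf, "tol": 1e-8}
w = fit_logreg_newton(x, y, **options)  # w[0] is the intercept, w[1] the slope
```

The ridge term only changes the step, not the stopping test, so the loop still ends at a point where the gradient is (numerically) zero; it just guards against a singular Hessian.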
@@ -73,7 +78,7 @@
" wait_timeout=5, # Timeout for calculation.\n",
" connection_timeout=5, # Wait timeout for each step.\n",
" verify_timeout=500, # Timeout for the final zero-knowledge verification step\n",
-" enable_verify=True # Whether to enable the final zero-knowledge verification step\n",
+" enable_verify=False # Whether to enable the final zero-knowledge verification step\n",
" )\n",
"\n",
" def dataset(self):\n",
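The hunk above switches `enable_verify` to False for the online demo while keeping `verify_timeout=500`. If the same notebook is meant to run both on the demo system and on a full deployment, one option is to compute these two keyword arguments in one place. The helpers `verify_kwargs` and `raise_verify_timeout` are hypothetical and not part of Delta, and doubling the timeout is only one possible way to follow the advice to increase `verify_timeout` after a timeout error.

```python
DEFAULT_VERIFY_TIMEOUT = 300  # seconds; the default stated in the notebook text


def verify_kwargs(online_demo: bool, verify_timeout: int = DEFAULT_VERIFY_TIMEOUT) -> dict:
    """Hypothetical helper: build the verification-related keyword arguments
    for the task's super().__init__() call."""
    if verify_timeout <= 0:
        raise ValueError("verify_timeout must be a positive number of seconds")
    return {
        # The online demo system cannot run the zero-knowledge proof step,
        # so it is switched off there and switched on everywhere else.
        "enable_verify": not online_demo,
        "verify_timeout": verify_timeout,
    }


def raise_verify_timeout(current: int, factor: int = 2) -> int:
    """Hypothetical retry policy: after a timeout error in the zero-knowledge
    proof step, try again with a larger verify_timeout."""
    if factor < 2:
        raise ValueError("factor must be at least 2 so the timeout actually grows")
    return current * factor
```

In a task constructor the result would be spread into the call, for example `super().__init__(..., wait_timeout=5, connection_timeout=5, **verify_kwargs(online_demo=True))`, with the remaining arguments as in the notebook.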
@@ -49,6 +49,11 @@
"When defining a horizontal federated statistics task, there are several parts that the user needs to define:\n",
"\n",
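"* ***Task Config***: We need to configure the task in the ```super().__init__()``` method. The configuration items include the task name (```name```), the minimum number of clients required (```min_clients```), the maximum number of clients (```max_clients```), the wait timeout (```wait_timeout```, which controls the timeout of one round of calculation), and the connection timeout (```connection_timeout```, which controls the timeout of each stage in the procedure).\n",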
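+"\n",
+" In addition, the logistic regression task can run a zero-knowledge proof stage after the task finishes, to verify the convergence of the final result and the consistency of the data each node used during computation. To enable the zero-knowledge proof, set the `enable_verify` parameter of `super().__init__()` to True. The timeout of the zero-knowledge proof stage is controlled by the `verify_timeout` parameter. The zero-knowledge proof stage currently takes a long time, and the default value of `verify_timeout` is 300 seconds; if a timeout occurs in this stage, increase `verify_timeout` accordingly.\n",
+"\n",
+" ***Due to resource limitations, the online demo system currently does not support the zero-knowledge proof stage. Please set the `enable_verify` parameter to False.***\n",
+"\n",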
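"* ***Dataset***: We need to define the datasets required by the task in the ```dataset``` method. This method returns a dict whose keys are dataset names, which must match the parameter names of the execute method; the corresponding values are ```delta.dataset.DataFrame``` instances, whose parameter ```dataset``` is the name of the required dataset. For details of the dataset format, please refer to [this article](https://docs.deltampc.com/network-deployment/prepare-data).\n",
"* ***Preprocess***: In the preprocessing function, we process the datasets and finally return x and y. The inputs must correspond to the return value of the ```dataset``` method, i.e. each input parameter corresponds to one entry in the dict returned by ```dataset```. The output x and y can be `pandas.DataFrame` or `numpy.ndarray`; y must be a 1-D vector of class labels.\n",
"* ***Options***: This method is optional. In the `options` method, we can configure some parameters for logistic regression training. The general parameters include ```method``` (the training method for logistic regression; currently only `newton`, i.e. Newton's method, is available) and `maxiter` (the maximum number of training iterations). There are also some parameters specific to Newton's method, including `ord` (the order of the gradient norm), `tol` (the tolerance for stopping training) and `ridge_factor` (the ridge regression coefficient for the Hessian matrix). All of the above options have default values. If you have no special needs, you don't need to implement this method.\n"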
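The Dataset and Preprocess bullets tie each key of the dict returned by `dataset` to a parameter name of the method that consumes it. One plausible reading of that rule, assuming the framework passes each entry as a keyword argument (not confirmed against Delta's source), is Python's `**` unpacking. In this sketch a small in-memory `pandas.DataFrame` stands in for `delta.dataset.DataFrame`, and the dataset name `train`, the label column `y`, and the feature columns are hypothetical.

```python
import pandas as pd


def preprocess(train: pd.DataFrame):
    """Hypothetical preprocess step: the parameter name `train` must match the
    key used in the dict returned by dataset(). Assumes the label column is
    named "y" and every other column is a feature."""
    y = train["y"].to_numpy()       # 1-D numpy array of labels
    x = train.drop(columns=["y"])   # remaining columns are the features
    return x, y


# Stand-in for the dict that dataset() returns, keyed by dataset name.
datasets = {
    "train": pd.DataFrame({"age": [23, 35, 51], "income": [1.2, 3.4, 2.2], "y": [0, 1, 1]}),
}

# Assumed mechanism: each dict entry is passed as the keyword argument with the same name.
x, y = preprocess(**datasets)
```

Under this assumption, a key that matches no parameter name, such as `preprocess(**{"data": df})`, fails with a `TypeError` at the call, which is exactly the mismatch the naming rule guards against.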
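@@ -72,7 +77,7 @@
" wait_timeout=5, # Wait timeout; controls the timeout of one round of calculation\n",
" connection_timeout=5, # Connection timeout; controls the timeout of each stage in the procedure\n",
" verify_timeout=500, # Timeout of the zero-knowledge proof step\n",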
-" enable_verify=True # Whether to enable the zero-knowledge proof\n",
+" enable_verify=False # Whether to enable the zero-knowledge proof\n",
" )\n",
"\n",
" def dataset(self):\n",