diff --git a/source/sklearn-metadata.json b/source/sklearn-metadata.json index 663e0a5d2f..4a0151ece6 100644 --- a/source/sklearn-metadata.json +++ b/source/sklearn-metadata.json @@ -192,7 +192,7 @@ }, { "name": "sklearn.calibration.CalibratedClassifierCV", - "description": "Probability calibration with isotonic regression or logistic regression.\n\nThis class uses cross-validation to both estimate the parameters of a\nclassifier and subsequently calibrate a classifier. With default\n`ensemble=True`, for each cv split it\nfits a copy of the base estimator to the training subset, and calibrates it\nusing the testing subset. For prediction, predicted probabilities are\naveraged across these individual calibrated classifiers. When\n`ensemble=False`, cross-validation is used to obtain unbiased predictions,\nvia :func:`~sklearn.model_selection.cross_val_predict`, which are then\nused for calibration. For prediction, the base estimator, trained using all\nthe data, is used. This is the prediction method implemented when\n`probabilities=True` for :class:`~sklearn.svm.SVC` and :class:`~sklearn.svm.NuSVC`\nestimators (see :ref:`User Guide ` for details).\n\nAlready fitted classifiers can be calibrated via the parameter\n`cv=\"prefit\"`. In this case, no cross-validation is used and all provided\ndata is used for calibration. The user has to take care manually that data\nfor model fitting and calibration are disjoint.\n\nThe calibration is based on the :term:`decision_function` method of the\n`estimator` if it exists, else on :term:`predict_proba`.\n\nRead more in the :ref:`User Guide `.\nIn order to learn more on the CalibratedClassifierCV class, see the\nfollowing calibration examples:\n:ref:`sphx_glr_auto_examples_calibration_plot_calibration.py`,\n:ref:`sphx_glr_auto_examples_calibration_plot_calibration_curve.py`, and\n:ref:`sphx_glr_auto_examples_calibration_plot_calibration_multiclass.py`.\n", + "description": "Probability calibration with isotonic regression or logistic regression.\n\nThis class uses cross-validation to both estimate the parameters of a\nclassifier and subsequently calibrate a classifier. With default\n`ensemble=True`, for each cv split it\nfits a copy of the base estimator to the training subset, and calibrates it\nusing the testing subset. For prediction, predicted probabilities are\naveraged across these individual calibrated classifiers. When\n`ensemble=False`, cross-validation is used to obtain unbiased predictions,\nvia :func:`~sklearn.model_selection.cross_val_predict`, which are then\nused for calibration. For prediction, the base estimator, trained using all\nthe data, is used. This is the prediction method implemented when\n`probabilities=True` for :class:`~sklearn.svm.SVC` and :class:`~sklearn.svm.NuSVC`\nestimators (see :ref:`User Guide ` for details).\n\nAlready fitted classifiers can be calibrated by wrapping the model in a\n:class:`~sklearn.frozen.FrozenEstimator`. In this case all provided\ndata is used for calibration. 
The user has to take care manually that data\nfor model fitting and calibration are disjoint.\n\nThe calibration is based on the :term:`decision_function` method of the\n`estimator` if it exists, else on :term:`predict_proba`.\n\nRead more in the :ref:`User Guide `.\nIn order to learn more on the CalibratedClassifierCV class, see the\nfollowing calibration examples:\n:ref:`sphx_glr_auto_examples_calibration_plot_calibration.py`,\n:ref:`sphx_glr_auto_examples_calibration_plot_calibration_curve.py`, and\n:ref:`sphx_glr_auto_examples_calibration_plot_calibration_multiclass.py`.\n", "attributes": [ { "default": null, @@ -206,7 +206,7 @@ }, { "default": null, - "description": "Determines the cross-validation splitting strategy.\nPossible inputs for cv are:\n\n- None, to use the default 5-fold cross-validation,\n- integer, to specify the number of folds.\n- :term:`CV splitter`,\n- An iterable yielding (train, test) splits as arrays of indices.\n\nFor integer/None inputs, if ``y`` is binary or multiclass,\n:class:`~sklearn.model_selection.StratifiedKFold` is used. If ``y`` is\nneither binary nor multiclass, :class:`~sklearn.model_selection.KFold`\nis used.\n\nRefer to the :ref:`User Guide ` for the various\ncross-validation strategies that can be used here.\n\nIf \"prefit\" is passed, it is assumed that `estimator` has been\nfitted already and all data is used for calibration.\n\n.. versionchanged:: 0.22\n``cv`` default value if None changed from 3-fold to 5-fold.\n", + "description": "Determines the cross-validation splitting strategy.\nPossible inputs for cv are:\n\n- None, to use the default 5-fold cross-validation,\n- integer, to specify the number of folds.\n- :term:`CV splitter`,\n- An iterable yielding (train, test) splits as arrays of indices.\n\nFor integer/None inputs, if ``y`` is binary or multiclass,\n:class:`~sklearn.model_selection.StratifiedKFold` is used. If ``y`` is\nneither binary nor multiclass, :class:`~sklearn.model_selection.KFold`\nis used.\n\nRefer to the :ref:`User Guide ` for the various\ncross-validation strategies that can be used here.\n\n.. versionchanged:: 0.22\n``cv`` default value if None changed from 3-fold to 5-fold.\n\n.. versionchanged:: 1.6\n`\"prefit\"` is deprecated. Use :class:`~sklearn.frozen.FrozenEstimator`\ninstead.\n", "name": "cv", "optional": true, "type": "int32" @@ -218,8 +218,8 @@ "type": "int32" }, { - "default": true, - "description": "Determines how the calibrator is fitted when `cv` is not `'prefit'`.\nIgnored if `cv='prefit'`.\n\nIf `True`, the `estimator` is fitted using training data, and\ncalibrated using testing data, for each `cv` fold. The final estimator\nis an ensemble of `n_cv` fitted classifier and calibrator pairs, where\n`n_cv` is the number of cross-validation folds. The output is the\naverage predicted probabilities of all pairs.\n\nIf `False`, `cv` is used to compute unbiased predictions, via\n:func:`~sklearn.model_selection.cross_val_predict`, which are then\nused for calibration. At prediction time, the classifier used is the\n`estimator` trained on all the data.\nNote that this method is also internally implemented in\n:mod:`sklearn.svm` estimators with the `probabilities=True` parameter.\n\n.. 
versionadded:: 0.24\n", + "default": "\"auto\"", + "description": "Determines how the calibrator is fitted.\n\n\"auto\" will use `False` if the `estimator` is a\n:class:`~sklearn.frozen.FrozenEstimator`, and `True` otherwise.\n\nIf `True`, the `estimator` is fitted using training data, and\ncalibrated using testing data, for each `cv` fold. The final estimator\nis an ensemble of `n_cv` fitted classifier and calibrator pairs, where\n`n_cv` is the number of cross-validation folds. The output is the\naverage predicted probabilities of all pairs.\n\nIf `False`, `cv` is used to compute unbiased predictions, via\n:func:`~sklearn.model_selection.cross_val_predict`, which are then\nused for calibration. At prediction time, the classifier used is the\n`estimator` trained on all the data.\nNote that this method is also internally implemented in\n:mod:`sklearn.svm` estimators with the `probabilities=True` parameter.\n\n.. versionadded:: 0.24\n\n.. versionchanged:: 1.6\n`\"auto\"` option is added and is the default.\n", "name": "ensemble", "type": "boolean" }, @@ -274,9 +274,9 @@ }, { "name": "verbose_feature_names_out", - "description": "If True, :meth:`ColumnTransformer.get_feature_names_out` will prefix\nall feature names with the name of the transformer that generated that\nfeature.\nIf False, :meth:`ColumnTransformer.get_feature_names_out` will not\nprefix any feature names and will error if feature names are not\nunique.\n\n.. versionadded:: 1.0\n", + "description": "\n- If True, :meth:`ColumnTransformer.get_feature_names_out` will prefix\nall feature names with the name of the transformer that generated that\nfeature. It is equivalent to setting\n`verbose_feature_names_out=\"{transformer_name}__{feature_name}\"`.\n- If False, :meth:`ColumnTransformer.get_feature_names_out` will not\nprefix any feature names and will error if feature names are not\nunique.\n- If ``Callable[[str, str], str]``,\n:meth:`ColumnTransformer.get_feature_names_out` will rename all the features\nusing the name of the transformer. The first argument of the callable is the\ntransformer name and the second argument is the feature name. The returned\nstring will be the new feature name.\n- If ``str``, it must be a string ready for formatting. The given string will\nbe formatted using two field names: ``transformer_name`` and ``feature_name``.\ne.g. ``\"{feature_name}__{transformer_name}\"``. See :meth:`str.format` method\nfrom the standard library for more info.\n\n.. versionadded:: 1.0\n\n.. versionchanged:: 1.6\n`verbose_feature_names_out` can be a callable or a string to be formatted.\n", "type": "boolean", - "default": true + "default": "True" }, { "name": "force_int_remainder_cols", @@ -1037,13 +1037,13 @@ { "name": "keep_empty_features", "default": false, - "description": "If True, features that consist exclusively of missing values when\n`fit` is called are returned in results when `transform` is called.\nThe imputed value is always `0` except when `strategy=\"constant\"`\nin which case `fill_value` will be used instead.\n\n.. versionadded:: 1.2\n" + "description": "If True, features that consist exclusively of missing values when\n`fit` is called are returned in results when `transform` is called.\nThe imputed value is always `0` except when `strategy=\"constant\"`\nin which case `fill_value` will be used instead.\n\n.. versionadded:: 1.2\n\n.. versionchanged:: 1.6\nCurrently, when `keep_empty_features=False` and `strategy=\"constant\"`,\nempty features are not dropped. This behaviour will change in version\n1.8. 
Set `keep_empty_features=True` to preserve this behaviour.\n" } ] }, { "name": "sklearn.linear_model._logistic.LogisticRegression", - "description": "\nLogistic Regression (aka logit, MaxEnt) classifier.\n\nIn the multiclass case, the training algorithm uses the one-vs-rest (OvR)\nscheme if the 'multi_class' option is set to 'ovr', and uses the\ncross-entropy loss if the 'multi_class' option is set to 'multinomial'.\n(Currently the 'multinomial' option is supported only by the 'lbfgs',\n'sag', 'saga' and 'newton-cg' solvers.)\n\nThis class implements regularized logistic regression using the\n'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note\nthat regularization is applied by default**. It can handle both dense\nand sparse input. Use C-ordered arrays or CSR matrices containing 64-bit\nfloats for optimal performance; any other input format will be converted\n(and copied).\n\nThe 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization\nwith primal formulation, or no regularization. The 'liblinear' solver\nsupports both L1 and L2 regularization, with a dual formulation only for\nthe L2 penalty. The Elastic-Net regularization is only supported by the\n'saga' solver.\n\nRead more in the :ref:`User Guide `.\n", + "description": "\nLogistic Regression (aka logit, MaxEnt) classifier.\n\nThis class implements regularized logistic regression using the\n'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note\nthat regularization is applied by default**. It can handle both dense\nand sparse input. Use C-ordered arrays or CSR matrices containing 64-bit\nfloats for optimal performance; any other input format will be converted\n(and copied).\n\nThe 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization\nwith primal formulation, or no regularization. The 'liblinear' solver\nsupports both L1 and L2 regularization, with a dual formulation only for\nthe L2 penalty. The Elastic-Net regularization is only supported by the\n'saga' solver.\n\nFor :term:`multiclass` problems, all solvers except 'liblinear' minimize the\nfull multinomial loss. 'liblinear' only handles binary classification but can\nbe extended to handle multiclass by using\n:class:`~sklearn.multiclass.OneVsRestClassifier`.\n\nRead more in the :ref:`User Guide `.\n", "attributes": [ { "default": "l2", @@ -1093,7 +1093,7 @@ }, { "default": "lbfgs", - "description": "\nAlgorithm to use in the optimization problem. Default is 'lbfgs'.\nTo choose a solver, you might want to consider the following aspects:\n\n- For small datasets, 'liblinear' is a good choice, whereas 'sag'\nand 'saga' are faster for large ones;\n- For multiclass problems, only 'newton-cg', 'sag', 'saga' and\n'lbfgs' handle multinomial loss;\n- 'liblinear' and 'newton-cholesky' can only handle binary classification\nby default. To apply a one-versus-rest scheme for the multiclass setting\none can wrapt it with the `OneVsRestClassifier`.\n- 'newton-cholesky' is a good choice for `n_samples` >> `n_features`,\nespecially with one-hot encoded categorical features with rare\ncategories. Be aware that the memory usage of this solver has a quadratic\ndependency on `n_features` because it explicitly computes the Hessian\nmatrix.\n\n.. 
warning::\nThe choice of the algorithm depends on the penalty chosen and on\n(multinomial) multiclass support:\n\n================= ============================== ======================\nsolver penalty multinomial multiclass\n================= ============================== ======================\n'lbfgs' 'l2', None yes\n'liblinear' 'l1', 'l2' no\n'newton-cg' 'l2', None yes\n'newton-cholesky' 'l2', None no\n'sag' 'l2', None yes\n'saga' 'elasticnet', 'l1', 'l2', None yes\n================= ============================== ======================\n\n.. note::\n'sag' and 'saga' fast convergence is only guaranteed on features\nwith approximately the same scale. You can preprocess the data with\na scaler from :mod:`sklearn.preprocessing`.\n\n.. seealso::\nRefer to the User Guide for more information regarding\n:class:`LogisticRegression` and more specifically the\n:ref:`Table `\nsummarizing solver/penalty supports.\n\n.. versionadded:: 0.17\nStochastic Average Gradient descent solver.\n.. versionadded:: 0.19\nSAGA solver.\n.. versionchanged:: 0.22\nThe default solver changed from 'liblinear' to 'lbfgs' in 0.22.\n.. versionadded:: 1.2\nnewton-cholesky solver.\n", + "description": "\nAlgorithm to use in the optimization problem. Default is 'lbfgs'.\nTo choose a solver, you might want to consider the following aspects:\n\n- For small datasets, 'liblinear' is a good choice, whereas 'sag'\nand 'saga' are faster for large ones;\n- For :term:`multiclass` problems, all solvers except 'liblinear' minimize the\nfull multinomial loss;\n- 'liblinear' can only handle binary classification by default. To apply a\none-versus-rest scheme for the multiclass setting one can wrap it with the\n:class:`~sklearn.multiclass.OneVsRestClassifier`.\n- 'newton-cholesky' is a good choice for\n`n_samples` >> `n_features * n_classes`, especially with one-hot encoded\ncategorical features with rare categories. Be aware that the memory usage\nof this solver has a quadratic dependency on `n_features * n_classes`\nbecause it explicitly computes the full Hessian matrix.\n\n.. warning::\nThe choice of the algorithm depends on the penalty chosen and on\n(multinomial) multiclass support:\n\n================= ============================== ======================\nsolver penalty multinomial multiclass\n================= ============================== ======================\n'lbfgs' 'l2', None yes\n'liblinear' 'l1', 'l2' no\n'newton-cg' 'l2', None yes\n'newton-cholesky' 'l2', None yes\n'sag' 'l2', None yes\n'saga' 'elasticnet', 'l1', 'l2', None yes\n================= ============================== ======================\n\n.. note::\n'sag' and 'saga' fast convergence is only guaranteed on features\nwith approximately the same scale. You can preprocess the data with\na scaler from :mod:`sklearn.preprocessing`.\n\n.. seealso::\nRefer to the :ref:`User Guide ` for more\ninformation regarding :class:`LogisticRegression` and more specifically the\n:ref:`Table `\nsummarizing solver/penalty supports.\n\n.. versionadded:: 0.17\nStochastic Average Gradient descent solver.\n.. versionadded:: 0.19\nSAGA solver.\n.. versionchanged:: 0.22\nThe default solver changed from 'liblinear' to 'lbfgs' in 0.22.\n.. 
versionadded:: 1.2\nnewton-cholesky solver.\n", "name": "solver" }, { @@ -1257,7 +1257,7 @@ }, { "name": "sklearn.linear_model.LogisticRegression", - "description": "\nLogistic Regression (aka logit, MaxEnt) classifier.\n\nIn the multiclass case, the training algorithm uses the one-vs-rest (OvR)\nscheme if the 'multi_class' option is set to 'ovr', and uses the\ncross-entropy loss if the 'multi_class' option is set to 'multinomial'.\n(Currently the 'multinomial' option is supported only by the 'lbfgs',\n'sag', 'saga' and 'newton-cg' solvers.)\n\nThis class implements regularized logistic regression using the\n'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note\nthat regularization is applied by default**. It can handle both dense\nand sparse input. Use C-ordered arrays or CSR matrices containing 64-bit\nfloats for optimal performance; any other input format will be converted\n(and copied).\n\nThe 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization\nwith primal formulation, or no regularization. The 'liblinear' solver\nsupports both L1 and L2 regularization, with a dual formulation only for\nthe L2 penalty. The Elastic-Net regularization is only supported by the\n'saga' solver.\n\nRead more in the :ref:`User Guide `.\n", + "description": "\nLogistic Regression (aka logit, MaxEnt) classifier.\n\nThis class implements regularized logistic regression using the\n'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note\nthat regularization is applied by default**. It can handle both dense\nand sparse input. Use C-ordered arrays or CSR matrices containing 64-bit\nfloats for optimal performance; any other input format will be converted\n(and copied).\n\nThe 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization\nwith primal formulation, or no regularization. The 'liblinear' solver\nsupports both L1 and L2 regularization, with a dual formulation only for\nthe L2 penalty. The Elastic-Net regularization is only supported by the\n'saga' solver.\n\nFor :term:`multiclass` problems, all solvers except 'liblinear' minimize the\nfull multinomial loss. 'liblinear' only handles binary classification but can\nbe extended to handle multiclass by using\n:class:`~sklearn.multiclass.OneVsRestClassifier`.\n\nRead more in the :ref:`User Guide `.\n", "attributes": [ { "default": "l2", @@ -1315,7 +1315,7 @@ }, { "default": "lbfgs", - "description": "\nAlgorithm to use in the optimization problem. Default is 'lbfgs'.\nTo choose a solver, you might want to consider the following aspects:\n\n- For small datasets, 'liblinear' is a good choice, whereas 'sag'\nand 'saga' are faster for large ones;\n- For multiclass problems, only 'newton-cg', 'sag', 'saga' and\n'lbfgs' handle multinomial loss;\n- 'liblinear' and 'newton-cholesky' can only handle binary classification\nby default. To apply a one-versus-rest scheme for the multiclass setting\none can wrapt it with the `OneVsRestClassifier`.\n- 'newton-cholesky' is a good choice for `n_samples` >> `n_features`,\nespecially with one-hot encoded categorical features with rare\ncategories. Be aware that the memory usage of this solver has a quadratic\ndependency on `n_features` because it explicitly computes the Hessian\nmatrix.\n\n.. 
warning::\nThe choice of the algorithm depends on the penalty chosen and on\n(multinomial) multiclass support:\n\n================= ============================== ======================\nsolver penalty multinomial multiclass\n================= ============================== ======================\n'lbfgs' 'l2', None yes\n'liblinear' 'l1', 'l2' no\n'newton-cg' 'l2', None yes\n'newton-cholesky' 'l2', None no\n'sag' 'l2', None yes\n'saga' 'elasticnet', 'l1', 'l2', None yes\n================= ============================== ======================\n\n.. note::\n'sag' and 'saga' fast convergence is only guaranteed on features\nwith approximately the same scale. You can preprocess the data with\na scaler from :mod:`sklearn.preprocessing`.\n\n.. seealso::\nRefer to the User Guide for more information regarding\n:class:`LogisticRegression` and more specifically the\n:ref:`Table `\nsummarizing solver/penalty supports.\n\n.. versionadded:: 0.17\nStochastic Average Gradient descent solver.\n.. versionadded:: 0.19\nSAGA solver.\n.. versionchanged:: 0.22\nThe default solver changed from 'liblinear' to 'lbfgs' in 0.22.\n.. versionadded:: 1.2\nnewton-cholesky solver.\n", + "description": "\nAlgorithm to use in the optimization problem. Default is 'lbfgs'.\nTo choose a solver, you might want to consider the following aspects:\n\n- For small datasets, 'liblinear' is a good choice, whereas 'sag'\nand 'saga' are faster for large ones;\n- For :term:`multiclass` problems, all solvers except 'liblinear' minimize the\nfull multinomial loss;\n- 'liblinear' can only handle binary classification by default. To apply a\none-versus-rest scheme for the multiclass setting one can wrap it with the\n:class:`~sklearn.multiclass.OneVsRestClassifier`.\n- 'newton-cholesky' is a good choice for\n`n_samples` >> `n_features * n_classes`, especially with one-hot encoded\ncategorical features with rare categories. Be aware that the memory usage\nof this solver has a quadratic dependency on `n_features * n_classes`\nbecause it explicitly computes the full Hessian matrix.\n\n.. warning::\nThe choice of the algorithm depends on the penalty chosen and on\n(multinomial) multiclass support:\n\n================= ============================== ======================\nsolver penalty multinomial multiclass\n================= ============================== ======================\n'lbfgs' 'l2', None yes\n'liblinear' 'l1', 'l2' no\n'newton-cg' 'l2', None yes\n'newton-cholesky' 'l2', None yes\n'sag' 'l2', None yes\n'saga' 'elasticnet', 'l1', 'l2', None yes\n================= ============================== ======================\n\n.. note::\n'sag' and 'saga' fast convergence is only guaranteed on features\nwith approximately the same scale. You can preprocess the data with\na scaler from :mod:`sklearn.preprocessing`.\n\n.. seealso::\nRefer to the :ref:`User Guide ` for more\ninformation regarding :class:`LogisticRegression` and more specifically the\n:ref:`Table `\nsummarizing solver/penalty supports.\n\n.. versionadded:: 0.17\nStochastic Average Gradient descent solver.\n.. versionadded:: 0.19\nSAGA solver.\n.. versionchanged:: 0.22\nThe default solver changed from 'liblinear' to 'lbfgs' in 0.22.\n.. versionadded:: 1.2\nnewton-cholesky solver.\n", "name": "solver", "optional": true }, @@ -1386,7 +1386,7 @@ "type": "int32" }, { - "description": "Controls the number of jobs that get dispatched during parallel\nexecution. 
Reducing this number can be useful to avoid an\nexplosion of memory consumption when more jobs get dispatched\nthan CPUs can process. This parameter can be:\n\n- None, in which case all the jobs are immediately\ncreated and spawned. Use this for lightweight and\nfast-running jobs, to avoid delays due to on-demand\nspawning of the jobs\n\n- An int, giving the exact number of total jobs that are\nspawned\n\n- A str, giving an expression as a function of n_jobs,\nas in '2*n_jobs'\n", + "description": "Controls the number of jobs that get dispatched during parallel\nexecution. Reducing this number can be useful to avoid an\nexplosion of memory consumption when more jobs get dispatched\nthan CPUs can process. This parameter can be:\n\n- None, in which case all the jobs are immediately created and spawned. Use\nthis for lightweight and fast-running jobs, to avoid delays due to on-demand\nspawning of the jobs\n- An int, giving the exact number of total jobs that are spawned\n- A str, giving an expression as a function of n_jobs, as in '2*n_jobs'\n", "name": "pre_dispatch", "default": "2*n_jobs" }, @@ -1820,7 +1820,7 @@ }, { "name": "sklearn.preprocessing._data.StandardScaler", - "description": "Standardize features by removing the mean and scaling to unit variance.\n\nThe standard score of a sample `x` is calculated as:\n\nz = (x - u) / s\n\nwhere `u` is the mean of the training samples or zero if `with_mean=False`,\nand `s` is the standard deviation of the training samples or one if\n`with_std=False`.\n\nCentering and scaling happen independently on each feature by computing\nthe relevant statistics on the samples in the training set. Mean and\nstandard deviation are then stored to be used on later data using\n:meth:`transform`.\n\nStandardization of a dataset is a common requirement for many\nmachine learning estimators: they might behave badly if the\nindividual features do not more or less look like standard normally\ndistributed data (e.g. Gaussian with 0 mean and unit variance).\n\nFor instance many elements used in the objective function of\na learning algorithm (such as the RBF kernel of Support Vector\nMachines or the L1 and L2 regularizers of linear models) assume that\nall features are centered around 0 and have variance in the same\norder. If a feature has a variance that is orders of magnitude larger\nthan others, it might dominate the objective function and make the\nestimator unable to learn from other features correctly as expected.\n\n`StandardScaler` is sensitive to outliers, and the features may scale\ndifferently from each other in the presence of outliers. For an example\nvisualization, refer to :ref:`Compare StandardScaler with other scalers\n`.\n\nThis scaler can also be applied to sparse CSR or CSC matrices by passing\n`with_mean=False` to avoid breaking the sparsity structure of the data.\n\nRead more in the :ref:`User Guide `.\n", + "description": "Standardize features by removing the mean and scaling to unit variance.\n\nThe standard score of a sample `x` is calculated as:\n\n.. code-block:: text\n\nz = (x - u) / s\n\nwhere `u` is the mean of the training samples or zero if `with_mean=False`,\nand `s` is the standard deviation of the training samples or one if\n`with_std=False`.\n\nCentering and scaling happen independently on each feature by computing\nthe relevant statistics on the samples in the training set. 
Mean and\nstandard deviation are then stored to be used on later data using\n:meth:`transform`.\n\nStandardization of a dataset is a common requirement for many\nmachine learning estimators: they might behave badly if the\nindividual features do not more or less look like standard normally\ndistributed data (e.g. Gaussian with 0 mean and unit variance).\n\nFor instance many elements used in the objective function of\na learning algorithm (such as the RBF kernel of Support Vector\nMachines or the L1 and L2 regularizers of linear models) assume that\nall features are centered around 0 and have variance in the same\norder. If a feature has a variance that is orders of magnitude larger\nthan others, it might dominate the objective function and make the\nestimator unable to learn from other features correctly as expected.\n\n`StandardScaler` is sensitive to outliers, and the features may scale\ndifferently from each other in the presence of outliers. For an example\nvisualization, refer to :ref:`Compare StandardScaler with other scalers\n`.\n\nThis scaler can also be applied to sparse CSR or CSC matrices by passing\n`with_mean=False` to avoid breaking the sparsity structure of the data.\n\nRead more in the :ref:`User Guide `.\n", "attributes": [ { "default": true, @@ -1870,7 +1870,7 @@ }, { "default": "error", - "description": "Specifies the way unknown categories are handled during :meth:`transform`.\n\n- 'error' : Raise an error if an unknown category is present during transform.\n- 'ignore' : When an unknown category is encountered during\ntransform, the resulting one-hot encoded columns for this feature\nwill be all zeros. In the inverse transform, an unknown category\nwill be denoted as None.\n- 'infrequent_if_exist' : When an unknown category is encountered\nduring transform, the resulting one-hot encoded columns for this\nfeature will map to the infrequent category if it exists. The\ninfrequent category will be mapped to the last position in the\nencoding. During inverse transform, an unknown category will be\nmapped to the category denoted `'infrequent'` if it exists. If the\n`'infrequent'` category does not exist, then :meth:`transform` and\n:meth:`inverse_transform` will handle an unknown category as with\n`handle_unknown='ignore'`. Infrequent categories exist based on\n`min_frequency` and `max_categories`. Read more in the\n:ref:`User Guide `.\n\n.. versionchanged:: 1.1\n`'infrequent_if_exist'` was added to automatically handle unknown\ncategories and infrequent categories.\n", + "description": "Specifies the way unknown categories are handled during :meth:`transform`.\n\n- 'error' : Raise an error if an unknown category is present during transform.\n- 'ignore' : When an unknown category is encountered during\ntransform, the resulting one-hot encoded columns for this feature\nwill be all zeros. In the inverse transform, an unknown category\nwill be denoted as None.\n- 'infrequent_if_exist' : When an unknown category is encountered\nduring transform, the resulting one-hot encoded columns for this\nfeature will map to the infrequent category if it exists. The\ninfrequent category will be mapped to the last position in the\nencoding. During inverse transform, an unknown category will be\nmapped to the category denoted `'infrequent'` if it exists. If the\n`'infrequent'` category does not exist, then :meth:`transform` and\n:meth:`inverse_transform` will handle an unknown category as with\n`handle_unknown='ignore'`. 
Infrequent categories exist based on\n`min_frequency` and `max_categories`. Read more in the\n:ref:`User Guide `.\n- 'warn' : When an unknown category is encountered during transform,\na warning is issued, and the encoding then proceeds as described for\n`handle_unknown=\"infrequent_if_exist\"`.\n\n.. versionchanged:: 1.1\n`'infrequent_if_exist'` was added to automatically handle unknown\ncategories and infrequent categories.\n\n.. versionadded:: 1.6\nThe option `\"warn\"` was added.\n", "name": "handle_unknown" }, { @@ -2128,7 +2128,7 @@ }, { "default": false, - "description": "If true, ``decision_function_shape='ovr'``, and number of classes > 2,\n:term:`predict` will break ties according to the confidence values of\n:term:`decision_function`; otherwise the first class among the tied\nclasses is returned. Please note that breaking ties comes at a\nrelatively high computational cost compared to a simple predict.\n\n.. versionadded:: 0.22\n", + "description": "If true, ``decision_function_shape='ovr'``, and number of classes > 2,\n:term:`predict` will break ties according to the confidence values of\n:term:`decision_function`; otherwise the first class among the tied\nclasses is returned. Please note that breaking ties comes at a\nrelatively high computational cost compared to a simple predict. See\n:ref:`sphx_glr_auto_examples_svm_plot_svm_tie_breaking.py` for an\nexample of its usage with ``decision_function_shape='ovr'``.\n\n.. versionadded:: 0.22\n", "name": "break_ties", "optional": true, "type": "boolean"
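Of the docstring changes in this sync, the `cv="prefit"` deprecation on `CalibratedClassifierCV` is the one that alters recommended usage, so a minimal migration sketch may help reviewers. It assumes scikit-learn >= 1.6; the dataset and variable names are illustrative, not part of the metadata.

```python
# Migration sketch for the cv="prefit" -> FrozenEstimator change described
# in the CalibratedClassifierCV entry above (scikit-learn >= 1.6).
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
# Data for model fitting and calibration must stay disjoint, as the
# docstring warns.
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_fit, y_fit)

# Before 1.6: CalibratedClassifierCV(clf, cv="prefit").fit(X_cal, y_cal)
# From 1.6: wrap the fitted classifier so it is not refit; the new
# ensemble="auto" default resolves to False for a FrozenEstimator.
calibrated = CalibratedClassifierCV(FrozenEstimator(clf)).fit(X_cal, y_cal)
print(calibrated.predict_proba(X_cal[:2]))
```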
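The new string and callable forms of `verbose_feature_names_out` on `ColumnTransformer` can be exercised as below. This is a sketch under the same assumption (scikit-learn >= 1.6); the column names are made up.

```python
# Sketch of the 1.6 string/callable forms of verbose_feature_names_out.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = pd.DataFrame({"age": [20.0, 30.0], "height": [1.7, 1.8]})

# Format-string form: the two available fields are transformer_name
# and feature_name.
ct = ColumnTransformer(
    [("num", StandardScaler(), ["age", "height"])],
    verbose_feature_names_out="{feature_name}__{transformer_name}",
)
print(ct.fit(X).get_feature_names_out())  # e.g. ['age__num' 'height__num']

# Callable form: (transformer_name, feature_name) -> new feature name.
ct = ColumnTransformer(
    [("num", StandardScaler(), ["age", "height"])],
    verbose_feature_names_out=lambda transformer, feature: feature.upper(),
)
print(ct.fit(X).get_feature_names_out())  # e.g. ['AGE' 'HEIGHT']
```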
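The reworked `LogisticRegression` solver notes recommend an explicit one-vs-rest wrapper for the binary-only 'liblinear' solver; a sketch of that pattern:

```python
# Sketch of the OneVsRestClassifier wrapping recommended above for the
# binary-only 'liblinear' solver.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # three classes
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear")).fit(X, y)
print(ovr.predict(X[:3]))
```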
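Finally, the new `handle_unknown="warn"` option on `OneHotEncoder` warns and then follows the 'infrequent_if_exist' path. A sketch, again assuming scikit-learn >= 1.6; the categories are made up.

```python
# Sketch of handle_unknown="warn" (scikit-learn >= 1.6): unknown categories
# warn at transform time, then follow the infrequent_if_exist behaviour.
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown="warn", sparse_output=False)
enc.fit(np.array([["cat"], ["dog"]]))
# "fish" was never seen: a warning is issued and, with no infrequent
# category fitted, the row encodes as all zeros (the "ignore" fallback).
print(enc.transform(np.array([["fish"]])))  # [[0. 0.]]
```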