MAINT add parameter validation using BaseEstimator #958
Should we add this for all BaseEstimators?
Yep, we could basically have this everywhere. I just wanted to make a test. So if we wanted to support it, we would need to vendor a file like in imbalanced-learn: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/imblearn/utils/_param_validation.py I don't think this is worth the development time right now: we might bump the minimal version of scikit-learn and then it would be usable. So we can leave this PR aside for the moment. Here are two future advantages that I foresee:
But let's postpone.
Relevant discussion in scikit-learn: scikit-learn/scikit-learn#22722. I would say for now I am +0 on this; I suggest we discuss it in one of the skrub meetings. Some potential drawbacks IMHO:
None of these are major issues, and they may well be small drawbacks compared to the advantages listed in scikit-learn issue 22722.
@@ -67,7 +67,7 @@ class AggJoiner(TransformerMixin, BaseEstimator):
 The placeholder string "X" can be provided to perform
 self-aggregation on the input data.

-key : str, default=None
+key : str or iterable of str, default=None
As a side note, we use "iterable" everywhere, but I wonder if we should say "sequence" (or "list"?) because it is more understandable for users who are less familiar with Python/computer programming jargon. It is also arguably a bit more accurate, because we iterate over these parameters several times and sometimes index them, so some iterables would not be appropriate.
+1 to discuss this
I had the same question on the Joiner PR
For me, this should be "list" (and we can loosely accept tuples as well): that is friendlier than "sequence", which is only meaningful to Python developers.
I'm fine with "list" -- and we already use that term in a bunch of places
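To illustrate the point above about iterating several times and indexing, here is a minimal sketch in plain Python. The column names and the `column_names` helper are made up for the example and are not taken from skrub; it only shows why a one-shot iterable such as a generator would not behave like a list here.

```python
def column_names():
    # A generator is an iterable, but it can be consumed only once.
    yield "user_id"
    yield "date"


as_generator = column_names()
first_pass = [name for name in as_generator]
second_pass = [name for name in as_generator]  # already exhausted: empty

as_list = ["user_id", "date"]
list_first_pass = [name for name in as_list]
list_second_pass = [name for name in as_list]  # a list can be iterated again
first_key = as_list[0]  # and it supports indexing

try:
    column_names()[0]
    generator_is_indexable = True
except TypeError:
    # Generators do not support indexing.
    generator_is_indexable = False
```

So documenting the parameter as "list" (while still accepting tuples) describes what the code can actually work with more closely than "iterable" does.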
These are valid points. I would say that this feature is supposed to become part of the developer API at some point. Regarding the runtime checks, I think this is just a way to have consistent checks on the parameters instead of delegating them to the developer who creates the class. For the user, you get the benefit of a nice error message. I know that SciPy was looking at a similar checking system. To me, the real benefit is the next step: consistent documentation and stubs. But as I said, this is really not a priority right now considering the development effort. At least now, @TheooJ and @jeromedockes are aware that this exists and that it would only cost defining a dictionary ;)
indeed, thanks :) (once we drop support for scikit-learn 1.2)
This brings in the parameter validation that is included in the supported scikit-learn versions.
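For readers unfamiliar with the idea, the "define a dictionary" approach discussed above can be sketched in plain Python as follows. This is a simplified stand-in, not scikit-learn's implementation: `ValidatedEstimator`, `ToyJoiner`, `InvalidParameterError` and the `suffix` parameter are hypothetical names for this example, and the attribute and method names only mirror the convention of scikit-learn's private validation machinery. The real constraint objects are richer than bare types.

```python
class InvalidParameterError(ValueError):
    """Raised when a constructor parameter has an unsupported type."""


class ValidatedEstimator:
    # Hypothetical stand-in for a base class.
    # Maps each parameter name to a tuple of accepted types.
    _parameter_constraints = {}

    def _validate_params(self):
        # One shared check, so every estimator reports errors the same way.
        for name, allowed in self._parameter_constraints.items():
            value = getattr(self, name)
            if not isinstance(value, allowed):
                expected = ", ".join(t.__name__ for t in allowed)
                raise InvalidParameterError(
                    f"The {name!r} parameter of {type(self).__name__} must be "
                    f"an instance of one of: {expected}. "
                    f"Got {value!r} instead."
                )


class ToyJoiner(ValidatedEstimator):
    # The only per-class cost is declaring this dictionary.
    _parameter_constraints = {
        "key": (str, list, tuple, type(None)),
        "suffix": (str,),
    }

    def __init__(self, key=None, suffix=""):
        # Following the scikit-learn convention, __init__ only stores values.
        self.key = key
        self.suffix = suffix

    def fit(self, X=None):
        # Validation happens at fit time, not at construction time.
        self._validate_params()
        return self
```

With this sketch, `ToyJoiner(key=["a", "b"]).fit()` succeeds, while `ToyJoiner(key=3).fit()` raises `InvalidParameterError` with a message naming the offending parameter. Because the constraints are data rather than hand-written checks, they could also be reused later for documentation or stubs, which is the longer-term benefit mentioned in the conversation.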