
[OTHER] Feedback on reduction to regression get_sklearn_wrapper #46

Open
mloning opened this issue Sep 22, 2020 · 5 comments

mloning commented Sep 22, 2020

Hi, following our conversation yesterday, I have two questions about the get_sklearn_wrapper:

1. API: Why do you dynamically create the init and pass the class rather than simply using composition and passing the object?

forecaster = get_sklearn_wrapper(DummyRegressor, lags=3)  # you do this
forecaster = get_sklearn_wrapper(DummyRegressor(), lags=3)  # why not this?
  • This would avoid the issues you seem to have with local class definitions and pickling
  • This would allow you to pass composite models like pipelines or GridSearchCV into the wrapper (see the sketch after this list)
  • I really like the dynamic init creator, but my intuition tells me it's misapplied here.
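
For illustration, a minimal sketch of what the object-passing variant would enable (hypothetical usage; get_sklearn_wrapper does not accept instances today):

from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# a composite sklearn model: scaling followed by ridge regression
composite = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# proposed, currently unsupported: pass the instance, not the class
forecaster = get_sklearn_wrapper(composite, lags=3)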

2. Algorithm: Why not use the standard recursive strategy?

from hcrystalball.wrappers import get_sklearn_wrapper
from sklearn.dummy import DummyRegressor
import pandas as pd
import numpy as np

index = pd.date_range("2000", periods=13, freq="Y")
y_train = pd.Series(np.arange(10), index=index[:-3])
X_train = pd.DataFrame(index=y_train.index)
X_test = pd.DataFrame(index=index[-3:])

model = get_sklearn_wrapper(DummyRegressor, lags=3)
model.fit(X_train, y_train)
model.predict(X_test)
# >>> 2010-12-31	7.0
# >>> 2011-12-31	7.0
# >>> 2012-12-31	7.0

# you use the first 3 values as lagged variables; the DummyRegressor simply
# computes the mean of the rest, so shouldn't the result be the following?
y_train.iloc[3:].mean()  # >>> 6.0

# the problem seems to be in the way you generate the target series:
# you increase the gap between the lagged variables and the target to match
# the length of the forecasting horizon
X, y = model._transform_data_to_tsmodel_input_format(X_train, y_train, len(X_test))
pd.concat([X, pd.Series(y, index=X.index, name="y")], axis=1).head()
#	lag_0	lag_1	lag_2	y
#5	2.0	1.0	0.0	5
#6	3.0	2.0	1.0	6
#7	4.0	3.0	2.0	7
#8	5.0	4.0	3.0	8
#9	6.0	5.0	4.0	9
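
For reference, a hand-rolled sketch of the standard recursive strategy on the same data (written from scratch here, not using hcrystalball's internals), which produces the 6.0 I'd expect:

import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor

index = pd.date_range("2000", periods=13, freq="Y")
y_train = pd.Series(np.arange(10, dtype=float), index=index[:-3])
lags = 3

# one-step-ahead training pairs: predict y[t] from y[t-1], ..., y[t-lags]
X = np.column_stack([y_train.shift(i).to_numpy() for i in range(1, lags + 1)])[lags:]
y = y_train.to_numpy()[lags:]

model = DummyRegressor()  # predicts the mean of its training targets
model.fit(X, y)

# recursive multi-step forecast: feed each prediction back in as a lag
history = list(y_train.to_numpy())
preds = []
for _ in range(3):
    x_next = np.array(history[-lags:][::-1]).reshape(1, -1)
    preds.append(model.predict(x_next)[0])
    history.append(preds[-1])

print(preds)  # [6.0, 6.0, 6.0], i.e. the mean of y_train.iloc[3:], as argued above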

Hope this helps!

mloning added the question (Further information is requested) label on Sep 22, 2020
MichalChromcak (Collaborator) commented

@mloning regarding the result of the DummyRegressor output - the default strategy of DummyRegressor is "mean", but it is the mean of the whole training set it receives.

Relevant parts from the DummyRegressor docstring:

Parameters
----------
strategy : str
    Strategy to use to generate predictions.

    * "mean": always predicts the mean of the training set

Examples
--------
>>> import numpy as np
>>> from sklearn.dummy import DummyRegressor
>>> X = np.array([1.0, 2.0, 3.0, 4.0])
>>> y = np.array([2.0, 3.0, 5.0, 10.0])
>>> dummy_regr = DummyRegressor(strategy="mean")
>>> dummy_regr.fit(X, y)
DummyRegressor()
>>> dummy_regr.predict(X)
array([5., 5., 5., 5.])

MichalChromcak (Collaborator) commented

@therhaag any thoughts on the class versus object mentioned by Markus?

therhaag commented

Hi @mloning - sorry for the delayed response. The reason for passing the class rather than using composition is to avoid the need for a custom get_params/set_params implementation that distinguishes between parameters of the wrapper and parameters of the class being wrapped. The sklearn implementation of these functions in base.py relies on discovering all possible parameters of the estimator from the signature of the constructor, so we need to expose those through the constructor of the wrapper. There may be a simpler way to do this, though, closer to how composite models behave - in fact, I remember looking into this option, but I can't recall what the difficulty with it was, so I'd be happy to consider alternatives.
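
For readers following along, a minimal sketch of the mechanism described above: generating a wrapper class whose constructor advertises the wrapped class's parameters, so that BaseEstimator's signature-based get_params can discover them (illustrative only; hcrystalball's actual implementation differs):

import inspect
from sklearn.base import BaseEstimator
from sklearn.dummy import DummyRegressor

def make_wrapper_class(est_class):
    est_sig = inspect.signature(est_class.__init__)
    est_names = [n for n in est_sig.parameters if n != "self"]

    def __init__(self, lags=3, **est_kwargs):
        self.lags = lags
        # store each wrapped-class parameter as a flat attribute
        for name in est_names:
            setattr(self, name, est_kwargs.get(name, est_sig.parameters[name].default))

    # advertise a flat signature so that BaseEstimator._get_param_names,
    # which inspects __init__, sees wrapper and estimator parameters alike
    params = [inspect.Parameter("self", inspect.Parameter.POSITIONAL_OR_KEYWORD),
              inspect.Parameter("lags", inspect.Parameter.KEYWORD_ONLY, default=3)]
    params += [inspect.Parameter(n, inspect.Parameter.KEYWORD_ONLY,
                                 default=est_sig.parameters[n].default)
               for n in est_names]
    __init__.__signature__ = inspect.Signature(params)
    return type(est_class.__name__ + "Wrapper", (BaseEstimator,), {"__init__": __init__})

Wrapper = make_wrapper_class(DummyRegressor)
w = Wrapper(lags=2, strategy="median")
print(w.get_params())
# {'constant': None, 'lags': 2, 'quantile': None, 'strategy': 'median'}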

mloning (Author) commented Nov 18, 2020

@therhaag but sklearn supports composition with nested parameters via its double-underscore convention; I think that would be the cleaner solution in the end.
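
A minimal sketch of what that would look like (the wrapper class here is illustrative, not hcrystalball's API):

from sklearn.base import BaseEstimator
from sklearn.dummy import DummyRegressor

class ComposedWrapper(BaseEstimator):
    # plain composition: store the estimator instance as a parameter
    def __init__(self, model=None, lags=3):
        self.model = model
        self.lags = lags

w = ComposedWrapper(model=DummyRegressor(), lags=3)
print(sorted(w.get_params(deep=True)))
# ['lags', 'model', 'model__constant', 'model__quantile', 'model__strategy']
w.set_params(model__strategy="median")  # nested params work out of the box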

therhaag commented

Hi @mloning - I think you're right. We added the sklearn wrapper after the other wrappers, which are designed for models that don't follow the sklearn convention themselves, and applied the same mechanism there. For sklearn models, one could indeed use composition.
