Rework Non i.i.d. data notebook #786

ArturoAmorQ · 2024-12-10T16:57:03Z

The plot showcasing the use of a ShuffleSplit strategy in the Non i.i.d. data notebook changed with a recent version of the pandas plotting utility:

In previous versions:

Now:

We could use the opportunity to rework this whole notebook, as mentioned in #784 (review):

- Do not pass groups to TimeSeriesSplit (it currently raises a UserWarning)
- Use a more realistic dataset (optional)
- Give an interpretation of the resulting R2 and MSE scores:
  - are the results above or below chance level?
  - they are over-optimistic when not evaluated properly
  - the predictions are not realistic: with the current dataset, a simple DecisionTreeRegressor can foresee a sudden drop in quotes
- Mention the actual good practices for modeling, e.g. aligning the test size of TimeSeriesSplit with the forecasting task
- In general, give more focus to the use of TimeSeriesSplit
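
To make the TimeSeriesSplit points more concrete, here is a rough sketch of what the reworked evaluation could look like. It is only an illustration: the data is a synthetic random walk (not the notebook's quotes dataset), the 20-sample horizon is an arbitrary assumption standing in for the real forecasting task, and the DummyRegressor is one possible way to get a chance-level reference.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a time-ordered dataset (assumption, not the notebook data).
rng = np.random.default_rng(0)
n_samples = 200
X = np.arange(n_samples).reshape(-1, 1)    # time index as the only feature
y = np.cumsum(rng.normal(size=n_samples))  # random walk target

# Hypothetical forecasting horizon: each test fold covers the next 20 samples.
horizon = 20
cv = TimeSeriesSplit(n_splits=5, test_size=horizon)  # no `groups` passed anywhere

# Each training set ends strictly before its test set starts.
splits = list(cv.split(X))
for train_idx, test_idx in splits:
    print(f"train up to {train_idx.max()}, test {test_idx.min()}..{test_idx.max()}")

# Compare the tree against a mean-predicting baseline to judge "chance level".
scoring = ("r2", "neg_mean_squared_error")
tree_scores = cross_validate(
    DecisionTreeRegressor(random_state=0), X, y, cv=cv, scoring=scoring
)
dummy_scores = cross_validate(DummyRegressor(), X, y, cv=cv, scoring=scoring)

print("tree  R2:", tree_scores["test_r2"].mean())
print("dummy R2:", dummy_scores["test_r2"].mean())
```

Setting `test_size` explicitly keeps every test fold the same length as the forecasting horizon, which makes the per-fold scores comparable to each other.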


glemaitre · 2024-12-13T22:06:40Z

I wanted to report this issue because I just saw the plot. A hotfix for the plot is to make sure that the data points are ordered by increasing date.
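
A minimal sketch of that hotfix, assuming the plotted data is a pandas Series indexed by date (the `quotes` series below is made up for illustration and is not the notebook's data):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical date-indexed series standing in for the notebook's quotes.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
quotes = pd.Series(np.arange(10, dtype=float), index=dates, name="quote")

# ShuffleSplit returns indices in shuffled order, so the selected rows
# are generally not ordered by date.
cv = ShuffleSplit(n_splits=1, test_size=3, random_state=0)
train_idx, test_idx = next(cv.split(quotes))
test_unsorted = quotes.iloc[test_idx]

# Hotfix: order the points by increasing date before plotting them.
test_sorted = test_unsorted.sort_index()
# test_sorted.plot() can then be called as before.
```

Sorting only affects the display; the train/test membership produced by ShuffleSplit is unchanged.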
