API comparison: sktime vs HCrystalball

A comparison of the sktime and HCrystalball API designs for forecasting, and a proposed way forward.

Design comparison

Both sktime and HCrystalball adopt a sklearn-like fit/predict design, and a unified interface.

High-level differences

The table below summarizes the main differences:

Area	sktime	HCrystalball	HCrystalball comments
data container	pandas series	pandas DataFrame	pandas DataFrame
supports multivariate	no	yes	not natively at the wrapper level (e.g., Prophet is not a multivariate model by construction, as opposed to, e.g., VAR models)
supports exogenous	experimental	yes	yes
supports iloc use	yes	no	yes: X.iloc[-5:] returns the last 5 rows even with a datetime index
supports loc use	no	yes	yes: X.loc["2020-05-01":] returns all rows from "2020-05-01" onward
type consistent composition	yes	no	unsure: HCrystalball aims to reuse as much of sklearn as possible, with minimal custom reimplementation of existing objects. For example, we do not have a custom implementation of sklearn's GridSearchCV but use it directly. Discussing concrete points would help us understand this issue.
task interoperability	yes	no	no: HCrystalball aims to support only time series forecasting. The limited scope is a design decision.

For explanation:

type consistent composition means: composites inherit from, and follow the same interface as a class type ancestor. For example, GridSearchCV in sklearn behaves as a classifier, when constructed with a classifier. The compositor itself is an estimator class.
task interoperability means: the interface is designed to allow reduction to other time series related tasks
iloc and loc usage implies support for integer and date/time indices, and specification of the forecasting horizon as relative steps ahead and absolute time points, respectively - HCrystalball's implementation allows both indexing schemes, integers and datetimes
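To make the iloc/loc distinction concrete, here is a minimal pandas sketch. The series `y` and its dates are made up for illustration; only standard pandas indexing is used, not either library's API.

```python
import pandas as pd

# Hypothetical daily series, used only for illustration.
index = pd.date_range("2020-04-27", periods=10, freq="D")
y = pd.Series(range(10), index=index)

# iloc: positional selection, independent of the index type.
last_five = y.iloc[-5:]

# loc: label-based selection; a date string slices a DatetimeIndex.
from_may = y.loc["2020-05-01":]

# The same horizon, once as relative steps ahead (iloc-style)
# and once as absolute time points (loc-style).
steps_ahead = [1, 2, 3]
cutoff = y.index[-1]
absolute = cutoff + pd.to_timedelta(steps_ahead, unit="D")
```

Both horizon forms describe the same three future days; which one is more convenient depends on whether the user thinks in "steps from now" or in calendar dates.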

At a high level, HCrystalball's interface seems inspired by Facebook's Prophet. sktime's interface is closer to statsmodels and the Hyndman interfaces in R (e.g. forecast, fable).

Advantages and disadvantages

This section highlights advantages, disadvantages, and problems, in our opinion.

Advantages of sktime:

"natural" interface in univariate case
higher-order operations, including composition and reduction, are well-handled

Problems of sktime:

lack of loc support
no good multivariate support

Advantages of HCrystalball:

support for multivariate and exogenous data
uses abc (Python's abstract base classes)

Problems of HCrystalball:

higher-order operations are not well-designed or consistent - an example would help us see the point
lack of iloc support - (see above)
interface is unintuitive in the univariate case - HCrystalball intends to stay as compatible with sklearn as possible, with one exception: it uses pandas rather than NumPy as the main data interface. This design decision leads naturally to a two-dimensional X (pandas DataFrame) and a y (pandas series; 1D NumPy is also supported) as input for fit, and an X (DataFrame) as input for predict. In the univariate case, this implies an empty DataFrame with a datetime index. In the past, HCrystalball also supported a single input for fit and an integer (horizon) for predict in the univariate case, but experience showed that the more generic interface leads to a better modeling experience (no need to change the interface after adding one column, frequent usage of many exogenous variables, less error-prone and cleaner implementations, direct compatibility with the whole sklearn ecosystem...). The decision to stick with the sklearn API also reflects our intention to primarily address the ML community rather than the more traditional statistical community around statsmodels.

Problems of both:

neither consistently covers both univariate and multivariate use well - user frustration in at least one sub-case
user cannot use both series and DataFrame
no support for both iloc and loc (indexed, e.g., datetime) indexing

Fit/predict API signatures

Up to naming of variables, both sktime and HCrystalball adopt a fit/predict API, of the type

fit(y_past, [x_past], horizon)
predict([x_future], horizon)

where:

y_past is the time series in the past,
horizon is the set of indices (loc or iloc) to predict at - note that some methods already require this in fit
x_past is the exogenous time series in the past
x_future is the exogenous time series in the future
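The shared signature above can be sketched with a toy forecaster. `LastValueForecaster` is a hypothetical class written for this illustration; it is not part of sktime or HCrystalball, and it assumes a regular datetime index whose frequency pandas can infer.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


class LastValueForecaster:
    """Hypothetical forecaster that repeats the last observed value."""

    def fit(self, y_past, x_past=None, horizon=None):
        # x_past and horizon are accepted only to mirror the signature;
        # this naive method does not need them for fitting.
        self.last_value_ = y_past.iloc[-1]
        self.cutoff_ = y_past.index[-1]
        # Assumes a regular index (at least 3 points) so the frequency can be inferred.
        self.offset_ = to_offset(pd.infer_freq(y_past.index))
        return self

    def predict(self, x_future=None, horizon=(1,)):
        # horizon: relative steps ahead (iloc-style integers).
        index = pd.DatetimeIndex([self.cutoff_ + self.offset_ * h for h in horizon])
        return pd.Series(self.last_value_, index=index, name="y_pred")


y = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)
y_pred = LastValueForecaster().fit(y).predict(horizon=[1, 2, 3])
```

Here the horizon is only needed in `predict`; a method that must know the horizon at training time would read the `horizon` argument of `fit` instead.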

The differences are mainly in expected type:

variable	sktime	HCrystalball	HCrystalball comments
`y_past`	pandas series	pandas DataFrame	pandas series (on wrapper level)
`horizon` in `fit`	integer sequence	not supported (instead, fitting is moved to predict in cases where `horizon` is required for fitting)	to follow the sklearn API, we agreed to stick with the original fit and predict signatures (fitting in predict is also done in, e.g., the KNN implementation in sklearn)
`horizon` in `predict`	integer sequence	empty DataFrame with loc indices	(see above)
`x_past`	pandas DataFrame (experimental)	pandas DataFrame	pandas DataFrame
`x_future`	pandas DataFrame (experimental)	pandas DataFrame	pandas DataFrame

Proposed way forward

The interface differences suggest:

different signature and type choices cover different use cases well (e.g., univariate vs multivariate) - a joint/merged interface may therefore be desirable.
the interfaces are currently incompatible; compatibility will require support for both series and DataFrames, and support for both loc and iloc indexing.
the sktime interface has an advantage in composition and other higher-order operations. A joint interface should perhaps adopt this.

Requirements for a unified interface

More precisely, a "good" consensus interface should satisfy the following requirements:

support for both series and DataFrames as inputs/outputs. **We prefer just one way of doing things. Since sklearn expects 2D input for X, this would not allow us to leverage the whole sklearn ecosystem directly.**
support for both loc and iloc indexing
support for exogenous variables
horizon can be passed in fit
consistent typing in higher-order motifs including composition, wrappers, reduction (inherits from resultant type class, components passed in constructor)
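The last requirement, consistent typing in composition, can be illustrated with a small pure-Python sketch. All class names here (`BaseForecaster`, `MeanForecaster`, `BestOfForecaster`) are hypothetical and chosen for this example; they do not reproduce either library's class hierarchy. The point is only that the composite receives its components in the constructor and is itself a forecaster.

```python
import copy
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    """Hypothetical minimal forecaster interface (not either library's API)."""

    @abstractmethod
    def fit(self, y_past): ...

    @abstractmethod
    def predict(self, horizon): ...


class MeanForecaster(BaseForecaster):
    """Predicts the mean of the last `window` observations."""

    def __init__(self, window=3):
        self.window = window

    def fit(self, y_past):
        tail = list(y_past)[-self.window:]
        self.mean_ = sum(tail) / len(tail)
        return self

    def predict(self, horizon):
        return [self.mean_ for _ in horizon]


class BestOfForecaster(BaseForecaster):
    """Composite: selects the best candidate on a holdout, yet is itself a forecaster."""

    def __init__(self, candidates, n_holdout=2):
        self.candidates = candidates  # components are passed in the constructor
        self.n_holdout = n_holdout

    def fit(self, y_past):
        y = list(y_past)
        train, test = y[:-self.n_holdout], y[-self.n_holdout:]
        steps = range(1, self.n_holdout + 1)

        def holdout_error(candidate):
            # Fit a copy so the candidates passed by the user stay untouched.
            pred = copy.deepcopy(candidate).fit(train).predict(steps)
            return sum(abs(p - t) for p, t in zip(pred, test))

        best = min(self.candidates, key=holdout_error)
        self.best_ = copy.deepcopy(best).fit(y)  # refit the winner on all data
        return self

    def predict(self, horizon):
        return self.best_.predict(horizon)


composite = BestOfForecaster([MeanForecaster(1), MeanForecaster(3)])
composite.fit([1, 2, 3, 4, 5, 6])
forecast = composite.predict([1, 2])
```

Because `BestOfForecaster` inherits from the same base class as its components, it can be nested inside another composite or passed anywhere a plain forecaster is expected.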

Way of working, forward

We therefore suggest:

sktime and HCrystalball work together towards a unified forecasting interface in the next release.
This unified interface should satisfy the requirements outlined above
HCrystalball becomes an affiliated package of sktime (meaning: compatible interface) - displayed on the landing page with other affiliated and coordinated packages
HCrystalball specifies a scope and roadmaps, e.g., adapters to advanced forecasters with major package dependencies?
individual HCrystalball team members are acknowledged as contributors to sktime, insofar as they contribute to the re-factor
optionally, Heidelberg Cement is acknowledged as a contributing organisation to sktime post-refactor, pending approval of Heidelberg Cement comms

Proposed API re-design principles

The proposed re-design is based on two work items:

HCrystalball adopts sktime's higher-order composition/reduction interface (correct class inheritance structure)
a re-factor of the fit/predict signatures towards a consensus based on type unions

The consensus could be as follows:

variable	consensus type
`y_past`	`pandas` `series` or `DataFrame`
return of `predict`	same as type of `y_past`
`horizon`	integer sequence (`iloc`) or sequence of `loc` indices or empty DataFrame with `loc` indices
`x_past`	`pandas` `series` or `DataFrame`
`x_future`	`pandas` `series` or `DataFrame`, needs same type and variables as `x_past`

There may be an additional flag for whether loc or iloc indices are used.
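One way to handle the `horizon` type union is to normalise every accepted form to absolute time points before forecasting. The helper below is a sketch under assumptions: `resolve_horizon`, its parameters, and the default daily frequency are hypothetical and not part of either library.

```python
import numbers

import pandas as pd
from pandas.tseries.frequencies import to_offset


def resolve_horizon(horizon, cutoff, freq="D"):
    """Hypothetical helper: map any consensus horizon type to a DatetimeIndex."""
    if isinstance(horizon, pd.DataFrame):
        # Empty DataFrame whose index holds the loc time points.
        return pd.DatetimeIndex(horizon.index)
    values = list(horizon)
    if all(isinstance(h, numbers.Integral) for h in values):
        # iloc style: steps ahead, relative to the last observed time point.
        offset = to_offset(freq)
        return pd.DatetimeIndex([cutoff + offset * h for h in values])
    # loc style: a sequence of absolute time points.
    return pd.DatetimeIndex(values)


cutoff = pd.Timestamp("2020-01-04")
from_steps = resolve_horizon([1, 2], cutoff)
from_dates = resolve_horizon(["2020-01-05", "2020-01-06"], cutoff)
from_frame = resolve_horizon(
    pd.DataFrame(index=pd.date_range("2020-01-05", periods=2, freq="D")), cutoff
)
```

Inferring the scheme from the element type, as done here, is ambiguous when the series itself has an integer index; an explicit loc/iloc flag would remove that ambiguity.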

The low-level design could look similar to this, though the linked proposal is mainly concerned with support for datetime.
