This repository contains the source code for a series of experiments on how streaming techniques, sketches in particular, can aid in the modelling of time series. The results are reported in the paper cited below.
Installation
pip install -i https://test.pypi.org/simple/ skcm
The package includes the following sketches, deep learning models and utilities:
- Exponential Histogram, capable of keeping track of the following statistics:
- EHRNN: a modified Elman network (RNN) that efficiently keeps track of hidden state statistics across multiple time resolutions via Exponential Histograms. Implemented in PyTorch.
- Other utils
  - DataFrame sketch windower: returns a pandas.DataFrame with the results of applying a summarizing sketch (Exponential Histograms) over one or more windows. Useful for obtaining descriptive statistics and summarizing data trends across time resolutions.
  - Format converters
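To make the Exponential Histogram idea concrete, here is a minimal, self-contained sketch of the basic counting variant described in [1]: it approximates the number of 1s among the last `window` items of a bit stream while storing only a small number of buckets. This is an illustration under stated assumptions, not the package's API: the class name `ExponentialHistogram`, its `window` and `eps` parameters, and the `add` / `estimate` methods are hypothetical names chosen for this example.

```python
import math


class ExponentialHistogram:
    """Approximate count of 1s among the last `window` items of a bit stream.

    Illustrative only; names and parameters are assumptions, not the skcm API.
    """

    def __init__(self, window, eps=0.1):
        self.window = window
        k = math.ceil(1 / eps)
        # Allow at most ceil(k / 2) + 1 buckets per size before merging.
        self.max_per_size = math.ceil(k / 2) + 1
        self.time = 0
        self.total = 0  # sum of the sizes of all stored buckets
        # Each bucket is (timestamp of its newest 1, size); oldest bucket first.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop buckets whose newest 1 has fallen out of the window.
        while self.buckets and self.buckets[0][0] <= self.time - self.window:
            _, size = self.buckets.pop(0)
            self.total -= size
        if bit:
            self.buckets.append((self.time, 1))
            self.total += 1
            self._merge()

    def _merge(self):
        # Cascade: while a size has too many buckets, merge its two oldest.
        size = 1
        while True:
            idx = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(idx) <= self.max_per_size:
                return
            first, second = idx[0], idx[1]  # adjacent, since sizes are sorted
            newest_ts = self.buckets[second][0]
            self.buckets[first:second + 1] = [(newest_ts, 2 * size)]
            size *= 2

    def estimate(self):
        if not self.buckets:
            return 0
        # Only the oldest bucket may straddle the window edge, so count
        # half of it (integer division keeps size-1 buckets exact).
        return self.total - self.buckets[0][1] // 2


eh = ExponentialHistogram(window=100, eps=0.1)
for i in range(1000):
    eh.add(1 if i % 3 else 0)
print(eh.estimate())
```

Smaller values of `eps` keep more buckets per size and give a tighter estimate, at the cost of more memory. Because bucket sizes grow as powers of two, the number of stored buckets grows only logarithmically with the window length.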
If you use this code in your research or application, please cite the current preprint:
@misc{antonanzas2021sketches,
title={Sketches for Time-Dependent Machine Learning},
author={Jesus Antonanzas and Marta Arias and Albert Bifet},
year={2021},
eprint={2108.11923},
archivePrefix={arXiv},
URL={https://arxiv.org/abs/2108.11923},
primaryClass={cs.LG}
}
Ideas from these references have been used in the software:
[1] M. Datar et al. (2002). Maintaining Stream Statistics over Sliding Windows. SIAM Journal on Computing, 31(6), 1794-1813.
[2] B. Babcock et al. (2003). Maintaining Variance and k-Medians over Data Stream Windows. Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 22, 234-243.