Code supporting a series of experiments on the use of efficient sliding window sketches to aid in the modelling of time series.

chus-chus/sketchModelling

License: MIT

Sketches for Time-Dependent Machine Learning

This repository contains the source code for a series of experiments on how streaming techniques, in particular sketches, can aid in the modelling of time series. The results can be found in the paper.

Installation

pip install -i https://test.pypi.org/simple/ skcm

These are the sketches, deep learning models and functionalities included:

  • Exponential Histogram, capable of keeping track of the following statistics:

    • Binary Counter [1]
    • Sum
      • Positive integers [1]
      • Extension over positive real numbers (own)
    • Mean (over positive reals or over all reals; the positive-only version is more space-efficient)
    • Variance (real) [2]
  • EHRNN: a modified Elman network (RNN) that efficiently keeps track of hidden-state statistics across multiple time resolutions via Exponential Histograms. Implemented in PyTorch.

  • Other utils

    • DataFrame sketch windower: returns a pandas.DataFrame with the results of applying a summarizing sketch (Exponential Histograms) over one or more windows. Useful for obtaining descriptive statistics and summarizing data trends across time resolutions.
    • Format converters
      • csv to arff and vice versa
      • pandas.DataFrame to arff
        (arff is a data format used by ML frameworks such as Weka and MOA)
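
To make the Binary Counter item above concrete, here is a minimal, self-contained sketch of an Exponential Histogram that approximately counts the 1s among the last `window` elements of a bit stream, following the general idea in [1]. It is not the `skcm` API: the class name `ExpHistogramCounter`, its methods and the `eps` parameter are hypothetical names chosen for illustration, and the merge threshold used here (at most `ceil(1/eps) + 1` buckets per size) is a conservative simplification that may differ from both [1] and the package's implementation.

```python
import random
from math import ceil


class ExpHistogramCounter:
    """Approximate count of 1s among the last `window` stream elements.

    Illustrative only: names and thresholds are assumptions, not skcm's API.
    """

    def __init__(self, window, eps=0.1):
        self.window = window
        # Assumed (conservative) cap on buckets sharing the same size.
        self.max_same_size = ceil(1 / eps) + 1
        # Oldest bucket first; each bucket is (timestamp of its newest 1, size).
        self.buckets = []
        self.total = 0  # sum of all bucket sizes
        self.time = 0   # number of elements seen so far

    def add(self, bit):
        self.time += 1
        # Drop buckets whose newest 1 has fallen out of the window.
        while self.buckets and self.buckets[0][0] <= self.time - self.window:
            self.total -= self.buckets.pop(0)[1]
        if bit:
            self.buckets.append((self.time, 1))
            self.total += 1
            self._merge()

    def _merge(self):
        size = 1
        while True:
            same = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(same) <= self.max_same_size:
                return
            # Merge the two oldest buckets of this size into one of double
            # size, keeping the timestamp of the newer of the two.
            i = same[0]
            newer_ts = self.buckets[i + 1][0]
            self.buckets[i:i + 2] = [(newer_ts, 2 * size)]
            size *= 2  # the merge may overflow the next size; keep cascading

    def estimate(self):
        if not self.buckets:
            return 0
        # Only the oldest bucket can straddle the window edge, so count
        # half of it (a size-1 bucket is known exactly).
        return self.total - self.buckets[0][1] // 2


# Small demo: compare the estimate with the exact windowed count.
random.seed(0)
counter = ExpHistogramCounter(window=200, eps=0.1)
stream = [random.random() < 0.7 for _ in range(3000)]
for bit in stream:
    counter.add(bit)
exact = sum(stream[-200:])
print(f"exact={exact} estimate={counter.estimate()} buckets={len(counter.buckets)}")
```

Because only the oldest bucket can partially overlap the window boundary, the error is at most half that bucket's size, while the number of buckets grows only logarithmically with the window length. The list-based merge is linear in the number of buckets, which keeps the example short; an implementation aimed at throughput would likely track per-size counts instead.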

Citations

If you use this code in your research or application, please cite the current pre-print:

@misc{antonanzas2021sketches,
      title={Sketches for Time-Dependent Machine Learning}, 
      author={Jesus Antonanzas and Marta Arias and Albert Bifet},
      year={2021},
      eprint={2108.11923},
      archivePrefix={arXiv},
      URL={https://arxiv.org/abs/2108.11923},
      primaryClass={cs.LG}
} 

References

Ideas from these references have been used in the software:

[1] M. Datar et al. (2002). Maintaining Stream Statistics over Sliding Windows. SIAM Journal on Computing, 31(6), 1794-1813.

[2] B. Babcock et al. (2003). Maintaining Variance and k-Medians over Data Stream Windows. Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 22, 234-243.
