0.11.0 Mar 15, 2018: KLL quantiles sketch, tuple sketch API change and more

@AlexanderSaydakov AlexanderSaydakov released this 16 Mar 02:20
  • New KLL sketch: KllFloatsSketch:
    • This is a new quantiles sketch with better accuracy per stored bit than the original quantiles DoublesSketch. If you choose K for the KLL sketch so that its accuracy matches that of the DoublesSketch, K will be larger, but the space required will be much smaller. This sketch is tuned for the smallest possible space usage (near the theoretical optimum) and uses floats rather than doubles. On update, the KLL sketch is a little faster than the original DoublesSketch, but it may be slower on merge. The KLL sketch does not currently have a generic version (as the DoublesSketch does), nor does it provide off-heap capability like the DoublesSketch. Refer to the Javadocs for a link to the KLL theoretical paper.
  • Tuple:
    • generic sketch API change
      • removed the convention of requiring static methods with a certain signature; these methods are now based on a more visible API
      • added SummaryDeserializer
      • removed the need to serialize factories
      • removed getSummaries() method - use iterator instead
  • Theta:
    • added new SingleItemSketch - fast way to create sketches with a single input item
  • Original quantiles sketch enhancements:
    • added getRank() - faster than getCDF() with one split point
    • empty sketch returns null from getQuantiles(), getPMF() and getCDF()
    • empty sketch returns NaN from getQuantile(), getMinValue() and getMaxValue()
    • Kolmogorov-Smirnov statistic between two quantiles sketches
    • fixed sorting using comparator in generic ItemsSketch
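
For readers unfamiliar with the Kolmogorov-Smirnov statistic mentioned above, the following is a minimal sketch of what it measures: the largest absolute gap between the empirical CDFs of two samples. It works on raw arrays and is not the library's sketch-based implementation; the class name `KsExample` and method `ksStatistic` are hypothetical, and returning NaN for empty input is an assumption that mirrors the NaN-on-empty convention listed above.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the sketches library.
final class KsExample {

    /** Returns the largest absolute gap between the two empirical CDFs. */
    static double ksStatistic(double[] a, double[] b) {
        if (a.length == 0 || b.length == 0) {
            return Double.NaN; // assumption: empty input has no defined statistic
        }
        double[] x = a.clone();
        double[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);

        int i = 0;
        int j = 0;
        double d = 0.0;
        // Walk both sorted samples in value order. Once one sample is
        // exhausted its CDF is 1 and the gap can only shrink, so we stop.
        while (i < x.length && j < y.length) {
            double v = Math.min(x[i], y[j]);
            while (i < x.length && x[i] <= v) { i++; } // count of x values <= v
            while (j < y.length && y[j] <= v) { j++; } // count of y values <= v
            double gap = Math.abs((double) i / x.length - (double) j / y.length);
            d = Math.max(d, gap);
        }
        return d;
    }

    public static void main(String[] args) {
        double[] s1 = {1, 2, 3, 4};
        double[] s2 = {3, 4, 5, 6};
        System.out.println(ksStatistic(s1, s2)); // prints 0.5
    }
}
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all. A sketch-based version would compare approximate CDFs instead of exact ones, so its result carries the sketches' rank error.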