T-Digest Design Proposal #2542

LindaSummer · 2024-09-17T15:25:56Z

LindaSummer
Sep 17, 2024

Introduction

Redis Stack supports a probabilistic data structure called t-digest.

In #2423, we plan to support t-digest as well.

Basics of T-Digest

The original paper is Computing Extremely Accurate Quantiles Using t-Digests [1].

Thanks to the blog post T-Digest in Python [7] and the slides [8], I have a better understanding of t-digest.

The main idea of t-digest is to divide the data into small bins and store only the mean and count of each bin. These bins are called centroids.

This compresses a range of data into a single centroid with just a mean and a weight. The mean is the average of the data in the bin, and the weight is the number of data points compressed into the bin.

This kind of summary is called a sketch. A sketch is necessary when we need to deal with large amounts of data.

We use these centroids with interpolation to estimate the quantile of the data.

We lose some precision of the original data, and in return gain the ability to merge digests easily and to calculate quantiles.
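To make the interpolation step concrete, here is a minimal sketch of one way to estimate a quantile from sorted centroids. It assumes each centroid's weight is centered on its mean and interpolates linearly between neighboring means, with the stored minimum and maximum anchoring the two tails. The names Centroid, Lerp and EstimateQuantile are made up for illustration, and this is a simplification rather than the exact method of any particular implementation.

```cpp
#include <cassert>
#include <vector>

struct Centroid {
  double mean;
  double weight;
};

// Value at fraction t in [0, 1] on the line from a to b.
static double Lerp(double a, double b, double t) { return a + (b - a) * t; }

// Estimate the q-quantile (0 <= q <= 1) from centroids sorted by mean.
// Each centroid is treated as sitting at rank (weight before it + weight / 2).
double EstimateQuantile(const std::vector<Centroid>& centroids, double min,
                        double max, double q) {
  assert(!centroids.empty());
  assert(q >= 0.0 && q <= 1.0);

  double total = 0.0;
  for (const auto& c : centroids) total += c.weight;
  const double target = q * total;  // rank we are looking for

  double prev_rank = 0.0;   // rank of the previous anchor (starts at min)
  double prev_value = min;  // value of the previous anchor
  double cumulative = 0.0;  // weight of all centroids already passed
  for (const auto& c : centroids) {
    const double center = cumulative + c.weight / 2;
    if (target <= center) {
      const double t = (target - prev_rank) / (center - prev_rank);
      return Lerp(prev_value, c.mean, t);
    }
    prev_rank = center;
    prev_value = c.mean;
    cumulative += c.weight;
  }
  // Right tail: between the last centroid's mean and the maximum.
  const double t = (target - prev_rank) / (total - prev_rank);
  return Lerp(prev_value, max, t);
}
```

With two centroids (mean 10, weight 2) and (mean 20, weight 2), minimum 5 and maximum 25, the median falls halfway between the two means.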

We use a potential function $k$ (called the scale function in the paper [1]) to control the distribution of the bins.

The function takes a parameter $\delta$ that controls the approximation error.
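As an illustration of how such a function bounds the bins, the sketch below uses the arcsine scale function from the paper [1], written in terms of a `compression` parameter (larger means more centroids and lower error), together with a simplified greedy merge: a centroid keeps absorbing its right neighbor as long as it spans at most one unit of $k$. ScaleK, ScaleKInverse, Compress and Centroid are hypothetical names, and this is a sketch of the idea under those assumptions, not the code planned for kvrocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Centroid {
  double mean;
  double weight;
};

const double kPi = 3.14159265358979323846;

// Arcsine scale function: steep near q = 0 and q = 1, so tail bins stay small.
double ScaleK(double q, double compression) {
  q = std::min(1.0, std::max(0.0, q));
  return compression / (2 * kPi) * std::asin(2 * q - 1);
}

// Inverse of ScaleK. k is clamped to the valid range so the sine cannot wrap.
double ScaleKInverse(double k, double compression) {
  const double k_max = compression / 4;
  k = std::min(k_max, std::max(-k_max, k));
  return (std::sin(k * 2 * kPi / compression) + 1) / 2;
}

// Greedily merge neighbors while the merged centroid spans at most 1 unit of k.
std::vector<Centroid> Compress(std::vector<Centroid> input, double compression) {
  assert(compression > 0);
  if (input.empty()) return input;
  std::sort(input.begin(), input.end(),
            [](const Centroid& a, const Centroid& b) { return a.mean < b.mean; });
  double total = 0.0;
  for (const auto& c : input) total += c.weight;

  std::vector<Centroid> out;
  Centroid cur = input[0];
  double q0 = 0.0;  // fraction of the total weight to the left of cur
  double q_limit = ScaleKInverse(ScaleK(q0, compression) + 1, compression);
  for (std::size_t i = 1; i < input.size(); ++i) {
    const double q = q0 + (cur.weight + input[i].weight) / total;
    if (q <= q_limit) {
      // Absorb input[i]: update the weight and the weighted mean.
      cur.weight += input[i].weight;
      cur.mean += (input[i].mean - cur.mean) * input[i].weight / cur.weight;
    } else {
      // cur is full: emit it and start a new centroid at input[i].
      out.push_back(cur);
      q0 += cur.weight / total;
      q_limit = ScaleKInverse(ScaleK(q0, compression) + 1, compression);
      cur = input[i];
    }
  }
  out.push_back(cur);
  return out;
}
```

Feeding 1000 unit-weight points through Compress with a compression of 100 keeps the total weight unchanged while producing far fewer centroids, with small ones at the tails and larger ones near the median.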

metadata

In the metadata, we should store the compression $(1/\delta)$, total_weight, minimum and maximum of the t-digest.

For the temporary input buffer, we should track the space left in the buffer (buffer_free) so that we know when to trigger a merge.
When one item is pushed, buffer_free is decremented. When buffer_free reaches zero, we should merge the buffer into the centroids and reset buffer_free.

        +----------+------------+-----------+-----------+----------------+----------------+-----------------+---------------+-------------+
key =>  |  flags   |  expire    |  version  |  size     |  compression   |  total_weight  |     minimum     |    maximum    | buffer_free |
        | (1byte)  | (Ebyte)    |  (8byte)  | (Sbyte)   |  double(8byte) |  double(8byte) |  double(8byte)  | double(8byte) |    Sbyte    |
        +----------+------------+-----------+-----------+----------------+----------------+-----------------+---------------+-------------+
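The buffer_free countdown can be sketched in memory as follows. BufferedDigest, Push, MergeBuffer and merge_count are hypothetical names, the buffer is a plain vector instead of subkeys, and the merge itself is a placeholder, so this only illustrates when the merge is triggered and how the counter is reset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct BufferedDigest {
  std::uint64_t buffer_capacity;  // configured maximum number of buffered items
  std::uint64_t buffer_free;      // slots left before a merge is triggered
  std::vector<double> buffer;     // pending values not yet merged
  std::uint64_t merge_count = 0;  // number of merges so far (for illustration)

  explicit BufferedDigest(std::uint64_t capacity)
      : buffer_capacity(capacity), buffer_free(capacity) {
    assert(capacity > 0);
  }

  // Append one value; merge as soon as no free slot is left.
  void Push(double value) {
    buffer.push_back(value);
    --buffer_free;
    if (buffer_free == 0) MergeBuffer();
  }

  // Placeholder merge: a real one would fold the values into the centroids.
  void MergeBuffer() {
    buffer.clear();
    buffer_free = buffer_capacity;
    ++merge_count;
  }
};
```

With a capacity of 3, the third Push triggers one merge and buffer_free goes back to 3.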

centroids

Centroids should be sorted by mean, and each has a weight. The mean and weight are both double-precision numbers.
So, if we want to keep the centroids ordered by mean, we need a way to serialize a double such that the serialized form keeps the same order as the original values.
Here is one encoding used by DuckDB [9] that preserves the original order of doubles by converting each one to an 8-byte uint64.

inline uint64_t EncodeDouble(double x) {
  uint64_t buff;
  //! zero
  if (x == 0) {
    buff = 0;
    buff += (1ULL << 63);
    return buff;
  }
  //! nan
  if (Value::IsNan(x)) {
    return ULLONG_MAX;
  }
  //! infinity
  if (x > DBL_MAX) {
    return ULLONG_MAX - 1;
  }
  //! -infinity
  if (x < -DBL_MAX) {
    return 0;
  }
  buff = Load<uint64_t>(const_data_ptr_cast(&x));
  if (buff < (1ULL << 63)) { //! +0 and positive numbers
    buff += (1ULL << 63);
  } else { //! negative numbers
    buff = ~buff; //! complement 1
  }
  return buff;
}
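The snippet above depends on DuckDB-internal helpers (Value::IsNan, Load, const_data_ptr_cast). As a standalone sketch, the same mapping can be written with only the standard library. RocksDB's default comparator orders keys bytewise, so the sketch also writes the resulting integer in big-endian order, which makes byte order agree with numeric order. EncodeOrderedDouble and EncodeDoubleKey are hypothetical names used only here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>

// Map a double to a uint64 whose unsigned order matches the numeric order.
// Special cases follow the DuckDB snippet: NaN sorts last, -0.0 equals +0.0.
std::uint64_t EncodeOrderedDouble(double x) {
  const std::uint64_t sign_bit = 1ULL << 63;
  const std::uint64_t max_u64 = std::numeric_limits<std::uint64_t>::max();
  if (std::isnan(x)) return max_u64;
  if (x == 0) return sign_bit;                                       // +0.0 and -0.0
  if (x > std::numeric_limits<double>::max()) return max_u64 - 1;    // +infinity
  if (x < -std::numeric_limits<double>::max()) return 0;             // -infinity
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof(bits));  // reinterpret the IEEE 754 bit pattern
  // Positive numbers: set the top bit. Negative numbers: flip every bit.
  return bits < sign_bit ? bits + sign_bit : ~bits;
}

// Serialize as 8 big-endian bytes so bytewise key comparison matches value order.
std::string EncodeDoubleKey(double x) {
  std::uint64_t v = EncodeOrderedDouble(x);
  std::string out(sizeof(v), '\0');
  for (std::size_t i = sizeof(v); i-- > 0;) {
    out[i] = static_cast<char>(v & 0xFF);
    v >>= 8;
  }
  return out;
}
```

Comparing the encoded strings of two doubles then gives the same result as comparing the doubles themselves, which is what an iterator over centroid subkeys would rely on.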

Here is the subkey of centroid.

'centroid' is an enum for the centroid subkey.
'mean' is a serialized double with order.

                              +----------------+
key|version|centroid|mean =>  |      weight    |
                              | (8byte) double |
                              +----------------+

temporary buffer

The temporary buffer is just a list of doubles. We should limit the total size of the buffer, and should acquire a lock when merging it into the centroids.

Since we have a way to serialize a double in an order-preserving form, we can use the same encoding for the doubles in the temporary buffer.

To avoid collisions between equal values, we should add a unique identifier to each item. It can be a uint64 millisecond timestamp for simplicity.

'buffer' is an enum for the buffer subkey type.
'value' is the order-preserving serialized double of the data to be inserted.
'id' is a unique identifier to avoid collisions.

  key|version|buffer|value|id => NULL

concurrency safety

Pushing to the temporary buffer should acquire the lock, since it updates the metadata.

The merge action should acquire a lock. This includes both merging the buffer and merging with other digests.

When we calculate a percentile or the CDF, we should first merge the buffer and take a snapshot.

The calculation itself then needs no lock.
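A minimal sketch of this locking pattern, assuming an in-process std::mutex stands in for whatever key-level lock the storage layer provides. GuardedDigest and its members are hypothetical names, and raw sorted values replace the centroids to keep the focus on locking: writers and the merge hold the lock, while the query computes on a private copy.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

class GuardedDigest {
 public:
  // Writers take the lock because they change shared state.
  void Add(double value) {
    std::lock_guard<std::mutex> guard(mu_);
    buffer_.push_back(value);
  }

  // Merge and copy under the lock, then compute on the copy without it.
  double Median() {
    std::vector<double> snapshot;
    {
      std::lock_guard<std::mutex> guard(mu_);
      MergeBufferLocked();
      snapshot = merged_;
    }
    assert(!snapshot.empty());
    return snapshot[snapshot.size() / 2];  // lock-free read of the snapshot
  }

 private:
  // Caller must hold mu_. Stand-in for merging the buffer into the centroids.
  void MergeBufferLocked() {
    merged_.insert(merged_.end(), buffer_.begin(), buffer_.end());
    std::sort(merged_.begin(), merged_.end());
    buffer_.clear();
  }

  std::mutex mu_;
  std::vector<double> buffer_;  // pending values
  std::vector<double> merged_;  // merged state (sorted)
};
```

Because the query only touches its own copy after the lock is released, concurrent pushes cannot change the data it is computing over.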

References

[1] Computing Extremely Accurate Quantiles Using t-Digests

[2] https://github.com/tdunning/t-digest

[3] https://issues.apache.org/jira/browse/ARROW-11367

[4] apache/arrow#9310

[5] https://github.com/apache/arrow/blob/main/cpp/src/arrow/util/tdigest.cc

[6] https://redis.io/docs/latest/develop/data-types/probabilistic/t-digest/

[7] T-Digest in Python

[8] https://blog.bcmeng.com/pdf/TDigest.pdf

[9] https://github.com/duckdb/duckdb

PragmaTwice · 2024-09-17T16:27:32Z

PragmaTwice
Sep 17, 2024
Collaborator

Currently the data structures in kvrocks are NOT composable, which means you cannot just construct a Redis list and state that it's a part of your new data structure. All rocksdb keys inside one object (at the Redis level) should have the same user key and different sub keys.

So maybe you need to clearly describe the format of all sub keys in your data structure.

2 replies

LindaSummer Sep 18, 2024
Author

Thanks very much for your suggestion! 😊
I will research the related data structure's design in kvrocks and update with a more comprehensive proposal.

LindaSummer Sep 22, 2024
Author

Hi @PragmaTwice ,

I have updated the proposal with a detailed subkey and payload design. 😊

Please take a look and give me some suggestions if possible.

Best Regards,
Edward

mapleFU · 2024-09-22T14:39:20Z

mapleFU
Sep 22, 2024
Collaborator

The design is great. This is the first time I've learned about t-digest.

> For temporary input buffer, we should have a buffer with capacity and current length.

Would you mind describing the buffer more specifically?

Also, I'm curious why the buffer has a capacity in kvrocks, since it's not an in-memory structure.

4 replies

LindaSummer Sep 22, 2024
Author

Hi @mapleFU ,

Thanks very much for your review! 😊

Frankly, I didn't include the buffer length or the capacity in the metadata in the first version of the design.

The capacity is a configuration for the buffer's maximum elements.

When the buffer's size reaches the capacity, it is time to merge the buffer back into the centroids, just like the code in Apache Arrow.

If we don't need to have the flexibility for each t-digest, it can be set as a global configuration, and we only need to maintain the size of buffer.

Best Regards,
Edward

mapleFU Sep 22, 2024
Collaborator

> The capacity is a configuration for the buffer's maximum elements.

I think that's an in-memory design for the arrow::ResizableBuffer and vector<double>. Isn't this a pure in memory structure rather than a on-disk implementation? Would you persist the "non-used trailing buffer" to the disk?

LindaSummer Sep 23, 2024
Author

Hi @mapleFU ,

> Isn't this a pure in memory structure rather than a on-disk implementation?

Yes, I agree that it is a purely in-memory design and should be replaced for the on-disk scenario.

> Would you persist the "non-used trailing buffer" to the disk?

Please correct me if I misunderstand.
I think a better way would be to just decrement a counter and reset it after each merge.

Best Regards,
Edward

LindaSummer Sep 23, 2024
Author

Hi @mapleFU ,

I have updated the metadata design and now use buffer_free to represent the number of buffer slots available before merging. 😊

Best Regards,
Edward

LindaSummer · 2024-09-26T14:51:06Z

LindaSummer
Sep 26, 2024
Author

Hi @PragmaTwice , @mapleFU ,

Sorry to bother you again.

I have updated the proposal with some improvements.
Please give me some suggestions if possible. 😊

I plan to create a tracking issue for all t-digest commands for further development if the proposal is ready to be implemented.

Best Regards,
Edward

5 replies

PragmaTwice Oct 8, 2024
Collaborator

Sure. Thank you for your updates.

It looks good to me and is worth a try. Here are some points:

EncodeDouble is implemented here;
could buffer_free be buffer_size?
how does the temp buffer work in t-digest? maybe we can store the buffer more densely if it's small or the order doesn't matter?

LindaSummer Oct 9, 2024
Author

Hi @PragmaTwice ,

Thanks very much for your patience and suggestions! 😊

> EncodeDouble is implemented here

Thanks for your advice! I will use the existing encoding function to encode the double number.

> could buffer_free be buffer_size?

> how does the temp buffer work in t-digest? maybe we can store the buffer more densely if it's small or the order doesn't matter?

In the previous design, I wanted to reduce locking for buffer updates.

But since we need to update the buffer's size information anyway, the overhead of locking a small dense buffer should be similar to that of locking the metadata plus an item.

Plus, a dense buffer reduces the overhead of iterating over the items, since they can be read with a single query.

I will update the design with a buffer_size and a dense buffer and start the implementation.

Best Regards,
Edward

PragmaTwice Oct 9, 2024
Collaborator

If the buffer is totally in one subkey, maybe we don't need to keep the buffer_size?

PragmaTwice Oct 9, 2024
Collaborator

Anyway, it depends on the max size of the buffer. If it's small (< several KBs) then it's good to keep it in just one subkey.

LindaSummer Oct 9, 2024
Author

Thanks for your advice! 😊

The buffer should be small, so we can just embed the size inside the subkey itself.
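One possible reading of this, purely as an illustration: the whole buffer lives in a single subkey value as tightly packed 8-byte doubles, so the item count is simply the payload length divided by 8 and no separate buffer_size field is needed. EncodeDenseBuffer and DecodeDenseBuffer are hypothetical names, and the sketch copies doubles in host byte order, whereas a real on-disk format would fix the byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Pack the buffered values back to back into one payload.
std::string EncodeDenseBuffer(const std::vector<double>& values) {
  std::string out(values.size() * sizeof(double), '\0');
  if (!values.empty()) std::memcpy(&out[0], values.data(), out.size());
  return out;
}

// Recover the values; the element count is implied by the payload length.
std::vector<double> DecodeDenseBuffer(const std::string& payload) {
  assert(payload.size() % sizeof(double) == 0);
  std::vector<double> values(payload.size() / sizeof(double));
  if (!values.empty()) std::memcpy(values.data(), payload.data(), payload.size());
  return values;
}
```

Appending an item then means reading the payload, adding 8 bytes and writing it back, which stays cheap as long as the buffer is limited to a few KB.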

Best Regards,
Edward
