Replies: 5 comments 16 replies
-
I've a quick glance to the original paper. Seems that the min-heap is used to storing the |
Beta Was this translation helpful? Give feedback.
-
Several points:
|
Beta Was this translation helpful? Give feedback.
-
I came up with the following binary design scheme. metadata
-d: is the length of the "hash table" list, with sub key value of hash table array
Because the images provided in the paper are not so much a "hash table list" as a "matrix", and the "buckets" corresponding to each array do not have the scaling function of a "hash table". So matrix storage is used. sub key value of min-heap
Firstly, min heap can be implemented using arrays; Secondly, the min heap mentioned in the paper also needs to have the ability to be randomly modified, so each node needs to be stored separately. And 'flow ID' requires global uniqueness, so an indefinite length storage method is used. endMay I ask if there are any issues with this design proposal? |
Beta Was this translation helpful? Give feedback.
-
topk structure should use inside kvrocks server as a metric store in rocksdb is bad idea
|
Beta Was this translation helpful? Give feedback.
-
AFAIK, if "K" in topK is not so large, I prefer the structure below:
Pro:
Cons:
|
Beta Was this translation helpful? Give feedback.
-
Related issues: Support Top-K data structure and commands #2424
Reference paper: HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows
Required data structure:
A matrix of$d \times n$ , where the matrix elements are a key value pair
min-heap, where node elements are a key value pair
There are several ideas on how to design metadata
For the part of the hash table list, we can follow the design of List, where metadata stores the beginning and end of d arrays, and the values of ‘sub keys values’ are the corresponding hash tables. The question is whether there is a performance issue with the entire hash table being read and written every time.
Just like the diagram below
metadata
sub key value
Another issue is the min-heap, which can reuse zset. In other words, I need to store a zset key in the
other data
of the metadata above. The problem is how to ensure that the key here is globally unique and does not conflict with the use of other zsets in this namespace.The above two questions.
Beta Was this translation helpful? Give feedback.
All reactions