Embed key into dict entry #541

hpatro · 2024-05-23T22:15:56Z

This PR incorporates the key-embedding changes described in redis/redis#12216.
With this change, the dictEntry no longer holds a key pointer; the key is embedded within the dictEntry itself. One byte is used for additional bookkeeping, so the overall saving is 7 bytes per entry.

Key changes:

New dict entry type introduced, which is now used as an entry for the main dictionary:

typedef struct {
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;  /* Next entry in the same hash bucket. */
    uint8_t key_header_size; /* offset into key_buf where the key is located at. */
    unsigned char key_buf[]; /* buffer with embedded key. */
} embeddedDictEntry;

One new function has been added to the dictType:

size_t (*embedKey)(unsigned char *buf, size_t buf_len, const void *key, unsigned char *header_size);

The change is opt-in per dict type, so sets, hashes and other types that use the dictionary are not affected.
With this change the main dictionary owns the key data, so the copy on insert in dbAdd is no longer needed.
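As a rough illustration of the layout described above, here is a minimal sketch of an entry that stores its key in a flexible array member and fills it through an embedKey-style callback. The names `demoEntry`, `demoEmbedKey`, `demoCreateEntry` and `demoGetKey` are hypothetical, the key is a plain NUL-terminated string rather than an sds, and the convention that a NULL buffer only reports the required size is an assumption, not something this description confirms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the embeddedDictEntry shown above (hypothetical). */
typedef struct demoEntry {
    void *val;
    struct demoEntry *next;
    uint8_t key_header_size; /* offset into key_buf where the key starts */
    unsigned char key_buf[]; /* flexible array member holding the key bytes */
} demoEntry;

/* Hypothetical embedKey callback for NUL-terminated strings.
 * Assumption: calling it with buf == NULL only reports the bytes needed. */
static size_t demoEmbedKey(unsigned char *buf, size_t buf_len, const void *key,
                           unsigned char *header_size) {
    size_t needed = strlen((const char *)key) + 1;
    if (buf == NULL) return needed;
    assert(buf_len >= needed);
    memcpy(buf, key, needed);
    *header_size = 0; /* plain C strings carry no header */
    return needed;
}

/* Allocate the entry and its key in a single block instead of two. */
static demoEntry *demoCreateEntry(const char *key, demoEntry *next) {
    size_t key_len = demoEmbedKey(NULL, 0, key, NULL);
    demoEntry *e = malloc(sizeof(*e) + key_len);
    if (e == NULL) return NULL;
    e->val = NULL;
    e->next = next;
    demoEmbedKey(e->key_buf, key_len, key, &e->key_header_size);
    return e;
}

/* The key pointer is derived from the buffer, so no key pointer is stored. */
static const char *demoGetKey(const demoEntry *e) {
    return (const char *)e->key_buf + e->key_header_size;
}
```

Because the entry holds its own copy of the key bytes, the caller's string can be released right after `demoCreateEntry("user:1", NULL)` returns, and the whole entry is released with a single `free`.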

Benchmarking results

TL;DR: Around 9-10% reduction in overall memory usage for a scenario with 16-byte keys and values of 8 and 16 bytes. Throughput varies, but with these changes it is similar to or greater than unstable (ae2d421) in most runs.

Performed on an Amazon EC2 c5.4xlarge instance.
Server setup (maximum memory is 100 MB):

src/valkey-server --save "" --daemonize yes --maxmemory 100m --enable-debug-command local --port 6379

GET command used

src/valkey-benchmark  -t get  -n 1000000  -r 10000000

SET command used

src/valkey-benchmark  -t set  -n 1000000  -r 10000000 -d 16

SET performance

Operation	Run 1 throughput per sec	Run 1 number of keys	Run 1 used memory (bytes)	Run 2 throughput per sec	Run 2 number of keys	Run 2 used memory (bytes)	Run 3 throughput per sec	Run 3 number of keys	Run 3 used memory (bytes)

Key Embedding
SET (d = 8, n = 1M, r = 10M)	108448.11	951737	78140504	107135.21	951660	78135144	108026.36	951616	78132160
SET (d = 16, n = 1M, r = 10M)	106974.76	951677	85750152	107342.21	951611	85745056	107215.62	951757	85754792

Unstable (`ae2d421`)
SET (d = 8, n = 1M, r = 10M)	106929	951556	85715672	107009.09	951671	85725056	106484.94	951864	85740680
SET (d = 16, n = 1M, r = 10M)	104036.62	951541	93226392	105965.88	951710	93341232	104123.28	951627	93333560

GET performance

Operation	Number of keys	Used memory (bytes)	Run 1 throughput per sec	Run 2 throughput per sec	Run 3 throughput per sec

Key Embedding
GET (d = 8, n = 1M, r = 10M)	951580	78079272	105797.72	106168.39	107135.21
GET (d = 16, n = 1M, r = 10M)	951764	85731504	105797.72	106416.94	106202.2

Unstable (`ae2d421`)
GET (d = 8, n = 1M, r = 10M)	951213	85662008	106145.85	105685.91	105741.78
GET (d = 16, n = 1M, r = 10M)	951646	93260288	106723.59	106089.55	106349.04
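To relate the used-memory columns to the summary, the reduction can be computed directly from the tables. The sketch below uses two hypothetical helpers, `reductionPercent` and `bytesSavedPerKey`. Fed with the first SET measurement for each value size, it gives roughly 8.8% (d = 8) and 8.0% (d = 16), close to, if slightly under, the 9-10% quoted in the summary. The per-key figure is only approximate, since the two servers do not hold exactly the same number of keys.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage by which `after` is smaller than `before`. */
static double reductionPercent(uint64_t before, uint64_t after) {
    return 100.0 * (double)(before - after) / (double)before;
}

/* Average bytes saved per key, using the key count of one of the runs. */
static double bytesSavedPerKey(uint64_t before, uint64_t after, uint64_t keys) {
    return (double)(before - after) / (double)keys;
}
```

For example, `bytesSavedPerKey(85715672, 78140504, 951737)` comes out just under 8 bytes per key, which is in the same range as the 7 bytes saved in the entry itself; the difference may come from allocator size classes, though the thread does not confirm that.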

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

codecov · 2024-05-23T22:37:29Z

Codecov Report

Attention: Patch coverage is 96.87500% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.32%. Comparing base (7719dbb) to head (825c82f).
Report is 10 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #541      +/-   ##
============================================
+ Coverage     70.26%   70.32%   +0.06%     
============================================
  Files           110      111       +1     
  Lines         60108    60286     +178     
============================================
+ Hits          42234    42396     +162     
- Misses        17874    17890      +16

Files	Coverage Δ
src/db.c	`88.39% <100.00%> (+0.06%)`	⬆️
src/defrag.c	`88.24% <100.00%> (+0.68%)`	⬆️
src/dict.c	`97.53% <100.00%> (+0.09%)`	⬆️
src/kvstore.c	`96.17% <100.00%> (-0.59%)`	⬇️
src/rdb.c	`75.80% <100.00%> (-0.09%)`	⬇️
src/sds.c	`86.08% <100.00%> (+0.21%)`	⬆️
src/sds.h	`78.68% <ø> (ø)`
src/server.c	`88.57% <100.00%> (+0.01%)`	⬆️
src/debug.c	`53.40% <0.00%> (ø)`
src/object.c	`78.55% <50.00%> (-0.03%)`	⬇️

... and 21 files with indirect coverage changes

hpatro · 2024-06-11T21:08:06Z

@valkey-io/core-team Could you provide some clarity on this?

hpatro · 2024-06-11T21:09:29Z

My thought captured under #394 (comment)

We've made changes in the past that benefit users in the short term and then redone the implementation in a following major/minor version. Key embedding in dictEntry is a pretty small change, and we can easily get rid of it when dictEntry is removed. I feel the small gain (7 bytes) is worth this minimal change for Valkey 8.0, and we can invest in the kvpair object structure with open addressing in 9.0.

zuiderkwast · 2024-06-12T01:10:44Z

Yeah, I've thought about this many times. The complexity is not too bad. The technique of embedding an sds string can later be reused in other places.

Although the embedding is abstracted, it only ever makes sense for sds strings. That's fine though. Dict doesn't include "sds.h" so it's decoupled.

I'm in favor. (I'll add some review comments later, just minor things.)

The reason I've been skeptical before is that I'd rather we invest in embedding the key in robj, since that would be beneficial in the future redesign (#169). But since this PR is ready and the robj work has not started, I think we can merge this for Valkey 8.

bbarani · 2024-06-21T17:20:47Z

@hpatro @madolson @zuiderkwast Is this change targeted for Valkey 8? If so, can we add it to Valkey 8 project board?

madolson · 2024-06-21T22:22:53Z

I was waiting on performance numbers from Hari before officially adding it.

bbarani · 2024-06-21T22:29:59Z

> I was waiting on performance numbers from Hari before officially adding it.

@hpatro Do you have performance numbers? Can you please add them to this issue to move forward with next steps?

PingXie

This is a risky PR IMO. I am concerned about the mixed use of various dict entry types while using dictEntry as the "universal" pointer. I don't think it is feasible for me to examine every use of dictEntry, which should have been the compiler's job. I am happy to help out with the refactoring if needed. Let me know.

PingXie · 2024-06-23T01:05:25Z

src/dict.c

@@ -509,6 +544,8 @@ dictEntry *dictInsertAtPosition(dict *d, void *key, void *position) {
            /* Allocate an entry without value. */
            entry = createEntryNoValue(key, *bucket);
        }
+    } else if (d->type->embedded_entry) {
+        entry = createEmbeddedEntry(key, *bucket, d->type);


The way dictEntry is used has no type safety. Admittedly, this is not a new issue, but the addition of embeddedDictEntry makes it worse. The following code path looks problematic to me. Can you please double check?

setGenericCommand() -> setKey() -> dbAdd() -> dbAddInternal() -> kvstoreDictSetKey() -> dictSetKey()

I don't see dictSetKey getting patched to handle this new embedded dict entry.

I would suggest making dictEntry an opaque struct next and forcing every function to go through an inline accessor function/macro. I think this is the only certain way to ensure we don't accidentally use the wrong data type.

Isn't the dictEntry already opaque?

not in dict.c.

I guess I don't agree it should be opaque in dict.c. It seems like a very small number of actual touch points we actually have to make sure are correct. We could even reduce the number. I'm not sure making it opaque will help all that much.

Discussed offline, Ping has a separate proposal for adding guardrails that he will publish.
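The opaque-struct idea raised in this thread can be sketched as follows. This is only an illustration of the general pattern, not the proposal Ping later published: the type `opaqueEntry`, its functions, and the explicit `embedded` flag are all hypothetical, and the real dict.c distinguishes entry kinds differently. The point is that callers only see a forward declaration, so every key read has to go through the accessor that knows which layout is in use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* What a header would expose: an opaque handle plus accessors. */
typedef struct opaqueEntry opaqueEntry;
opaqueEntry *opaqueEntryCreate(const char *key, int embedded);
const char *opaqueEntryGetKey(const opaqueEntry *e);
void opaqueEntryFree(opaqueEntry *e);

/* What the implementation file would keep private. */
struct opaqueEntry {
    uint8_t embedded;    /* 1: key bytes live in key_buf; 0: key_ptr is used */
    const char *key_ptr; /* borrowed pointer, owned by the caller in this sketch */
    char key_buf[];      /* embedded key bytes, present only when embedded == 1 */
};

opaqueEntry *opaqueEntryCreate(const char *key, int embedded) {
    size_t extra = embedded ? strlen(key) + 1 : 0;
    opaqueEntry *e = malloc(sizeof(*e) + extra);
    if (e == NULL) return NULL;
    e->embedded = embedded ? 1 : 0;
    e->key_ptr = embedded ? NULL : key;
    if (embedded) memcpy(e->key_buf, key, extra);
    return e;
}

/* The single place that decides how a key is located. */
const char *opaqueEntryGetKey(const opaqueEntry *e) {
    return e->embedded ? e->key_buf : e->key_ptr;
}

void opaqueEntryFree(opaqueEntry *e) { free(e); }
```

With this shape, code outside the implementation file cannot cast an entry to the wrong struct, because it never sees a struct definition to cast to.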

hpatro · 2024-06-24T20:14:35Z

Thanks for the review @PingXie and @zuiderkwast . I will shortly address them.

@madolson I've posted the benchmarking results on the top comment.

madolson

I still think overall this is a good iterative improvement before we can make more changes later. I would like to see us embed the key into the robj in the future, so this has a lot of the same primitives as that.

hpatro · 2024-06-25T18:19:01Z

@zuiderkwast @PingXie @madolson @valkey-io/core-team If we all are aligned on accepting this change in for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

hwware · 2024-06-25T18:40:53Z

It looks like used memory decreases by about 10% with no change in QPS.

madolson · 2024-06-25T19:09:42Z

@hpatro AFAIK most of the folks are inclined to accept it for 8, so I would ask you to follow up if you can. I'll throw it on our Monday agenda as well to make sure we close on it quickly if we can.

PingXie · 2024-06-26T05:26:38Z

> @zuiderkwast @PingXie @madolson @valkey-io/core-team If we all are aligned on accepting this change in for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

@hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is because of the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to be had for Valkey 8.0 but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too.

hpatro · 2024-06-26T05:38:02Z

> > @zuiderkwast @PingXie @madolson @valkey-io/core-team If we all are aligned on accepting this change in for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.
>
> @hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is because of the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to be had for Valkey 8.0 but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too.

Thanks for the help @PingXie. Let me publish the changes which I have done tomorrow and we can go from there. Do we want to do further refactoring as a follow up PR or along with this?

PingXie · 2024-06-27T07:10:51Z

Thanks @hpatro.

> Do we want to do further refactoring as a follow up PR or along with this?

I am inclined to reduce as much type casting as possible in this PR, the reason being that I don't trust myself to do the compiler's job (type checking the data access). I am concerned about potential memory corruption issues. I will dedicate some time later this week for a deep dive.

hpatro · 2024-06-27T19:00:51Z

A couple of things remain:

Embed key into dict entry #541 (comment) - Abstracting the metadata field out into an independent variable and making the struct packed causes a build failure - https://github.com/hpatro/valkey/actions/runs/9701100786/job/26773976709 (I'm trying out a few things; let me know if anyone is aware of a solution to avoid this.)

dict.c: In function ‘dictGetNextRef’:
dict.c:930:37: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member]
  930 |     if (entryIsEmbedded(de)) return &decodeEmbeddedEntry(de)->next;
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Embed key into dict entry #541 (comment) - @PingXie is looking to submit a proposal.
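For context on the address-of-packed-member error quoted above: GCC and Clang warn when code takes the address of a field inside a packed struct, because the resulting pointer may be unaligned. One common way around it, shown here only as a sketch and not as what this PR ended up doing, is to read and write the field by value through small accessors so that no pointer to the member escapes. `packedEntry`, `packedGetNext` and `packedSetNext` are hypothetical names, and `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Packed layout: `next` sits right after the 1-byte field and may be unaligned. */
typedef struct __attribute__((packed)) {
    uint8_t flags;
    void *next;
} packedEntry;

/* Reading by value lets the compiler emit a safe unaligned load. */
static void *packedGetNext(const packedEntry *e) {
    return e->next;
}

/* Writing by value likewise avoids forming a pointer to the packed member. */
static void packedSetNext(packedEntry *e, void *next) {
    e->next = next;
}
```

The trade-off is that a function shaped like dictGetNextRef, which hands out a pointer to the `next` field, cannot be written this way; callers would need a getter/setter pair instead.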

hpatro · 2024-06-27T21:27:49Z

@madolson I can't see any other builds apart from DCO check. Are we throttled/out of credits?

madolson · 2024-06-30T03:46:04Z

I see 4/6 people directionally inclined, so going to throw on the directionally approved tag.

madolson

LGTM, just some minor documentation changes that would make some stuff clearer.

zuiderkwast

Thanks for the added docs. I have a few follow-up comments on those.

hpatro · 2024-07-01T18:42:53Z

@zuiderkwast / @madolson Shall we ship it ? 🚀

zuiderkwast

Yes, LGTM.

bbarani · 2024-07-02T15:13:56Z

> Yes, LGTM.

Awesome! Thanks all.

madolson · 2024-07-02T16:49:50Z

Kicked off a full-run to stress this code a bit more before merging: https://github.com/valkey-io/valkey/actions/workflows/daily.yml

zvi-code · 2024-07-02T19:15:22Z

src/sds.c

+        return required_keylen;
+    }
+    assert(buf_len >= required_keylen);
+    memcpy(buf, sdsAllocPtr(s), required_keylen);


why do you need this double information about the header size? you can extract it from the "sds", right?

An sds s is a pointer to where the string content starts, so it can be used as a C string. It does not point to the start of the allocation. The header is stored before the char data in the same allocation, i.e. at s[-1], s[-2], etc. The header is backwards-encoded in some way, so the byte at s[-1] says which kind of header it is and how large the header is.

        sds-----.
                |
    allocation  v
    +-----------+----------------------+
    | hdr       |string contents \0    |
    +-----------+----------------------+

When we store this embedded, we want to be able to restore the sds pointer, so we store the size of the header in the first byte. When we restore the sds pointer, we can find it using S (the offset to the string contents).

    +-+-----------+----------------------+
    |S| hdr       |string contents \0    |
    +-+-----------+----------------------+

You mean the callers of this function can call sdsHdrSize(s[-1]) themselves? I'm not sure if it's public.

Yea, right agree. I am not so supportive of the desire to keep something as sds when it's not, but that's already in my overall comment, maybe I'll elaborate more there. Thanks

The key is used throughout the engine as an sds, so changing that would add even more touchpoints. We could build it dynamically on the dictGetKey call, but that would carry an additional penalty. Hence, storing it as an sds made the most sense.

I have the same question. The low 3 bits of sdshdr.flags already tell us the header length, and exposing the function sdsHdrSize will not lead to coupling between dict.c and sds.c. I don't see any harm in it.

    sdshdr             sds
    │                  │
    │                  │
    ▼─────┬─────┬─────┌▼───────────────────────────────┬──┐
    │ len │alloc│flags│                                │\0│
    └─────┼─────┼─────└────────────────────────────────┴──┘
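The "S | hdr | string contents" layout discussed in this thread can be sketched with a mock string type. This is not the real sds API: `mockStrNew`, `mockEmbed`, `mockRestore` and the fixed 3-byte header are assumptions made purely for illustration. The sketch copies header and contents into a caller-provided buffer, records the header size in the first byte, and later recovers a pointer to the contents from that byte alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_HDR_SIZE 3 /* len, alloc, flags: loosely shaped like a small sds header */

/* Build "hdr | contents \0" in one allocation; return a pointer to the contents. */
static char *mockStrNew(const char *init) {
    size_t len = strlen(init);
    assert(len <= UINT8_MAX);
    unsigned char *alloc = malloc(MOCK_HDR_SIZE + len + 1);
    if (alloc == NULL) return NULL;
    alloc[0] = (unsigned char)len; /* len */
    alloc[1] = (unsigned char)len; /* alloc */
    alloc[2] = 1;                  /* flags: a made-up type tag */
    memcpy(alloc + MOCK_HDR_SIZE, init, len + 1);
    return (char *)alloc + MOCK_HDR_SIZE;
}

static void mockStrFree(char *s) { free(s - MOCK_HDR_SIZE); }

/* Write "S | hdr | contents \0" into buf, where S is the header size.
 * Returns the number of bytes written. */
static size_t mockEmbed(unsigned char *buf, size_t buf_len, const char *s) {
    size_t body = MOCK_HDR_SIZE + (unsigned char)s[-MOCK_HDR_SIZE] + 1;
    assert(buf_len >= 1 + body);
    buf[0] = MOCK_HDR_SIZE;
    memcpy(buf + 1, s - MOCK_HDR_SIZE, body);
    return 1 + body;
}

/* Skip the S byte and the header to get back a usable string pointer. */
static const char *mockRestore(const unsigned char *buf) {
    return (const char *)buf + 1 + buf[0];
}
```

The restored pointer still has its header directly in front of it, so code that inspects the byte at `[-1]` keeps working on the embedded copy. Storing S trades one byte for not having to decode the header type on every key lookup, which is the trade-off questioned in the thread.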

zvi-code · 2024-07-02T19:18:16Z

This change makes sense to me overall and tradeoffs are good.

My concern is that, conceptually, it's not a good idea to have two conflicting patterns:

use the internal encoding information of sds [pointer is odd]
embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has very big potential for very hard to track bugs
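The first pattern mentioned here relies on information carried in the low bits of a pointer. As a generic sketch of that technique (the tag values and helper names are hypothetical and are not the encoding dict.c uses), a tag can be packed into the low three bits of a pointer that is known to be 8-byte aligned:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Low three bits are free when the pointee is at least 8-byte aligned. */
#define TAG_MASK ((uintptr_t)0x7)
#define TAG_NORMAL ((uintptr_t)0)   /* hypothetical tag value */
#define TAG_EMBEDDED ((uintptr_t)4) /* hypothetical tag value */

static void *tagPointer(void *p, uintptr_t tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0); /* alignment is a precondition */
    assert((tag & ~TAG_MASK) == 0);
    return (void *)((uintptr_t)p | tag);
}

static uintptr_t pointerTag(const void *p) {
    return (uintptr_t)p & TAG_MASK;
}

static void *untagPointer(const void *p) {
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

The precondition in `tagPointer` is where the two patterns can collide: a scheme that infers meaning from pointer parity or alignment depends on where the bytes were allocated, and a string copied to an arbitrary offset inside another buffer no longer carries that guarantee.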

zvi-code · 2024-07-02T21:13:42Z

> This change makes sense to me overall and tradeoffs are good.
>
> My concern is with that conceptually it's not a good idea that we have 2 conflicting patterns:
>
> use the internal encoding information of sds [pointer is odd]
>
> embed sds into some other non-native allocation buffer, potentially at arbitrary offset
>
> I feel this has very big potential for very hard to track bugs

Trying to arrange my thoughts on this. The above is just an example of a wider issue in the code design. IMO there are two types of 'sds' usage: 1) a way to carry metadata about a string buffer through the IO flow; 2) a compact way to encode string buffer metadata in memory with good locality and low memory overhead.

It is clear to me why 'sds' is used for #2. For #1 I feel it could be the wrong choice, and it also has performance costs, as it forces memory accesses to unpack the info many times in the IO flow. For example, sdsfree accesses the memory on free, even though that could have been avoided. This has a high cost when memory access is a factor.
I also think there is an alternative: take an approach slightly similar to Rust and have some struct 'runtimeSds' with the sds as a member plus metadata. We need to check whether it can be passed to functions by value, to avoid the need to allocate it on the heap. Thoughts?
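One way to picture the 'runtimeSds' idea is a small struct that carries the buffer pointer together with its length and is passed by value. The sketch below is purely hypothetical: `runtimeStr` and its helpers are invented names, and it uses a plain C string rather than an sds. A two-word struct like this is typically passed in registers on common 64-bit calling conventions, so no heap allocation is needed for the wrapper itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "runtime" view: pointer plus metadata, small enough to pass by value. */
typedef struct {
    const char *buf;
    size_t len;
} runtimeStr;

/* Read the length once, when the value enters the IO flow. */
static runtimeStr runtimeStrFrom(const char *s) {
    runtimeStr r = {s, strlen(s)};
    return r;
}

/* The length comparison uses the cached metadata, so strings of different
 * lengths are rejected without touching the string bytes at all. */
static int runtimeStrEqual(runtimeStr a, runtimeStr b) {
    return a.len == b.len && memcmp(a.buf, b.buf, a.len) == 0;
}
```

Whether this actually saves memory accesses in the engine would depend on how often the metadata is consulted along the IO path, which this sketch does not measure.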

hpatro · 2024-07-02T21:24:12Z

> This change makes sense to me overall and tradeoffs are good.
> My concern is with that conceptually it's not a good idea that we have 2 conflicting patterns:
>
> use the internal encoding information of sds [pointer is odd]
>
> embed sds into some other non-native allocation buffer, potentially at arbitrary offset
>
> I feel this has very big potential for very hard to track bugs
>
> Trying to arrange my thoughts on this. The above is just an example for a wider issue In the code design. IMO there are two types of 'sds' usages 1) A way to carry metadata about a string buffer through IO flow. 2) A compact way to encode string buffer metadata in memory with good locality and low memory overhead.
>
> It is clear to me why use 'sds' for #2. For #1 I feel it could be wrong choice and this choice also has performance costs as it forces memory access to unpack the info many times in IO flow. For example sdsfree access the memory on free, even though it could have been avoided. This has high costs when memory access is a factor. I also think there is an alternative, to take an approach slightly similar to rust, have some struct 'runtimeSds' with sds as member and metadata, need to check if it can be passed to functions by value, to avoid the need to allocate it on the heap. Thoughts?

@zvi-code Thanks for sharing your thoughts. Would you mind filing a separate issue about this? Given that we are planning to overhaul the hashtable implementation (#169) in 9.0, it would be good to capture some of these points.

madolson · 2024-07-02T22:31:03Z

Actual test run I kicked off: https://github.com/valkey-io/valkey/actions/runs/9764931105

judeng · 2024-11-02T02:27:03Z

This PR will bring us considerable cost benefits, thank you @hpatro. I have a question about the benchmark results: this PR should improve the CPU cache hit rate, so why hasn't the performance improved?

hpatro · 2024-11-06T23:46:11Z

> This pr will bring us considerable cost benefits, thank you @hpatro . I have doubt about the benchmark test results, this PR should improve the cpu cache hit, but why hasn't the performance improved?

Yeah, I was also expecting more gain in throughput but saw only a tiny gain. Anyway, we were happy with the memory savings (without paying any cost). @judeng Do you have any recommendations for benchmarking that would let us observe the performance improvement from CPU cache locality?

Embed key into dict entry

2610832

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

hpatro force-pushed the valkey-key-embedding branch from 0a5503b to 2610832 Compare May 23, 2024 22:26

hpatro mentioned this pull request May 23, 2024

Key/Value Embedding in Main Dictionary #394

Open

zuiderkwast self-requested a review May 24, 2024 10:53

madolson added the major-decision-pending label Jun 12, 2024

PingXie previously requested changes Jun 23, 2024

zuiderkwast reviewed Jun 24, 2024

madolson reviewed Jun 25, 2024

Address feedback pass 1

0936730

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

hpatro added 2 commits June 27, 2024 17:36

Address feedback pass 2

f4c54c9

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

Address feedback pass 3

a56e2c9

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

hpatro added 2 commits June 27, 2024 21:20

Compact allocation for embedded dict entry

029917f

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

Update code comment

3c7d958

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

madolson reviewed Jun 27, 2024

madolson reviewed Jun 28, 2024

madolson reviewed Jun 28, 2024

madolson reviewed Jun 28, 2024

Update code comment and variable naming

ce76a16

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

madolson added the release-notes and major-decision-approved labels and removed the major-decision-pending label Jun 30, 2024

madolson reviewed Jun 30, 2024

Revert some formatting changes

11400e2

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>

madolson approved these changes Jun 30, 2024

madolson added the performance label Jun 30, 2024

zuiderkwast reviewed Jul 1, 2024

hpatro added 2 commits July 1, 2024 17:35

Update code comment

4ca43ce

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

Add runtime check on dict type member dependency

825c82f

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

zuiderkwast approved these changes Jul 1, 2024

madolson mentioned this pull request Jul 2, 2024

Improve type safety of key embedding #737

Closed

zvi-code reviewed Jul 2, 2024

zvi-code mentioned this pull request Jul 2, 2024

code usege of sds (and maybe other data types) #739

Open

madolson merged commit 8faf278 into valkey-io:unstable Jul 2, 2024
19 checks passed

zuiderkwast mentioned this pull request Sep 4, 2024

Embed key and TTL in robj #992

Closed
