lookupCommandByCString in VM_Replicate #1175

xingbowang · 2024-10-15T22:45:19Z

The problem/use-case that the feature addresses

Avoid performance hit in VM_Replicate function call.

Description of the feature

My team recently was trying to optimize the performance of our module. We found this function call lookupCommandByCString inside VM_Replicate to taking a lot of CPU time.
https://github.com/valkey-io/valkey/blob/unstable/src/module.c#L3539-L3540

The main reason is that it allocates memory to convert const char * into sds, then perform a dict lookup. From VM_Replicate function, it looks like it is just doing a sanity check without using it for anything else. My guess is the purpose is to protect the engine from running a 3rd party module. However, the protection seems induces too much performance penalty. If we develop our in-house module and don't run other modules, could we bypass this check to avoid performance hit. Is it possible to add a config flag "trust-module-replication" to bypass this check?

Alternatives you've considered

We considered VM_ReplicateVerbatim, but we sometimes needs to rewrite the command during replication. So we could not use it.

Additional information

Any additional information that is relevant to the feature request.

The text was updated successfully, but these errors were encountered:

hpatro · 2024-10-21T21:34:31Z

This helps avoiding unnecessary check/cpu cycles while using modules trusted by the system. @valkey-io/core-team If there is no major concern we can implement this.

madolson · 2024-10-21T22:10:45Z

I'm fine with it. We could make it so that if cmdname is NULL it bypasses the check and just replicates whatever is in the va_args. I'm fine with a new module API as well.

zuiderkwast · 2024-10-22T06:14:33Z

A new flag to skip the command lookup sounds good to me.

"S" for skip? Or "B" for bypass?

I see we even create this sds twice: first for the command lookup and then again when creating the argv for replication. If we move the command lookup from VM_Replicate to moduleCreateArgvFromUserFormat (where we check the flags), then we can also do the command lookup by sds string after creating the sds string which we need to do anyway for replication, so we don't need to create the same sds twice.

madolson · 2024-10-22T19:47:07Z

A new flag to skip the command lookup sounds good to me.

This wasn't the original ask, the ask was to add a server config. I don't think we should do that. I think we should add a separate module API that accepts flags. I don't think the Flags should be single characters, I think they should be an enum. I would prefer the term SKIP_VALIDATION.

hwware · 2024-10-23T01:42:12Z

Sorry, do you mean the function lookupCommandBySdsLogic?

valkey/src/server.c

Line 3152 in 285064b

struct serverCommand *lookupCommandBySdsLogic(dict *commands, sds s) {

because I do not find any codes of converting const char * into sds

hwware · 2024-10-23T01:51:53Z

Is there to update the API struct serverCommand *lookupCommandBySdsLogic(dict commands, sds s), update sds s to char part instead of create a new flag？
If i understand correctly, you think the most cost of CPU is line

valkey/src/server.c

Line 3156 in 285064b

if (argc < 1 || argc > 2) {

xingbowang · 2024-10-23T03:12:46Z

The function is lookupCommandByCString, which call lookupCommandByCStringLogic, which call lookupCommandBySdsLogic, during which a temp sds is created. The memory allocation and free is the main overhead.

zuiderkwast · 2024-10-23T08:38:22Z

@xingbowang If this temp sds allocation and free is the main overhead, then it can be solved by the optimization: move the command lookup to moduleCreateArgvFromUserFormat after creating the argv and use lookupCommandBySds. No new config required. Correct?

xingbowang · 2024-10-24T23:00:46Z

@xingbowang If this temp sds allocation and free is the main overhead, then it can be solved by the optimization: move the command lookup to moduleCreateArgvFromUserFormat after creating the argv and use lookupCommandBySds. No new config required. Correct?

I don't think it solves the problem. moduleCreateArgvFromUserFormat is called during VM_Call, which is when the module command is executed. However, the issue happens inside VM_Replicate. E.g. getset was replicate as set. So lookup the command at VM_call would not help, as the command in VM_replicate could be different.

Meantime, I don't know why the code lookup the command inside VM_Replicate. Because it was not used in anyway after the lookup. I talked with Madelyn, who suggested that it was used mainly for debugging purpose. I guess the reason could be replicated command name could be different from the command get called. So developer need to validate it. THis makes more sense to turn on/off as a debugging feature.

sungming2 · 2024-11-20T18:03:36Z

I started working on this issue

sungming2 · 2024-11-21T01:19:00Z

@madolson This wasn't the original ask, the ask was to add a server config. I don't think we should do that. I think we should add a separate module API that accepts flags. I don't think the Flags should be single characters, I think they should be an enum. I would prefer the term SKIP_VALIDATION.

I have a few questions on this.

Can I know why we should not add a new flag to format (fmt)? since the lookupCommands is just debugging/checking purpose in the replication, I think we could use the flag to bypass it after setting up the all flags.
Could you elaborate on "add a separate module API accepting flags"? Does it mean we want to create another new API taking the new flag like VM_ReplicateWithFlags?

hpatro · 2024-11-21T19:15:22Z

I have a few questions on this.

Can I know why we should not add a new flag to format (fmt)? since the lookupCommands is just debugging/checking purpose in the replication, I think we could use the flag to bypass it after setting up the all flags.

To avoid mixing the utility of format specifier (command behavior) vs flags where we could indicate to bypass validation check.

Could you elaborate on "add a separate module API accepting flags"? Does it mean we want to create another new API taking the new flag like VM_ReplicateWithFlags?

Yes. This seems more cleaner to me as well.

hwware · 2024-11-21T19:34:01Z

Here are the root causes, 2 locations:

In the lookupCommandByCStringLogic
The variable sds name = sdsnew(s); is created and later it will be freed after lookupCommandBySdsLogic() is called
In the lookupCommandBySdsLogic
The variable sds *strings = sdssplitlen(s, sdslen(s), "|", 1, &argc); The strings is only an intermediate variable for creating the
robj objects[argc];

Thus one way to solve the problem is that do not convert char* cmdname to sds in function lookupCommandByCStringLogic(), rather parsing the char* cmdname directly in the function lookupCommandBySdsLogic because we only need to know if there is one symbol '|' in the cmdname as below statement:

sds *strings = sdssplitlen(s, sdslen(s), "|", 1, &argc);

sungming2 · 2024-11-21T21:47:08Z

I think we can do both for this issue

bypass validation with flag
do not convert char* to sds

For #2, I took a look at the lookupCommandBySdsLogic references, there a few places that generate sds unnecessarily to look up command such as

struct redisCommand *ACLLookupCommand(const char *name) {
    struct redisCommand *cmd;
    sds sdsname = sdsnew(name);
    cmd = lookupCommandBySdsLogic(server.orig_commands,sdsname);
    sdsfree(sdsname);
    return cmd;
}

struct redisCommand *lookupCommandByCStringLogic(dict *commands, const char *s) {
    struct redisCommand *cmd;
    sds name = sdsnew(s);

    cmd = lookupCommandBySdsLogic(commands,name);
    sdsfree(name);
    return cmd;
}

But since lookupCommandBySds and lookupCommandOrOriginalBySds also call lookupCommandBySdsLogic which need to take sds as a parameter, I think we can modify the lookupCommandByCStringLogic to have validity check with the char *cmdname, and then call lookupCommandLogic directly that can get rid of the unnecessary conversion to sds.

I want to hear some opinion about this

hpatro · 2024-11-21T22:03:54Z

IIUC, the request is to bypass the validation altogether. Why do we need to change the lookup logic? Modules can be trusted to pass valid commands with usage of the API flags to gain performance benefits or deal with the failure on replicas.

madolson · 2024-11-21T22:11:26Z

Yeah, I think we should just skip the lookup check and just allow clients to blindly send a command to the replication stream.

hwware · 2024-11-22T15:44:17Z

If we just look at the function lookupCommandByCStringLogic call stack carefully, we can find this function is called by lookupCommandByCString. And in module, VM_GetCommand, VM_Replicate, VM_EmitAOF all of these API call the lookupCommandByCString and they should have the same performance problem.

Thus, my question is if we can let all these 3 API skip the lookup OR just add a flag to only let VM_Replicate to skip it?

hpatro · 2024-11-22T18:48:55Z

If we just look at the function lookupCommandByCStringLogic call stack carefully, we can find this function is called by lookupCommandByCString. And in module, VM_GetCommand, VM_Replicate, VM_EmitAOF all of these API call the lookupCommandByCString and they should have the same performance problem.

Thus, my question is if we can let all these 3 API skip the lookup OR just add a flag to only let VM_Replicate to skip it?

But that is changing the behavior of existing API. I think we shouldn't do that at least in a minor version and a major version is quite far away. Hence, I feel introducing a new API with flag is more convenient at this point. @hwware WDYT ?

hwware · 2024-11-22T22:13:23Z

If we just look at the function lookupCommandByCStringLogic call stack carefully, we can find this function is called by lookupCommandByCString. And in module, VM_GetCommand, VM_Replicate, VM_EmitAOF all of these API call the lookupCommandByCString and they should have the same performance problem.
Thus, my question is if we can let all these 3 API skip the lookup OR just add a flag to only let VM_Replicate to skip it?

But that is changing the behavior of existing API. I think we shouldn't do that at least in a minor version and a major version is quite far away. Hence, I feel introducing a new API with flag is more convenient at this point. @hwware WDYT ?

I am ok for it. How do you think @hpatro suggestion? @xingbowang

hwware · 2024-11-28T15:20:09Z

@xingbowang Could you please check the pr #1357? @sungming2 already make a new API, but it looks like there is no obvious performance improvement, could you help check if the direction is wrong? Thanks

xingbowang · 2024-12-02T21:24:13Z

Thank you. I think that will fix my problem. It is an elegant solution that allows module to control whether to look the command up again or not.
I do see CPU utilization dropped by 2%. The benchmark only pushed the engine to 54% to 52%. So there is enough CPU time to perform the additional lookup. If they push the baseline to 100%, I think it will make a difference.

hwware · 2024-12-05T18:40:27Z

According to the latest benchmark result, with the bypass validation the CPU is decreased from 93.3% to 92.7%, less than 1% change. TSC will check the API changes. Thanks

hpatro self-assigned this Nov 6, 2024

madolson assigned sungming2 Nov 20, 2024

hpatro removed their assignment Nov 20, 2024

sungming2 mentioned this issue Nov 26, 2024

Add new replicate module API to bypass command validation #1357

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lookupCommandByCString in VM_Replicate #1175

lookupCommandByCString in VM_Replicate #1175

xingbowang commented Oct 15, 2024

hpatro commented Oct 21, 2024

madolson commented Oct 21, 2024 •

edited

Loading

zuiderkwast commented Oct 22, 2024

madolson commented Oct 22, 2024

hwware commented Oct 23, 2024

hwware commented Oct 23, 2024

xingbowang commented Oct 23, 2024

zuiderkwast commented Oct 23, 2024

xingbowang commented Oct 24, 2024

sungming2 commented Nov 20, 2024

sungming2 commented Nov 21, 2024 •

edited

Loading

hpatro commented Nov 21, 2024

hwware commented Nov 21, 2024 •

edited

Loading

sungming2 commented Nov 21, 2024

hpatro commented Nov 21, 2024

madolson commented Nov 21, 2024

hwware commented Nov 22, 2024 •

edited

Loading

hpatro commented Nov 22, 2024

hwware commented Nov 22, 2024

hwware commented Nov 28, 2024

xingbowang commented Dec 2, 2024 •

edited

Loading

hwware commented Dec 5, 2024

lookupCommandByCString in VM_Replicate #1175

lookupCommandByCString in VM_Replicate #1175

Comments

xingbowang commented Oct 15, 2024

hpatro commented Oct 21, 2024

madolson commented Oct 21, 2024 • edited Loading

zuiderkwast commented Oct 22, 2024

madolson commented Oct 22, 2024

hwware commented Oct 23, 2024

hwware commented Oct 23, 2024

xingbowang commented Oct 23, 2024

zuiderkwast commented Oct 23, 2024

xingbowang commented Oct 24, 2024

sungming2 commented Nov 20, 2024

sungming2 commented Nov 21, 2024 • edited Loading

hpatro commented Nov 21, 2024

hwware commented Nov 21, 2024 • edited Loading

sungming2 commented Nov 21, 2024

hpatro commented Nov 21, 2024

madolson commented Nov 21, 2024

hwware commented Nov 22, 2024 • edited Loading

hpatro commented Nov 22, 2024

hwware commented Nov 22, 2024

hwware commented Nov 28, 2024

xingbowang commented Dec 2, 2024 • edited Loading

hwware commented Dec 5, 2024

madolson commented Oct 21, 2024 •

edited

Loading

sungming2 commented Nov 21, 2024 •

edited

Loading

hwware commented Nov 21, 2024 •

edited

Loading

hwware commented Nov 22, 2024 •

edited

Loading

xingbowang commented Dec 2, 2024 •

edited

Loading