Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lookupCommandByCString in VM_Replicate #1175

Open
xingbowang opened this issue Oct 15, 2024 · 22 comments
Open

lookupCommandByCString in VM_Replicate #1175

xingbowang opened this issue Oct 15, 2024 · 22 comments
Assignees

Comments

@xingbowang
Copy link

The problem/use-case that the feature addresses

Avoid performance hit in VM_Replicate function call.

Description of the feature

My team recently was trying to optimize the performance of our module. We found this function call lookupCommandByCString inside VM_Replicate to taking a lot of CPU time.
https://github.com/valkey-io/valkey/blob/unstable/src/module.c#L3539-L3540

The main reason is that it allocates memory to convert const char * into sds, then perform a dict lookup. From VM_Replicate function, it looks like it is just doing a sanity check without using it for anything else. My guess is the purpose is to protect the engine from running a 3rd party module. However, the protection seems induces too much performance penalty. If we develop our in-house module and don't run other modules, could we bypass this check to avoid performance hit. Is it possible to add a config flag "trust-module-replication" to bypass this check?

Alternatives you've considered

We considered VM_ReplicateVerbatim, but we sometimes needs to rewrite the command during replication. So we could not use it.

Additional information

Any additional information that is relevant to the feature request.

@hpatro
Copy link
Contributor

hpatro commented Oct 21, 2024

This helps avoiding unnecessary check/cpu cycles while using modules trusted by the system. @valkey-io/core-team If there is no major concern we can implement this.

@madolson
Copy link
Member

madolson commented Oct 21, 2024

I'm fine with it. We could make it so that if cmdname is NULL it bypasses the check and just replicates whatever is in the va_args. I'm fine with a new module API as well.

@zuiderkwast
Copy link
Contributor

A new flag to skip the command lookup sounds good to me.

"S" for skip? Or "B" for bypass?

I see we even create this sds twice: first for the command lookup and then again when creating the argv for replication. If we move the command lookup from VM_Replicate to moduleCreateArgvFromUserFormat (where we check the flags), then we can also do the command lookup by sds string after creating the sds string which we need to do anyway for replication, so we don't need to create the same sds twice.

@madolson
Copy link
Member

A new flag to skip the command lookup sounds good to me.

This wasn't the original ask, the ask was to add a server config. I don't think we should do that. I think we should add a separate module API that accepts flags. I don't think the Flags should be single characters, I think they should be an enum. I would prefer the term SKIP_VALIDATION.

@hwware
Copy link
Member

hwware commented Oct 23, 2024

Sorry, do you mean the function lookupCommandBySdsLogic?

struct serverCommand *lookupCommandBySdsLogic(dict *commands, sds s) {
because I do not find any codes of converting const char * into sds

@hwware
Copy link
Member

hwware commented Oct 23, 2024

Is there to update the API struct serverCommand *lookupCommandBySdsLogic(dict commands, sds s), update sds s to char part instead of create a new flag?
If i understand correctly, you think the most cost of CPU is line

if (argc < 1 || argc > 2) {

@xingbowang
Copy link
Author

The function is lookupCommandByCString, which call lookupCommandByCStringLogic, which call lookupCommandBySdsLogic, during which a temp sds is created. The memory allocation and free is the main overhead.

@zuiderkwast
Copy link
Contributor

@xingbowang If this temp sds allocation and free is the main overhead, then it can be solved by the optimization: move the command lookup to moduleCreateArgvFromUserFormat after creating the argv and use lookupCommandBySds. No new config required. Correct?

@xingbowang
Copy link
Author

@xingbowang If this temp sds allocation and free is the main overhead, then it can be solved by the optimization: move the command lookup to moduleCreateArgvFromUserFormat after creating the argv and use lookupCommandBySds. No new config required. Correct?

I don't think it solves the problem. moduleCreateArgvFromUserFormat is called during VM_Call, which is when the module command is executed. However, the issue happens inside VM_Replicate. E.g. getset was replicate as set. So lookup the command at VM_call would not help, as the command in VM_replicate could be different.

Meantime, I don't know why the code lookup the command inside VM_Replicate. Because it was not used in anyway after the lookup. I talked with Madelyn, who suggested that it was used mainly for debugging purpose. I guess the reason could be replicated command name could be different from the command get called. So developer need to validate it. THis makes more sense to turn on/off as a debugging feature.

@hpatro hpatro self-assigned this Nov 6, 2024
@sungming2
Copy link
Contributor

I started working on this issue

@hpatro hpatro removed their assignment Nov 20, 2024
@sungming2
Copy link
Contributor

sungming2 commented Nov 21, 2024

@madolson This wasn't the original ask, the ask was to add a server config. I don't think we should do that. I think we should add a separate module API that accepts flags. I don't think the Flags should be single characters, I think they should be an enum. I would prefer the term SKIP_VALIDATION.

I have a few questions on this.

  1. Can I know why we should not add a new flag to format (fmt)? since the lookupCommands is just debugging/checking purpose in the replication, I think we could use the flag to bypass it after setting up the all flags.
  2. Could you elaborate on "add a separate module API accepting flags"? Does it mean we want to create another new API taking the new flag like VM_ReplicateWithFlags?

@hpatro
Copy link
Contributor

hpatro commented Nov 21, 2024

I have a few questions on this.

  1. Can I know why we should not add a new flag to format (fmt)? since the lookupCommands is just debugging/checking purpose in the replication, I think we could use the flag to bypass it after setting up the all flags.

To avoid mixing the utility of format specifier (command behavior) vs flags where we could indicate to bypass validation check.

  1. Could you elaborate on "add a separate module API accepting flags"? Does it mean we want to create another new API taking the new flag like VM_ReplicateWithFlags?

Yes. This seems more cleaner to me as well.

@hwware
Copy link
Member

hwware commented Nov 21, 2024

Here are the root causes, 2 locations:

  1. In the lookupCommandByCStringLogic
    The variable sds name = sdsnew(s); is created and later it will be freed after lookupCommandBySdsLogic() is called

  2. In the lookupCommandBySdsLogic
    The variable sds *strings = sdssplitlen(s, sdslen(s), "|", 1, &argc); The strings is only an intermediate variable for creating the
    robj objects[argc];

Thus one way to solve the problem is that do not convert char* cmdname to sds in function lookupCommandByCStringLogic(), rather parsing the char* cmdname directly in the function lookupCommandBySdsLogic because we only need to know if there is one symbol '|' in the cmdname as below statement:

sds *strings = sdssplitlen(s, sdslen(s), "|", 1, &argc);

@sungming2
Copy link
Contributor

I think we can do both for this issue

  1. bypass validation with flag
  2. do not convert char* to sds

For #2, I took a look at the lookupCommandBySdsLogic references, there a few places that generate sds unnecessarily to look up command such as

struct redisCommand *ACLLookupCommand(const char *name) {
    struct redisCommand *cmd;
    sds sdsname = sdsnew(name);
    cmd = lookupCommandBySdsLogic(server.orig_commands,sdsname);
    sdsfree(sdsname);
    return cmd;
}

struct redisCommand *lookupCommandByCStringLogic(dict *commands, const char *s) {
    struct redisCommand *cmd;
    sds name = sdsnew(s);

    cmd = lookupCommandBySdsLogic(commands,name);
    sdsfree(name);
    return cmd;
}

But since lookupCommandBySds and lookupCommandOrOriginalBySds also call lookupCommandBySdsLogic which need to take sds as a parameter, I think we can modify the lookupCommandByCStringLogic to have validity check with the char *cmdname, and then call lookupCommandLogic directly that can get rid of the unnecessary conversion to sds.

I want to hear some opinion about this

@hpatro
Copy link
Contributor

hpatro commented Nov 21, 2024

IIUC, the request is to bypass the validation altogether. Why do we need to change the lookup logic? Modules can be trusted to pass valid commands with usage of the API flags to gain performance benefits or deal with the failure on replicas.

@madolson
Copy link
Member

Yeah, I think we should just skip the lookup check and just allow clients to blindly send a command to the replication stream.

@hwware
Copy link
Member

hwware commented Nov 22, 2024

If we just look at the function lookupCommandByCStringLogic call stack carefully, we can find this function is called by lookupCommandByCString. And in module, VM_GetCommand, VM_Replicate, VM_EmitAOF all of these API call the lookupCommandByCString and they should have the same performance problem.

Thus, my question is if we can let all these 3 API skip the lookup OR just add a flag to only let VM_Replicate to skip it?

@hpatro
Copy link
Contributor

hpatro commented Nov 22, 2024

If we just look at the function lookupCommandByCStringLogic call stack carefully, we can find this function is called by lookupCommandByCString. And in module, VM_GetCommand, VM_Replicate, VM_EmitAOF all of these API call the lookupCommandByCString and they should have the same performance problem.

Thus, my question is if we can let all these 3 API skip the lookup OR just add a flag to only let VM_Replicate to skip it?

But that is changing the behavior of existing API. I think we shouldn't do that at least in a minor version and a major version is quite far away. Hence, I feel introducing a new API with flag is more convenient at this point. @hwware WDYT ?

@hwware
Copy link
Member

hwware commented Nov 22, 2024

If we just look at the function lookupCommandByCStringLogic call stack carefully, we can find this function is called by lookupCommandByCString. And in module, VM_GetCommand, VM_Replicate, VM_EmitAOF all of these API call the lookupCommandByCString and they should have the same performance problem.
Thus, my question is if we can let all these 3 API skip the lookup OR just add a flag to only let VM_Replicate to skip it?

But that is changing the behavior of existing API. I think we shouldn't do that at least in a minor version and a major version is quite far away. Hence, I feel introducing a new API with flag is more convenient at this point. @hwware WDYT ?

I am ok for it. How do you think @hpatro suggestion? @xingbowang

@hwware
Copy link
Member

hwware commented Nov 28, 2024

@xingbowang Could you please check the pr #1357? @sungming2 already make a new API, but it looks like there is no obvious performance improvement, could you help check if the direction is wrong? Thanks

@xingbowang
Copy link
Author

xingbowang commented Dec 2, 2024

Thank you. I think that will fix my problem. It is an elegant solution that allows module to control whether to look the command up again or not.
I do see CPU utilization dropped by 2%. The benchmark only pushed the engine to 54% to 52%. So there is enough CPU time to perform the additional lookup. If they push the baseline to 100%, I think it will make a difference.

@hwware
Copy link
Member

hwware commented Dec 5, 2024

According to the latest benchmark result, with the bypass validation the CPU is decreased from 93.3% to 92.7%, less than 1% change. TSC will check the API changes. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants