-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lookupCommandByCString in VM_Replicate #1175
Comments
This helps avoiding unnecessary check/cpu cycles while using modules trusted by the system. @valkey-io/core-team If there is no major concern we can implement this. |
I'm fine with it. We could make it so that if cmdname is NULL it bypasses the check and just replicates whatever is in the va_args. I'm fine with a new module API as well. |
A new flag to skip the command lookup sounds good to me. "S" for skip? Or "B" for bypass? I see we even create this sds twice: first for the command lookup and then again when creating the argv for replication. If we move the command lookup from VM_Replicate to moduleCreateArgvFromUserFormat (where we check the flags), then we can also do the command lookup by sds string after creating the sds string which we need to do anyway for replication, so we don't need to create the same sds twice. |
This wasn't the original ask, the ask was to add a server config. I don't think we should do that. I think we should add a separate module API that accepts flags. I don't think the Flags should be single characters, I think they should be an enum. I would prefer the term |
Sorry, do you mean the function lookupCommandBySdsLogic? Line 3152 in 285064b
|
Is there to update the API struct serverCommand *lookupCommandBySdsLogic(dict commands, sds s), update sds s to char part instead of create a new flag? Line 3156 in 285064b
|
The function is |
@xingbowang If this temp sds allocation and free is the main overhead, then it can be solved by the optimization: move the command lookup to moduleCreateArgvFromUserFormat after creating the argv and use lookupCommandBySds. No new config required. Correct? |
I don't think it solves the problem. Meantime, I don't know why the code lookup the command inside VM_Replicate. Because it was not used in anyway after the lookup. I talked with Madelyn, who suggested that it was used mainly for debugging purpose. I guess the reason could be replicated command name could be different from the command get called. So developer need to validate it. THis makes more sense to turn on/off as a debugging feature. |
I started working on this issue |
I have a few questions on this.
|
To avoid mixing the utility of format specifier (command behavior) vs flags where we could indicate to bypass validation check.
Yes. This seems more cleaner to me as well. |
Here are the root causes, 2 locations:
Thus one way to solve the problem is that do not convert char* cmdname to sds in function lookupCommandByCStringLogic(), rather parsing the char* cmdname directly in the function lookupCommandBySdsLogic because we only need to know if there is one symbol '|' in the cmdname as below statement: sds *strings = sdssplitlen(s, sdslen(s), "|", 1, &argc); |
I think we can do both for this issue
For #2, I took a look at the
But since I want to hear some opinion about this |
IIUC, the request is to bypass the validation altogether. Why do we need to change the lookup logic? Modules can be trusted to pass valid commands with usage of the API flags to gain performance benefits or deal with the failure on replicas. |
Yeah, I think we should just skip the lookup check and just allow clients to blindly send a command to the replication stream. |
If we just look at the function lookupCommandByCStringLogic call stack carefully, we can find this function is called by lookupCommandByCString. And in module, VM_GetCommand, VM_Replicate, VM_EmitAOF all of these API call the lookupCommandByCString and they should have the same performance problem. Thus, my question is if we can let all these 3 API skip the lookup OR just add a flag to only let VM_Replicate to skip it? |
But that is changing the behavior of existing API. I think we shouldn't do that at least in a minor version and a major version is quite far away. Hence, I feel introducing a new API with flag is more convenient at this point. @hwware WDYT ? |
I am ok for it. How do you think @hpatro suggestion? @xingbowang |
@xingbowang Could you please check the pr #1357? @sungming2 already make a new API, but it looks like there is no obvious performance improvement, could you help check if the direction is wrong? Thanks |
Thank you. I think that will fix my problem. It is an elegant solution that allows module to control whether to look the command up again or not. |
According to the latest benchmark result, with the bypass validation the CPU is decreased from 93.3% to 92.7%, less than 1% change. TSC will check the API changes. Thanks |
The problem/use-case that the feature addresses
Avoid performance hit in VM_Replicate function call.
Description of the feature
My team recently was trying to optimize the performance of our module. We found this function call
lookupCommandByCString
insideVM_Replicate
to taking a lot of CPU time.https://github.com/valkey-io/valkey/blob/unstable/src/module.c#L3539-L3540
The main reason is that it allocates memory to convert const char * into sds, then perform a dict lookup. From
VM_Replicate
function, it looks like it is just doing a sanity check without using it for anything else. My guess is the purpose is to protect the engine from running a 3rd party module. However, the protection seems induces too much performance penalty. If we develop our in-house module and don't run other modules, could we bypass this check to avoid performance hit. Is it possible to add a config flag "trust-module-replication" to bypass this check?Alternatives you've considered
We considered VM_ReplicateVerbatim, but we sometimes needs to rewrite the command during replication. So we could not use it.
Additional information
Any additional information that is relevant to the feature request.
The text was updated successfully, but these errors were encountered: