scheduling_group: fix race between scheduling group and key creation #2585
Conversation
Force-pushed 631f0d8 to ba2970d
I think there's another issue when we add a key: say we allocate two keys, and on one shard we initialize the greater key first; then we have uninitialized keys in the middle of the vector.
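To make the hole concrete, here is a minimal standalone illustration; the container and values are hypothetical stand-ins for the real per-key storage, not Seastar code:

```cpp
#include <vector>

int main() {
    // Suppose keys 3 and 5 were allocated, and this shard happens to
    // finish creating key 5 first.
    std::vector<int> key_configs;      // stand-in for the per-key config vector
    unsigned later_key = 5;
    key_configs.resize(later_key + 1); // grows the vector to index 5
    key_configs[later_key] = 42;       // key 5 is now "initialized", but
                                       // indices 3 and 4 exist in the vector
                                       // with default values only; a reader
                                       // walking the full range cannot tell
                                       // them apart from real entries
}
```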
The functions create_scheduling_group and scheduling_group_key_create may have unexpected results when run concurrently. When creating a scheduling group key, we:
1. allocate a key
2. insert the key config
3. for all scheduling groups, initialize the data for the group and key

with preemptions possible in between each step. Another concurrent task could see an intermediate state where a key was created but not fully initialized yet. This may lead to the data being initialized twice for the new key, or to functions operating on uninitialized key data.

To solve this, we introduce a shared_mutex which is locked exclusively when creating a new key, and held with shared access when creating a scheduling group and whenever a consistent read of all keys is required.

Fixes scylladb#2231
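A minimal sketch of the locking scheme described above, assuming Seastar's `shared_mutex` with `with_lock`/`with_shared`; the function names and bodies are illustrative placeholders rather than the actual reactor code:

```cpp
#include <seastar/core/shared_mutex.hh>
#include <seastar/core/future.hh>

seastar::shared_mutex scheduling_group_keys_mutex;  // hypothetical, per shard

// Key creation takes the mutex exclusively, so no one can observe a key
// that has been allocated but not yet initialized for every group.
seastar::future<> create_key_locked() {
    return seastar::with_lock(scheduling_group_keys_mutex, [] {
        // 1. allocate a key
        // 2. insert the key config
        // 3. initialize per-group data for the new key
        return seastar::make_ready_future<>();
    });
}

// Scheduling group creation (and consistent reads of all keys) take the
// mutex in shared mode: they can run concurrently with each other, but
// never interleave with key creation.
seastar::future<> create_group_locked() {
    return seastar::with_shared(scheduling_group_keys_mutex, [] {
        // initialize data for every existing key in the new group
        return seastar::make_ready_future<>();
    });
}
```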
Use std::map in `scheduling_group_key_configs` to store all scheduling group key configs instead of std::vector. In a few places the code assumes that the keys in the entire vector range are valid; however, when keys are created concurrently, a key with a higher index may be created first, leaving the vector with uninitialized entries. Replacing the vector with a map solves this problem, because entries exist only for initialized keys.
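A small sketch of the map-based storage with a placeholder config type; the point is that an entry exists only once its key has been fully created, so iterating the container never visits an uninitialized slot:

```cpp
#include <map>
#include <utility>

struct scheduling_group_key_config { /* placeholder for the real config type */ };

// Keyed by key id; ids whose creation has not finished are simply absent,
// unlike a vector where they would appear as default-constructed holes.
std::map<unsigned long, scheduling_group_key_config> scheduling_group_key_configs;

void register_key(unsigned long id, scheduling_group_key_config cfg) {
    scheduling_group_key_configs.emplace(id, std::move(cfg));
}

bool key_is_initialized(unsigned long id) {
    return scheduling_group_key_configs.contains(id);
}
```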
Force-pushed ba2970d to 2c1f15e
@@ -282,6 +283,7 @@ private:
     boost::container::static_vector<std::unique_ptr<task_queue>, max_scheduling_groups()> _task_queues;
     internal::scheduling_group_specific_thread_local_data _scheduling_group_specific_data;
+    shared_mutex _scheduling_group_keys_mutex;
This presumes that all calls to the scheduling group management functions happen on the same shard.
We could force it with an invoke_on(0), but let's see if there's a less brutal approach.
No, the callers are shard-local already.
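For context, this is roughly what the "force it with an invoke_on(0)" idea could look like, sketched here with `seastar::smp::submit_to` and a hypothetical helper; the PR does not take this route, since the callers are already shard-local:

```cpp
#include <seastar/core/smp.hh>
#include <seastar/core/future.hh>

// Hypothetical: serialize all key creation by funneling it through shard 0.
seastar::future<> create_key_on_shard0() {
    return seastar::smp::submit_to(0, [] {
        // run the key-creation steps on shard 0 only
        return seastar::make_ready_future<>();
    });
}
```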