
scheduling_group: fix race between scheduling group and key creation #2585

Merged

merged 3 commits into scylladb:master from sched_group_key_race on Dec 17, 2024

Conversation

mlitvk (Contributor) commented Dec 16, 2024

The functions create_scheduling_group and scheduling_group_key_create
may produce unexpected results when run concurrently.

When creating a scheduling group key, we:

  1. allocate a key
  2. insert the key config
  3. for all SGs, initialize data for the SG and key

Preemption is possible between each of these steps.

Another task could therefore observe an intermediate state in which a key
has been allocated but not fully initialized yet. This may lead to the data
being initialized twice for the new key, or to functions operating on
uninitialized key data.

To solve this we introduce a shared_mutex that is locked exclusively when
creating a new key, and held with shared access when creating a scheduling
group and whenever consistent read access to all keys is required.

Fixes #2231
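
For illustration, here is a minimal sketch of the locking scheme described above. It is not the actual patch: only the `_scheduling_group_keys_mutex` member appears in the diff below, the function names here are made up, and the bodies are placeholders. It assumes Seastar's `with_lock`/`with_shared` helpers from `seastar/core/shared_mutex.hh`.

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/shared_mutex.hh>

// The member added by this PR (see the diff hunk in the review thread);
// shown at namespace scope here only to keep the sketch self-contained.
seastar::shared_mutex _scheduling_group_keys_mutex;

// Exclusive lock while a key is created, so nothing can observe a key that
// is allocated but not yet fully initialized. (Hypothetical function name.)
seastar::future<> create_key_sketch() {
    return seastar::with_lock(_scheduling_group_keys_mutex, [] {
        // 1. allocate a key
        // 2. insert the key config
        // 3. for all SGs, initialize data for the SG and key
        return seastar::make_ready_future<>();
    });
}

// Shared lock while a scheduling group is created or a consistent read of
// all keys is needed; these may run concurrently with each other but never
// overlap key creation. (Hypothetical function name.)
seastar::future<> create_scheduling_group_sketch() {
    return seastar::with_shared(_scheduling_group_keys_mutex, [] {
        return seastar::make_ready_future<>();
    });
}
```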

@mlitvk mlitvk marked this pull request as ready for review December 16, 2024 14:43
@mlitvk mlitvk requested a review from piodul December 16, 2024 14:43
@mlitvk mlitvk force-pushed the sched_group_key_race branch from 631f0d8 to ba2970d on December 16, 2024 at 15:50
@mlitvk mlitvk marked this pull request as draft December 17, 2024 08:50
mlitvk (Contributor, Author) commented Dec 17, 2024

I think there's another issue:
scheduling_group_key_configs stores all key configs in a vector, and in a few places it is assumed that 0..key_configs.size() are all valid keys.

When we add a key, we:

  1. allocate a key (fetch_add)
  2. on all shards, add the key config to the vector at configs[key_id]

Say we allocate two keys and some shard initializes the greater key first; the vector then contains uninitialized entries in the middle.
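
A small standalone sketch of that ordering problem, with hypothetical names rather than the Seastar code: key 1 is allocated after key 0, but its config arrives at a shard first, so the vector grows past a key whose config was never inserted.

```cpp
#include <atomic>
#include <vector>

std::atomic<unsigned> next_key{0};   // step 1: key allocation via fetch_add
std::vector<int> key_configs;        // one shard's config storage, indexed by key id

// step 2, run on every shard for each new key
void insert_config(unsigned key_id, int cfg) {
    if (key_configs.size() <= key_id) {
        key_configs.resize(key_id + 1);   // grows past keys whose config has not arrived yet
    }
    key_configs[key_id] = cfg;
}

int main() {
    unsigned a = next_key.fetch_add(1);   // key 0 allocated first...
    unsigned b = next_key.fetch_add(1);   // ...then key 1
    insert_config(b, 42);                 // but this shard happens to see key 1 first
    // key_configs.size() == 2, yet index 0 is only a default-constructed
    // placeholder: code that treats 0..size() as valid keys now visits a
    // key whose config was never inserted.
    insert_config(a, 7);                  // key 0 arrives later
}
```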

use std::map in `scheduling_group_key_configs` to store all
scheduling group key configs instead of std::vector.

in a few places the code assumes that every key in the entire vector
range is valid. however, when keys are created concurrently, a key with
a higher index may be created first, leaving the vector in a state with
uninitialized entries.

replacing the vector with a map solves this problem, because entries
exist only for initialized keys.
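
A hedged sketch of the map-based storage described above (again with hypothetical names, not the actual patch): a std::map has no placeholder slots, so iterating it only visits keys whose config has actually been inserted.

```cpp
#include <cstdio>
#include <map>

std::map<unsigned, int> key_configs;   // key id -> config; no placeholder slots

void insert_config(unsigned key_id, int cfg) {
    key_configs.emplace(key_id, cfg);  // an entry exists only once the key is initialized
}

int main() {
    insert_config(1, 42);              // the higher key is initialized first
    for (const auto& [key, cfg] : key_configs) {
        std::printf("key %u -> %d\n", key, cfg);   // visits only key 1, no hole to skip
    }
    insert_config(0, 7);               // key 0 arrives later without ever creating a gap
}
```
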
@mlitvk mlitvk force-pushed the sched_group_key_race branch from ba2970d to 2c1f15e on December 17, 2024 at 11:38
@mlitvk mlitvk marked this pull request as ready for review December 17, 2024 12:05
```diff
@@ -282,6 +283,7 @@ private:

 boost::container::static_vector<std::unique_ptr<task_queue>, max_scheduling_groups()> _task_queues;
 internal::scheduling_group_specific_thread_local_data _scheduling_group_specific_data;
+shared_mutex _scheduling_group_keys_mutex;
```
Member:

This presumes that all calls to the scheduling group management functions happen on the same shard.

Member:

We could force it with an invoke_on(0), but let's see if there's a less brutal approach.

Member:

No, the callers are shard-local already.
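
For context, a sketch of the rejected alternative mentioned above: forcing every key-creation call onto shard 0 so a single shard serializes them. The reviewer says invoke_on(0); for non-sharded code the generic equivalent is smp::submit_to(0, ...). The wrapper name is hypothetical, and this is not what the PR does, since the callers are already shard-local.

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>

// Hypothetical wrapper: funnel every key-creation call through shard 0 so a
// single shard serializes them. Not the approach taken by this PR.
seastar::future<> create_key_on_shard0() {
    return seastar::smp::submit_to(0, [] {
        // ... run the key-creation steps on shard 0 ...
        return seastar::make_ready_future<>();
    });
}
```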

@avikivity avikivity merged commit 993cfd5 into scylladb:master Dec 17, 2024
15 checks passed
Successfully merging this pull request may close these issues.

create_scheduling_group / scheduling_group_key_create are not safe to run in parallel