
scheduling_group: fix race between scheduling group and key creation #2585

Merged

merged 3 commits into scylladb:master from sched_group_key_race on Dec 17, 2024

Conversation

mlitvk (Contributor) commented Dec 16, 2024

The functions create_scheduling_group and scheduling_group_key_create
may produce unexpected results when run concurrently.

When creating a scheduling group key, we:

  1. allocate a key
  2. insert the key config
  3. for all SGs, initialize data for the SG and key

Preemption is possible between each of these steps.

Another task could therefore observe an intermediate state in which a key
has been allocated but not fully initialized yet. This may lead to the data
being initialized twice for the new key, or to functions operating on
uninitialized key data.

To solve this we introduce a shared_mutex that is locked exclusively when
creating a new key, and held with shared access when creating a scheduling
group and whenever consistent read access to all keys is required.

Fixes #2231
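
For illustration, here is a minimal sketch of the locking scheme described above. It is not the actual patch: only the `_scheduling_group_keys_mutex` member appears in the diff below, the function names here are made up, and the bodies are placeholders. It assumes Seastar's `with_lock`/`with_shared` helpers from `seastar/core/shared_mutex.hh`.

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/shared_mutex.hh>

// The member added by this PR (see the diff hunk in the review thread);
// shown at namespace scope here only to keep the sketch self-contained.
seastar::shared_mutex _scheduling_group_keys_mutex;

// Exclusive lock while a key is created, so nothing can observe a key that
// is allocated but not yet fully initialized. (Hypothetical function name.)
seastar::future<> create_key_sketch() {
    return seastar::with_lock(_scheduling_group_keys_mutex, [] {
        // 1. allocate a key
        // 2. insert the key config
        // 3. for all SGs, initialize data for the SG and key
        return seastar::make_ready_future<>();
    });
}

// Shared lock while a scheduling group is created or a consistent read of
// all keys is needed; these may run concurrently with each other but never
// overlap key creation. (Hypothetical function name.)
seastar::future<> create_scheduling_group_sketch() {
    return seastar::with_shared(_scheduling_group_keys_mutex, [] {
        return seastar::make_ready_future<>();
    });
}
```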

@mlitvk mlitvk marked this pull request as ready for review December 16, 2024 14:43
@mlitvk mlitvk requested a review from piodul December 16, 2024 14:43
@mlitvk mlitvk force-pushed the sched_group_key_race branch from 631f0d8 to ba2970d on December 16, 2024 at 15:50
@mlitvk mlitvk marked this pull request as draft December 17, 2024 08:50
mlitvk (Contributor, Author) commented Dec 17, 2024

I think there's another issue:
scheduling_group_key_configs stores all key configs in a vector, and in a few places it is assumed that 0..key_configs.size() are all valid keys.

When we add a key, we:

  1. allocate a key (fetch_add)
  2. on all shards, add the key config to the vector at configs[key_id]

Say we allocate two keys and some shard initializes the greater key first; the vector then contains uninitialized entries in the middle.
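
A small standalone sketch of that ordering problem, with hypothetical names rather than the Seastar code: key 1 is allocated after key 0, but its config arrives at a shard first, so the vector grows past a key whose config was never inserted.

```cpp
#include <atomic>
#include <vector>

std::atomic<unsigned> next_key{0};   // step 1: key allocation via fetch_add
std::vector<int> key_configs;        // one shard's config storage, indexed by key id

// step 2, run on every shard for each new key
void insert_config(unsigned key_id, int cfg) {
    if (key_configs.size() <= key_id) {
        key_configs.resize(key_id + 1);   // grows past keys whose config has not arrived yet
    }
    key_configs[key_id] = cfg;
}

int main() {
    unsigned a = next_key.fetch_add(1);   // key 0 allocated first...
    unsigned b = next_key.fetch_add(1);   // ...then key 1
    insert_config(b, 42);                 // but this shard happens to see key 1 first
    // key_configs.size() == 2, yet index 0 is only a default-constructed
    // placeholder: code that treats 0..size() as valid keys now visits a
    // key whose config was never inserted.
    insert_config(a, 7);                  // key 0 arrives later
}
```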

use std::map in `scheduling_group_key_configs` to store all
scheduling group key configs instead of std::vector.

in a few places the code assumes that every key in the entire vector
range is valid. however, when keys are created concurrently, a key with
a higher index may be created first, leaving the vector in a state with
uninitialized entries.

replacing the vector with a map solves this problem, because entries
exist only for initialized keys.
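
A hedged sketch of the map-based storage described above (again with hypothetical names, not the actual patch): a std::map has no placeholder slots, so iterating it only visits keys whose config has actually been inserted.

```cpp
#include <cstdio>
#include <map>

std::map<unsigned, int> key_configs;   // key id -> config; no placeholder slots

void insert_config(unsigned key_id, int cfg) {
    key_configs.emplace(key_id, cfg);  // an entry exists only once the key is initialized
}

int main() {
    insert_config(1, 42);              // the higher key is initialized first
    for (const auto& [key, cfg] : key_configs) {
        std::printf("key %u -> %d\n", key, cfg);   // visits only key 1, no hole to skip
    }
    insert_config(0, 7);               // key 0 arrives later without ever creating a gap
}
```
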
@mlitvk mlitvk force-pushed the sched_group_key_race branch from ba2970d to 2c1f15e on December 17, 2024 at 11:38
@mlitvk mlitvk marked this pull request as ready for review December 17, 2024 12:05
```diff
@@ -282,6 +283,7 @@ private:

 boost::container::static_vector<std::unique_ptr<task_queue>, max_scheduling_groups()> _task_queues;
 internal::scheduling_group_specific_thread_local_data _scheduling_group_specific_data;
+shared_mutex _scheduling_group_keys_mutex;
```
Member:

This presumes that all calls to the scheduling group management functions happen on the same shard.

Member:

We could force it with an invoke_on(0), but let's see if there's a less brutal approach.

Member:

No, the callers are shard-local already.
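
For context, a sketch of the rejected alternative mentioned above: forcing every key-creation call onto shard 0 so a single shard serializes them. The reviewer says invoke_on(0); for non-sharded code the generic equivalent is smp::submit_to(0, ...). The wrapper name is hypothetical, and this is not what the PR does, since the callers are already shard-local.

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>

// Hypothetical wrapper: funnel every key-creation call through shard 0 so a
// single shard serializes them. Not the approach taken by this PR.
seastar::future<> create_key_on_shard0() {
    return seastar::smp::submit_to(0, [] {
        // ... run the key-creation steps on shard 0 ...
        return seastar::make_ready_future<>();
    });
}
```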

@avikivity avikivity merged commit 993cfd5 into scylladb:master Dec 17, 2024
15 checks passed
Successfully merging this pull request may close these issues.

create_scheduling_group / scheduling_group_key_create are not safe to run in parallel