kernel: Change locking in k_thread_suspend() #59400

peter-mitsis · 2023-06-19T19:27:24Z

Updates k_thread_suspend() to work on SMP systems. It functions much like k_thread_abort().

If the target thread is running on another CPU, the target thread gets marked as "halting with a reason". Meanwhile the calling thread waits for the target thread to finish halting (suspend or abort). Note that the reason bit has been implemented such that an abort takes precedence over a suspend to cover the case a thread running on another CPU gets suspended and then aborted (or vice versa). If the target thread is not running on another CPU, then the target thread can be immediately aborted/suspended.

Fixes #58392

andyross · 2023-07-05T14:15:07Z

Sorry, missed this earlier. I guess I don't get the point here? This is a behavior change: the call used to reschedule unconditionally if you suspended yourself (i.e. in the situation where we know it's required) and not if you suspended someone else. Now it reschedules every time, which means that it acts as a yield point where it didn't before.

Personally I'm agnostic on that sort of thing, but there has been argument in the past about this sort of behavior.

But regardless: I don't see why this is a good idea? Suspending another thread isn't an obvious spot where a user would expect a yield (because obviously we're the highest priority thread right now, so we can't be suspending anyone else running and thus the run choice shouldn't change on our local CPU).

My guess is that if there's a deadlock in the linked test that is fixed here, it's that the test is incorrectly sensitive to thread run order and needs better synchronization internally. Effectively all this is doing is adding a k_yield() after the suspend, it's not fixing anything.

peter-mitsis · 2023-07-05T14:54:37Z

On a UP system, the old code works great. In such a system it makes perfect sense to only reschedule if the thread is attempting to suspend itself as the currently executing thread would by definition be the highest priority thread that is ready to run. On an SMP system, that assumption no longer holds. This means that on an SMP system, we must perform a reschedule as the thread that is being suspended could be executing on another core.

This in turn leads to the following choice--we can either have distinct logic for UP and SMP cases, or we can use the same logic for both. I opted for the same logic for both to keep things simple (and hopefully easier to maintain).

What are the penalties for unconditionally calling z_reschedule() in the UP case? Unless I am missing something, the only penalty should be a little extra time as it runs through the logic to determine whether or not to actually schedule in a new thread. Is that extra time problematic? I don't think so.

What about the scheduling point? Again, unless I am missing something, calling z_reschedule() should not result in any scheduling difference in UP.

andyross

OK, I think I'm seeing this. This is deep, unfortunately. Bear with me:

What are the penalties for unconditionally calling z_reschedule() in the UP case?

If you're in a cooperative thread or holding a k_sched_lock(), it will break your lock and run another thread after this change, where it wouldn't before (or it would, but only in the case where you suspended yourself and would obviously expect/desire that behavior). That's the source of previous arguments about scheduling points, mostly. Our API is sensitive to this in ways that other OSes aren't, because we make promises they don't. (The counterargument, FWIW, is that we've been historically bad at documenting where the scheduling points are, so they shouldn't count as a stable API).

On an SMP system, that assumption no longer holds.

Sure it does. The thread being suspended isn't running, so by definition suspending it isn't going to affect the run queue anywhere else either.

Except, as you point out, what if it is actually running on another CPU?

And here we're just falling down. k_thread_suspend() simply isn't handling the case in SMP where you try to suspend an active thread. It doesn't work, and it's never worked. As I read it, all it would do is attempt to dequeue the thread (a noop since the run queue doesn't contain the actives threads in SMP) and then mark it suspended (again a noop since it doesn't do anything to stop the thread, albeit a dangerous one since the thread state bits are now inconsistent).

Doing this right requires signaling the other CPU with an IPI, then waiting for the thread to become inactive before returning. That's more or less identical with the existing path for k_thread_abort() (which does support SMP) and could probably share the same infrastructure.

Alternatively, since this never worked, I think I'd personally be OK with just documenting the limitation[1] that trying to suspend an active thread on SMP platforms is illegal, and adding an assertion (or maybe even a panic?) to catch the cases that try.

But getting back to this PR and the bug: I really don't think this is a fix. I think what's happened is that the modified run order is just fooling the test into a state where things come out "success". I think this pushes me to a -1 on this particular change.

[1] As mentioned in other conversations, trying to asynchronously suspend another thread like this is inherently racy. This IMHO isn't an API use case we should be going out of our way trying to support.

andyross · 2023-07-05T16:55:06Z

it will break your lock and run another thread

No, it won't. :)

I consistently make this mistake. In fact z_reschedule() does not affect the preemption logic, you need to requeue a thread to make that happen, which is done in k_yield() but basically nowhere else. So ignore that bit, this is vaguely undesirable from a consistency perspective (because apps are sometimes sensitive to run order when they shouldn't be) but is never a bug per se.

peter-mitsis · 2023-07-05T17:07:59Z

You make a good, solid and understandable case for why the proposed fix was not actually a fix. Thank you! :)

andyross

Pretty sure there's a bug with the state handling stemming from the "thread states can have more than one bit set now" mixup. Also some nitpicking and suggestions.

But on the whole this looks like exactly how I hoped it would work, with a shared path for "stop this other thread on a different CPU so it can inspect scheduler state" that is only minimally different for the "async suspend" and abort cases.

(Though I remain on the record that any app trying to use k_thread_suspend() like this, especially ones on SMP platforms, remains inherently race-prone. Robust unix apps have been warned off of using SIGSTOP for decades too).

include/zephyr/kernel/thread.h

kernel/sched.c

include/zephyr/kernel_structs.h

kernel/sched.c

kernel/thread.c

kernel/sched.c

cfriedt · 2023-08-13T13:35:44Z

I'll give it a review once Andy's comments are addressed.

peter-mitsis · 2023-08-14T17:24:30Z

I'm going to try reworking some of the commits. Based on the comments so far, not only is there is definitely room for improvement, but I think that the current breakdown has added some confusion.

peter-mitsis · 2023-08-16T15:16:05Z

The two CI failures are not due to the proposed changes. I can replicate them locally using twister on the 'main' branch.

peter-mitsis · 2023-08-16T16:16:21Z

(Resolved a merge conflict caused by a recent change to sched.c)

andyross

First patch introduces a deadlock. Beyond that a few continuations of general review that probably shouldn't block merge.

kernel/sched.c

include/zephyr/kernel_structs.h

kernel/sched.c

peter-mitsis · 2023-08-18T00:01:32Z

@andyross - Previously you wrote "Then I guess I'm not understanding what halting means? Why isn't abort a "subclass" of halt behavior, you do all the same stuff and then also terminate yourself?"

Aborting is indeed a sub-type of halting. Halting has two sub-types--aborting and suspending. When we don't care which we are dealing with, we can test with is_halting() which just tests the _THREAD_HALTING bit. When we do care, we test both bits.

peter-mitsis · 2023-08-18T13:20:12Z

@andyross - It came to me last night as I was falling asleep what you were driving at, and what I was missing with the abort/suspend/halt bits. I think this minor tweak should make things more clear.

kernel/sched.c

nashif · 2023-09-19T18:31:00Z

@andyross can you please take another look?

andyross · 2023-09-29T18:36:10Z

include/zephyr/kernel.h

+ * platforms, it is possible for application code to create a deadlock
+ * condition by simultaneously suspending or aborting using at least one
+ * suspension or termination from interrupt context. Zephyr cannot detect all
+ * such conditions.


Wait, what? This is a regression then? The old k_thread_abort() was deadlock-proof. The point to the criticism earlier was that we need to preserve that property, not document it away. Don't have time for a full review at the moment, will leave the tab open and get to this today.

The old k_thread_abort() was deadlock-proof.

No, the old k_thread_abort() was NOT deadlock proof, and the current documentation explicitly states it.

One such scenario wherein a deadlock can occur is if both CPU0 and CPU1 are in ISRs and they each try to abort the other CPU's interrupted thread. As both threads are currently active as they were interrupted, the _THREAD_ABORTING bit is set for both. The next step in this sequence is for both of these CPUs to confirm that the thread they want to abort is marked as _THREAD_ABORTING and that it is not its current/interrupted thread. Since they are both in an ISR, they will each spin waiting for the thread to be aborted--deadlocked. Is this deadlock likely? No. But can it occur? Yes.

As the new proposed k_thread_suspend() shares the same infrastructure as k_thread_abort(), it too is vulnerable to deadlock. Thus, it must be mentioned.

I do have some ideas on how this unlikely deadlock can be prevented, but those are future plans.

No, this is handled. Or should be; "was designed to be", anyway. I'm not going to hold this code up as a model of perfection or anything, but I will say that this (the second attempt to make abort SMP-safe) was excruciatingly hard to get right the first time, which informs my conservatism about this review.

There is a check inside the scheduler lock at the top of the abort path: the first abort-in-a-cycle thread to make it through the abort path with set the aborting flag and spin, the second will see the flag set and synchronously abort the current thread before spinning. That guarantees that both interrupts exit k_thread_abort() in finite time, and the lock guarantees serialization of changes to the aborting flag such that if one CPU sets it the other is guaranteed not to have already check it.

It's possible there's a bug I'm not seeing or a glitch I'm forgetting. But it's not that.

Sorry for the delayed response .. I was sick for several days.

Below is the step by step sequence of how the deadlock I referenced can occur in the existing code.

CPU0 executing thread0, CPU1 executing thread1

CPU0 enters ISR, CPU1 executing thread1

CPU0 ISR doing stuff, CPU1 enters ISR

CPU0 ISR issues k_thread_abort(thread1), CPU1 ISR issues k_thread_abort(thread0)
(Each CPU is now trying to abort the other's interrupted thread)

CPU0 ISR gets sched_lock, CPU1 ISR spins on sched_lock

CPU0 ISR determines thread1 is not aborting, CPU1 still spinning on sched_lock

CPU0 ISR sets thread1's _THREAD_ABORTING, CPU1 still spinning on sched_lock

CPU0 ISR determines that thread1 is aborting and it's not _current, CPU1 still spinning on sched_lock

CPU0 ISR determines it is in ISR, CPU1 still spinning on sched_lock

CPU0 ISR unlocks sched_lock, CPU1 gets sched_lock

CPU0 ISR spins while thread1's _THREAD_ABORTING is set, CPU1 ISR determines thread0 is not aborting

CPU0 ISR continues to spin, CPU1 ISR sets thread0's _THREAD_ABORTING

CPU0 ISR continues to spin, CPU1 ISR determines that thread0 is aborting and it's not _current

CPU0 ISR continues to spin, CPU1 ISR determines that it is in ISR

CPU0 ISR continues to spin, CPU1 ISR unlocks sched_lock

CPU0 ISR continues to spin, CPU1 ISR spins on thread0's _THREAD_ABORTING

** Both CPU0 ISR and CPU1 ISR are deadlocked waiting for threads to abort**

A similar deadlock can still occur in the proposed code (hence the "Zephyr cannot detect all such conditions" comment) except that one of the CPUs can, if I follow things correctly, get stuck in z_sched_switch_spin() because the other is stuck waiting for _THREAD_ABORTING.

I don't follow? Specifically I don't see how you get to step 11 ("CPU1 ISR determines thread0 is not aborting"): that thread will have been affirmatively flagged aborting before the lock is released (step 10).

Really, can we just remove this comment? I'm almost certain it's not correct that we have an unresolvable deadlock in k_sched_abort(). And even if it is correct, we should file that as a bug, write a reproducer and commit to fixing it instead of trying to document that kind of misfeature as a "intended undefined behavior" kind of thing. Other OS's get this right, we should too.

peter-mitsis · 2023-10-05T17:50:08Z

Rebased to resolve a merge conflict from object cores.

andyross · 2023-10-12T20:26:49Z

kernel/sched.c

+			  bool terminate)
+{
+#ifdef CONFIG_SMP
+	if (is_halting(_current) && arch_is_in_isr()) {


Finally found some time to comb through.

This is a race in the current version: checking is_halting() here outside the lock doesn't prevent another CPU from setting it during the code below before the lock is taken, and indeed that can deadlock because now you're spinning on another thread while your current thread is flagged aborting, which is a forbidden state. The old version was inside the lock.

@andyross - Assuming the lock to which you are referring is spinlock sched_lock, then I am not seeing what you are seeing. The routine z_thread_halt() is only called from two places--both of which obtain the lock on sched_lock prior to the routine z_thread_halt() being called. We are therefore inside the lock when we perform the check on is_halting(). And as far as I can tell, the only places that modify either of the _THREAD_HALTING bits also have spinlock sched_lock locked.

If there are still concerns, perhaps we could schedule a time to discuss this (and of course summarise the results here).

@andyross - ping

andyross

Digging in on the deadlock doxygen bit (if you're absolutely certain this is a bug with current code let's file it and fix it, not document it). Also restate a few nitpicks from earlier that would be nice to see fixed but don't necessesarily rise to a -1.

Beyond that you've worn me down and I agree this looks correct. If you're willing to remove that one comment, commiters may feel free to consider my approval given implicitly.

andyross · 2023-10-30T20:36:46Z

include/zephyr/kernel_structs.h

+#define _THREAD_SUSPENDING (BIT(6))
+
+/* Thread is in the process of halting */
+#define _THREAD_HALTING    (_THREAD_ABORTING | _THREAD_SUSPENDING)


Mentioned this long ago, but still don't love this API scheme "_THREAD_HALTING" looks like all the other enumerants, but it's not, it's a multi-bit mask and you can't use it like that. Would prefer to see this "|" operator down inside "is_halting()" and e.g. "set_halting()", which is how other thread state predicates work.

I missed doing that before. Will correct.

andyross · 2023-10-30T20:38:33Z

kernel/sched.c

+ * then it is safe to act immediately.
+ *
+ * @param thread Thread to suspend or abort
+ * @param key Current key for sched_spinlock


Can you more clearly document that this function will release the lock before returning? That was the source of the last round of review mistakery.

andyross · 2023-10-30T20:51:47Z

include/zephyr/kernel.h

+ * platforms, it is possible for application code to create a deadlock
+ * condition by simultaneously suspending or aborting using at least one
+ * suspension or termination from interrupt context. Zephyr cannot detect all
+ * such conditions.


I don't follow? Specifically I don't see how you get to step 11 ("CPU1 ISR determines thread0 is not aborting"): that thread will have been affirmatively flagged aborting before the lock is released (step 10).

Really, can we just remove this comment? I'm almost certain it's not correct that we have an unresolvable deadlock in k_sched_abort(). And even if it is correct, we should file that as a bug, write a reproducer and commit to fixing it instead of trying to document that kind of misfeature as a "intended undefined behavior" kind of thing. Other OS's get this right, we should too.

peter-mitsis · 2023-10-31T16:15:43Z

Really, can we just remove this comment? I'm almost certain it's not correct that we have an unresolvable deadlock in k_sched_abort(). And even if it is correct, we should file that as a bug, write a reproducer and commit to fixing it instead of trying to document that kind of misfeature as a "intended undefined behavior" kind of thing. Other OS's get this right, we should too.

I will remove the comment from the doxygen. Issue #64646 has been filed to track this issue (assuming that it is real).

The halt queue will be used to identify threads that are waiting for a thread on another CPU to finish suspending. Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

The routine halt_thread() acts nearly identical to end_thread() except that instead of only halting the thread if the _THREAD_DEAD state bit is not set, it will halt it if bit specified by the parameter new_state is not set (which is always _THREAD_DEAD). Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

Extracts the essential thread synchronization logic when aborting a thread from z_thread_abort() and moves it to its own routine called z_thread_halt(). Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

Extends the concept of halting a thread from just aborting a thread to both aborting and suspending a thread. Part of this involves updating k_thread_suspend() to operate in a similar fashion to that of k_thread_abort(). Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

Updates k_thread_state_str() to interpret the halting bits correctly. Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

peter-mitsis · 2023-10-31T17:16:54Z

This latest push should address the most recent comments.

andyross

cfriedt

It looks reasonable enough to me - do we have test coverage everywhere?

peter-mitsis requested review from dcpleung, nashif and andyross as code owners June 19, 2023 19:27

zephyrbot added the area: Kernel label Jun 19, 2023

zephyrbot requested a review from ceolin June 19, 2023 19:27

zephyrbot assigned andyross Jun 19, 2023

andyross requested changes Jul 5, 2023

View reviewed changes

andyross mentioned this pull request Jul 5, 2023

Zephyr kernel deadlocks in thread metric benchmark for SMP=2 #58392

Closed

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from 74b5ade to d1a651a Compare August 11, 2023 14:59

peter-mitsis requested review from carlescufi, galak and MaureenHelm as code owners August 11, 2023 14:59

zephyrbot requested a review from cfriedt August 11, 2023 15:00

peter-mitsis mentioned this pull request Aug 11, 2023

kernel: Simplify z_thread_abort() #61178

Closed

andyross reviewed Aug 12, 2023

View reviewed changes

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from d1a651a to 54533ad Compare August 16, 2023 13:10

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from 54533ad to 30cd286 Compare August 16, 2023 15:24

andyross requested changes Aug 16, 2023

View reviewed changes

kernel/sched.c Outdated Show resolved Hide resolved

kernel/sched.c Show resolved Hide resolved

include/zephyr/kernel_structs.h Outdated Show resolved Hide resolved

kernel/sched.c Outdated Show resolved Hide resolved

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from 30cd286 to b092a4f Compare August 17, 2023 19:55

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from b092a4f to df2ce17 Compare August 18, 2023 13:18

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from df2ce17 to d28ba21 Compare August 24, 2023 03:46

andyross reviewed Aug 24, 2023

View reviewed changes

kernel/sched.c Outdated Show resolved Hide resolved

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from d28ba21 to c493dbc Compare August 27, 2023 15:48

peter-mitsis requested a review from andyross September 18, 2023 17:46

andyross reviewed Sep 29, 2023

View reviewed changes

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from c493dbc to 4e5e44a Compare October 5, 2023 17:49

jhedberg previously approved these changes Oct 9, 2023

View reviewed changes

andyross reviewed Oct 12, 2023

View reviewed changes

nashif requested a review from andyross October 30, 2023 14:07

andyross reviewed Oct 30, 2023

View reviewed changes

andyross self-requested a review October 30, 2023 21:11

peter-mitsis mentioned this pull request Oct 31, 2023

k_thread_abort() can deadlock on SMP systems #64646

Closed

peter-mitsis added 5 commits October 31, 2023 12:52

kernel: Add halt_queue field to k_thread

7d5f6e2

The halt queue will be used to identify threads that are waiting for a thread on another CPU to finish suspending. Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

kernel: Create z_thread_halt()

b9ce652

Extracts the essential thread synchronization logic when aborting a thread from z_thread_abort() and moves it to its own routine called z_thread_halt(). Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

kernel: Update k_thread_state_str()

92b27c8

Updates k_thread_state_str() to interpret the halting bits correctly. Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>

peter-mitsis dismissed jhedberg’s stale review via 92b27c8 October 31, 2023 17:16

peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from 4e5e44a to 92b27c8 Compare October 31, 2023 17:16

andyross approved these changes Oct 31, 2023

View reviewed changes

peter-mitsis requested a review from jhedberg November 6, 2023 20:08

nashif approved these changes Nov 6, 2023

View reviewed changes

jhedberg approved these changes Nov 6, 2023

View reviewed changes

cfriedt approved these changes Nov 6, 2023

View reviewed changes

nashif merged commit 9364ba4 into zephyrproject-rtos:main Nov 6, 2023
24 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel: Change locking in k_thread_suspend() #59400

kernel: Change locking in k_thread_suspend() #59400

peter-mitsis commented Jun 19, 2023 •

edited

Loading

andyross commented Jul 5, 2023

peter-mitsis commented Jul 5, 2023

andyross left a comment

andyross commented Jul 5, 2023

peter-mitsis commented Jul 5, 2023

andyross left a comment

cfriedt commented Aug 13, 2023

peter-mitsis commented Aug 14, 2023

peter-mitsis commented Aug 16, 2023

peter-mitsis commented Aug 16, 2023

andyross left a comment

peter-mitsis commented Aug 18, 2023

peter-mitsis commented Aug 18, 2023

nashif commented Sep 19, 2023

andyross Sep 29, 2023

peter-mitsis Sep 29, 2023

andyross Oct 11, 2023

peter-mitsis Oct 17, 2023 •

edited

Loading

andyross Oct 30, 2023

peter-mitsis commented Oct 5, 2023

andyross Oct 12, 2023

peter-mitsis Oct 17, 2023

peter-mitsis Oct 24, 2023

andyross left a comment

andyross Oct 30, 2023

peter-mitsis Oct 31, 2023

andyross Oct 30, 2023

peter-mitsis Oct 31, 2023

andyross Oct 30, 2023

peter-mitsis commented Oct 31, 2023 •

edited

Loading

peter-mitsis commented Oct 31, 2023

andyross left a comment

cfriedt left a comment

kernel: Change locking in k_thread_suspend() #59400

kernel: Change locking in k_thread_suspend() #59400

Conversation

peter-mitsis commented Jun 19, 2023 • edited Loading

andyross commented Jul 5, 2023

peter-mitsis commented Jul 5, 2023

andyross left a comment

Choose a reason for hiding this comment

andyross commented Jul 5, 2023

peter-mitsis commented Jul 5, 2023

andyross left a comment

Choose a reason for hiding this comment

cfriedt commented Aug 13, 2023

peter-mitsis commented Aug 14, 2023

peter-mitsis commented Aug 16, 2023

peter-mitsis commented Aug 16, 2023

andyross left a comment

Choose a reason for hiding this comment

peter-mitsis commented Aug 18, 2023

peter-mitsis commented Aug 18, 2023

nashif commented Sep 19, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peter-mitsis Oct 17, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peter-mitsis commented Oct 5, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyross left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

peter-mitsis commented Oct 31, 2023 • edited Loading

peter-mitsis commented Oct 31, 2023

andyross left a comment

Choose a reason for hiding this comment

cfriedt left a comment

Choose a reason for hiding this comment

peter-mitsis commented Jun 19, 2023 •

edited

Loading

peter-mitsis Oct 17, 2023 •

edited

Loading

peter-mitsis commented Oct 31, 2023 •

edited

Loading