Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: Change locking in k_thread_suspend() #59400

Merged

Conversation

peter-mitsis
Copy link
Collaborator

@peter-mitsis peter-mitsis commented Jun 19, 2023

Updates k_thread_suspend() to work on SMP systems. It functions much like k_thread_abort().

If the target thread is running on another CPU, the target thread gets marked as "halting with a reason". Meanwhile the calling thread waits for the target thread to finish halting (suspend or abort). Note that the reason bit has been implemented such that an abort takes precedence over a suspend to cover the case a thread running on another CPU gets suspended and then aborted (or vice versa). If the target thread is not running on another CPU, then the target thread can be immediately aborted/suspended.

Fixes #58392

@andyross
Copy link
Contributor

andyross commented Jul 5, 2023

Sorry, missed this earlier. I guess I don't get the point here? This is a behavior change: the call used to reschedule unconditionally if you suspended yourself (i.e. in the situation where we know it's required) and not if you suspended someone else. Now it reschedules every time, which means that it acts as a yield point where it didn't before.

Personally I'm agnostic on that sort of thing, but there has been argument in the past about this sort of behavior.

But regardless: I don't see why this is a good idea? Suspending another thread isn't an obvious spot where a user would expect a yield (because obviously we're the highest priority thread right now, so we can't be suspending anyone else running and thus the run choice shouldn't change on our local CPU).

My guess is that if there's a deadlock in the linked test that is fixed here, it's that the test is incorrectly sensitive to thread run order and needs better synchronization internally. Effectively all this is doing is adding a k_yield() after the suspend, it's not fixing anything.

@peter-mitsis
Copy link
Collaborator Author

On a UP system, the old code works great. In such a system it makes perfect sense to only reschedule if the thread is attempting to suspend itself as the currently executing thread would by definition be the highest priority thread that is ready to run. On an SMP system, that assumption no longer holds. This means that on an SMP system, we must perform a reschedule as the thread that is being suspended could be executing on another core.

This in turn leads to the following choice--we can either have distinct logic for UP and SMP cases, or we can use the same logic for both. I opted for the same logic for both to keep things simple (and hopefully easier to maintain).

What are the penalties for unconditionally calling z_reschedule() in the UP case? Unless I am missing something, the only penalty should be a little extra time as it runs through the logic to determine whether or not to actually schedule in a new thread. Is that extra time problematic? I don't think so.

What about the scheduling point? Again, unless I am missing something, calling z_reschedule() should not result in any scheduling difference in UP.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I'm seeing this. This is deep, unfortunately. Bear with me:

What are the penalties for unconditionally calling z_reschedule() in the UP case?

If you're in a cooperative thread or holding a k_sched_lock(), it will break your lock and run another thread after this change, where it wouldn't before (or it would, but only in the case where you suspended yourself and would obviously expect/desire that behavior). That's the source of previous arguments about scheduling points, mostly. Our API is sensitive to this in ways that other OSes aren't, because we make promises they don't. (The counterargument, FWIW, is that we've been historically bad at documenting where the scheduling points are, so they shouldn't count as a stable API).

On an SMP system, that assumption no longer holds.

Sure it does. The thread being suspended isn't running, so by definition suspending it isn't going to affect the run queue anywhere else either.

Except, as you point out, what if it is actually running on another CPU?

And here we're just falling down. k_thread_suspend() simply isn't handling the case in SMP where you try to suspend an active thread. It doesn't work, and it's never worked. As I read it, all it would do is attempt to dequeue the thread (a noop since the run queue doesn't contain the actives threads in SMP) and then mark it suspended (again a noop since it doesn't do anything to stop the thread, albeit a dangerous one since the thread state bits are now inconsistent).

Doing this right requires signaling the other CPU with an IPI, then waiting for the thread to become inactive before returning. That's more or less identical with the existing path for k_thread_abort() (which does support SMP) and could probably share the same infrastructure.

Alternatively, since this never worked, I think I'd personally be OK with just documenting the limitation[1] that trying to suspend an active thread on SMP platforms is illegal, and adding an assertion (or maybe even a panic?) to catch the cases that try.

But getting back to this PR and the bug: I really don't think this is a fix. I think what's happened is that the modified run order is just fooling the test into a state where things come out "success". I think this pushes me to a -1 on this particular change.

[1] As mentioned in other conversations, trying to asynchronously suspend another thread like this is inherently racy. This IMHO isn't an API use case we should be going out of our way trying to support.

@andyross
Copy link
Contributor

andyross commented Jul 5, 2023

it will break your lock and run another thread

No, it won't. :)

I consistently make this mistake. In fact z_reschedule() does not affect the preemption logic, you need to requeue a thread to make that happen, which is done in k_yield() but basically nowhere else. So ignore that bit, this is vaguely undesirable from a consistency perspective (because apps are sometimes sensitive to run order when they shouldn't be) but is never a bug per se.

@peter-mitsis
Copy link
Collaborator Author

You make a good, solid and understandable case for why the proposed fix was not actually a fix. Thank you! :)

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure there's a bug with the state handling stemming from the "thread states can have more than one bit set now" mixup. Also some nitpicking and suggestions.

But on the whole this looks like exactly how I hoped it would work, with a shared path for "stop this other thread on a different CPU so it can inspect scheduler state" that is only minimally different for the "async suspend" and abort cases.

(Though I remain on the record that any app trying to use k_thread_suspend() like this, especially ones on SMP platforms, remains inherently race-prone. Robust unix apps have been warned off of using SIGSTOP for decades too).

include/zephyr/kernel/thread.h Outdated Show resolved Hide resolved
kernel/sched.c Outdated Show resolved Hide resolved
include/zephyr/kernel_structs.h Outdated Show resolved Hide resolved
kernel/sched.c Outdated Show resolved Hide resolved
kernel/sched.c Outdated Show resolved Hide resolved
kernel/sched.c Outdated Show resolved Hide resolved
kernel/thread.c Outdated Show resolved Hide resolved
kernel/sched.c Show resolved Hide resolved
@cfriedt
Copy link
Member

cfriedt commented Aug 13, 2023

I'll give it a review once Andy's comments are addressed.

@peter-mitsis
Copy link
Collaborator Author

I'm going to try reworking some of the commits. Based on the comments so far, not only is there is definitely room for improvement, but I think that the current breakdown has added some confusion.

@peter-mitsis peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from d1a651a to 54533ad Compare August 16, 2023 13:10
@peter-mitsis
Copy link
Collaborator Author

The two CI failures are not due to the proposed changes. I can replicate them locally using twister on the 'main' branch.

@peter-mitsis peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from 54533ad to 30cd286 Compare August 16, 2023 15:24
@peter-mitsis
Copy link
Collaborator Author

(Resolved a merge conflict caused by a recent change to sched.c)

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First patch introduces a deadlock. Beyond that a few continuations of general review that probably shouldn't block merge.

kernel/sched.c Outdated Show resolved Hide resolved
kernel/sched.c Show resolved Hide resolved
include/zephyr/kernel_structs.h Outdated Show resolved Hide resolved
kernel/sched.c Outdated Show resolved Hide resolved
@peter-mitsis peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from 30cd286 to b092a4f Compare August 17, 2023 19:55
@peter-mitsis
Copy link
Collaborator Author

@andyross - Previously you wrote "Then I guess I'm not understanding what halting means? Why isn't abort a "subclass" of halt behavior, you do all the same stuff and then also terminate yourself?"

Aborting is indeed a sub-type of halting. Halting has two sub-types--aborting and suspending. When we don't care which we are dealing with, we can test with is_halting() which just tests the _THREAD_HALTING bit. When we do care, we test both bits.

@peter-mitsis peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from b092a4f to df2ce17 Compare August 18, 2023 13:18
@peter-mitsis
Copy link
Collaborator Author

@andyross - It came to me last night as I was falling asleep what you were driving at, and what I was missing with the abort/suspend/halt bits. I think this minor tweak should make things more clear.

@peter-mitsis peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from df2ce17 to d28ba21 Compare August 24, 2023 03:46
kernel/sched.c Outdated Show resolved Hide resolved
@nashif
Copy link
Member

nashif commented Sep 19, 2023

@andyross can you please take another look?

* platforms, it is possible for application code to create a deadlock
* condition by simultaneously suspending or aborting using at least one
* suspension or termination from interrupt context. Zephyr cannot detect all
* such conditions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, what? This is a regression then? The old k_thread_abort() was deadlock-proof. The point to the criticism earlier was that we need to preserve that property, not document it away. Don't have time for a full review at the moment, will leave the tab open and get to this today.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old k_thread_abort() was deadlock-proof.

No, the old k_thread_abort() was NOT deadlock proof, and the current documentation explicitly states it.

One such scenario wherein a deadlock can occur is if both CPU0 and CPU1 are in ISRs and they each try to abort the other CPU's interrupted thread. As both threads are currently active as they were interrupted, the _THREAD_ABORTING bit is set for both. The next step in this sequence is for both of these CPUs to confirm that the thread they want to abort is marked as _THREAD_ABORTING and that it is not its current/interrupted thread. Since they are both in an ISR, they will each spin waiting for the thread to be aborted--deadlocked. Is this deadlock likely? No. But can it occur? Yes.

As the new proposed k_thread_suspend() shares the same infrastructure as k_thread_abort(), it too is vulnerable to deadlock. Thus, it must be mentioned.

I do have some ideas on how this unlikely deadlock can be prevented, but those are future plans.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is handled. Or should be; "was designed to be", anyway. I'm not going to hold this code up as a model of perfection or anything, but I will say that this (the second attempt to make abort SMP-safe) was excruciatingly hard to get right the first time, which informs my conservatism about this review.

There is a check inside the scheduler lock at the top of the abort path: the first abort-in-a-cycle thread to make it through the abort path with set the aborting flag and spin, the second will see the flag set and synchronously abort the current thread before spinning. That guarantees that both interrupts exit k_thread_abort() in finite time, and the lock guarantees serialization of changes to the aborting flag such that if one CPU sets it the other is guaranteed not to have already check it.

It's possible there's a bug I'm not seeing or a glitch I'm forgetting. But it's not that.

Copy link
Collaborator Author

@peter-mitsis peter-mitsis Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response .. I was sick for several days.

Below is the step by step sequence of how the deadlock I referenced can occur in the existing code.

  1. CPU0 executing thread0, CPU1 executing thread1
  2. CPU0 enters ISR, CPU1 executing thread1
  3. CPU0 ISR doing stuff, CPU1 enters ISR
  4. CPU0 ISR issues k_thread_abort(thread1), CPU1 ISR issues k_thread_abort(thread0)
    (Each CPU is now trying to abort the other's interrupted thread)
  5. CPU0 ISR gets sched_lock, CPU1 ISR spins on sched_lock
  6. CPU0 ISR determines thread1 is not aborting, CPU1 still spinning on sched_lock
  7. CPU0 ISR sets thread1's _THREAD_ABORTING, CPU1 still spinning on sched_lock
  8. CPU0 ISR determines that thread1 is aborting and it's not _current, CPU1 still spinning on sched_lock
  9. CPU0 ISR determines it is in ISR, CPU1 still spinning on sched_lock
  10. CPU0 ISR unlocks sched_lock, CPU1 gets sched_lock
  11. CPU0 ISR spins while thread1's _THREAD_ABORTING is set, CPU1 ISR determines thread0 is not aborting
  12. CPU0 ISR continues to spin, CPU1 ISR sets thread0's _THREAD_ABORTING
  13. CPU0 ISR continues to spin, CPU1 ISR determines that thread0 is aborting and it's not _current
  14. CPU0 ISR continues to spin, CPU1 ISR determines that it is in ISR
  15. CPU0 ISR continues to spin, CPU1 ISR unlocks sched_lock
  16. CPU0 ISR continues to spin, CPU1 ISR spins on thread0's _THREAD_ABORTING
  17. ** Both CPU0 ISR and CPU1 ISR are deadlocked waiting for threads to abort**

A similar deadlock can still occur in the proposed code (hence the "Zephyr cannot detect all such conditions" comment) except that one of the CPUs can, if I follow things correctly, get stuck in z_sched_switch_spin() because the other is stuck waiting for _THREAD_ABORTING.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow? Specifically I don't see how you get to step 11 ("CPU1 ISR determines thread0 is not aborting"): that thread will have been affirmatively flagged aborting before the lock is released (step 10).

Really, can we just remove this comment? I'm almost certain it's not correct that we have an unresolvable deadlock in k_sched_abort(). And even if it is correct, we should file that as a bug, write a reproducer and commit to fixing it instead of trying to document that kind of misfeature as a "intended undefined behavior" kind of thing. Other OS's get this right, we should too.

@peter-mitsis peter-mitsis force-pushed the pmitsis-suspend-resume-deadlock branch from c493dbc to 4e5e44a Compare October 5, 2023 17:49
@peter-mitsis
Copy link
Collaborator Author

Rebased to resolve a merge conflict from object cores.

jhedberg
jhedberg previously approved these changes Oct 9, 2023
bool terminate)
{
#ifdef CONFIG_SMP
if (is_halting(_current) && arch_is_in_isr()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally found some time to comb through.

This is a race in the current version: checking is_halting() here outside the lock doesn't prevent another CPU from setting it during the code below before the lock is taken, and indeed that can deadlock because now you're spinning on another thread while your current thread is flagged aborting, which is a forbidden state. The old version was inside the lock.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross - Assuming the lock to which you are referring is spinlock sched_lock, then I am not seeing what you are seeing. The routine z_thread_halt() is only called from two places--both of which obtain the lock on sched_lock prior to the routine z_thread_halt() being called. We are therefore inside the lock when we perform the check on is_halting(). And as far as I can tell, the only places that modify either of the _THREAD_HALTING bits also have spinlock sched_lock locked.

If there are still concerns, perhaps we could schedule a time to discuss this (and of course summarise the results here).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross - ping

@nashif nashif requested a review from andyross October 30, 2023 14:07
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging in on the deadlock doxygen bit (if you're absolutely certain this is a bug with current code let's file it and fix it, not document it). Also restate a few nitpicks from earlier that would be nice to see fixed but don't necessesarily rise to a -1.

Beyond that you've worn me down and I agree this looks correct. If you're willing to remove that one comment, commiters may feel free to consider my approval given implicitly.

#define _THREAD_SUSPENDING (BIT(6))

/* Thread is in the process of halting */
#define _THREAD_HALTING (_THREAD_ABORTING | _THREAD_SUSPENDING)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned this long ago, but still don't love this API scheme "_THREAD_HALTING" looks like all the other enumerants, but it's not, it's a multi-bit mask and you can't use it like that. Would prefer to see this "|" operator down inside "is_halting()" and e.g. "set_halting()", which is how other thread state predicates work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed doing that before. Will correct.

* then it is safe to act immediately.
*
* @param thread Thread to suspend or abort
* @param key Current key for sched_spinlock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you more clearly document that this function will release the lock before returning? That was the source of the last round of review mistakery.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

* platforms, it is possible for application code to create a deadlock
* condition by simultaneously suspending or aborting using at least one
* suspension or termination from interrupt context. Zephyr cannot detect all
* such conditions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow? Specifically I don't see how you get to step 11 ("CPU1 ISR determines thread0 is not aborting"): that thread will have been affirmatively flagged aborting before the lock is released (step 10).

Really, can we just remove this comment? I'm almost certain it's not correct that we have an unresolvable deadlock in k_sched_abort(). And even if it is correct, we should file that as a bug, write a reproducer and commit to fixing it instead of trying to document that kind of misfeature as a "intended undefined behavior" kind of thing. Other OS's get this right, we should too.

@peter-mitsis
Copy link
Collaborator Author

peter-mitsis commented Oct 31, 2023

Really, can we just remove this comment? I'm almost certain it's not correct that we have an unresolvable deadlock in k_sched_abort(). And even if it is correct, we should file that as a bug, write a reproducer and commit to fixing it instead of trying to document that kind of misfeature as a "intended undefined behavior" kind of thing. Other OS's get this right, we should too.

I will remove the comment from the doxygen. Issue #64646 has been filed to track this issue (assuming that it is real).

The halt queue will be used to identify threads that are waiting
for a thread on another CPU to finish suspending.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The routine halt_thread() acts nearly identical to end_thread()
except that instead of only halting the thread if the _THREAD_DEAD
state bit is not set, it will halt it if bit specified by the
parameter new_state is not set (which is always _THREAD_DEAD).

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Extracts the essential thread synchronization logic when aborting
a thread from z_thread_abort() and moves it to its own routine
called z_thread_halt().

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Extends the concept of halting a thread from just aborting a thread
to both aborting and suspending a thread.

Part of this involves updating k_thread_suspend() to operate in a
similar fashion to that of k_thread_abort().

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Updates k_thread_state_str() to interpret the halting bits
correctly.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
@peter-mitsis
Copy link
Collaborator Author

This latest push should address the most recent comments.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<Slaps a +1 on it.>

<Runs away.>

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks reasonable enough to me - do we have test coverage everywhere?

@nashif nashif merged commit 9364ba4 into zephyrproject-rtos:main Nov 6, 2023
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zephyr kernel deadlocks in thread metric benchmark for SMP=2
6 participants