-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race conditions in poll and condvar #62691
Conversation
The signal_poll_event function was previously called without the poll lock held. This created a race condition between a thread calling k_poll to wait for an event and another thread signalling for this same event. This resulted in the waiting thread to stay pending and the handle to it getting removed from the notifyq, meaning it couldn't get woken up again. Signed-off-by: Ambroise Vincent <ambroise.vincent@arm.com>
Releasing the lock before notifying condvar led to a race condition between a thread calling k_condvar_wait to wait for a condition variable and another thread signalling for this same condition variable. This resulted in the waiting thread to stay pending and the handle to it getting removed from the notifyq, meaning it couldn't get woken up again. Signed-off-by: Ambroise Vincent <ambroise.vincent@arm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just wondering why the naming of z_handle_obj_poll_events
looks so different from others in kernel/poll.c.
@peter-mitsis @andyross please review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question. If you're already done the analysis and know it's safe, feel free to treat your "yes" as a +1 from me.
@@ -470,11 +471,14 @@ static int signal_poll_event(struct k_poll_event *event, uint32_t state) | |||
void z_handle_obj_poll_events(sys_dlist_t *events, uint32_t state) | |||
{ | |||
struct k_poll_event *poll_event; | |||
k_spinlock_key_t key = k_spin_lock(&lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely looks like a lock was missing here, but I do worry about the possibility of a deadlock underneath signal_poll_event, which calls into a bunch of subsystems. Are we 100% sure those can't turn around and call back into something that wants the poll lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. But given that z_impl_k_poll_signal_raise
already calls signal_poll_event
with the poll lock taken, I suppose such a case would have already shown up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sold.
In the poll code,
signal_poll_event
is called by 2 functions:z_impl_k_poll_signal_raise
andz_handle_obj_poll_events
. The first one correctly takes the poll lock. Fix the second one not taking the lock.In the sockets code, fix the mutex of the condition variable getting released before signaling for this same condition variable.
Both of those created race conditions which especially appeared on SMP-enabled configurations. The symptom was similar for both of them: a thread pending, waiting on an event, but never getting scheduled again even though the event happened.