Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix freeze of worker #53

Conversation

michaelsippel
Copy link
Member

@michaelsippel michaelsippel commented Dec 14, 2023

A race condition on the cv condition-variable in Worker caused a freeze in the WorkerUtilization-unittest.

This freeze happens because of two sequential calls to wait() without proper synchronization of notify() in the case when the worker is assigned a new task before the worker starts its work-loop. There, first the Worker must progress to the start-barrier and enter the outer while-condition to wait for the start signal. Before wait() returns, both Worker::start() and the task-emplace sent their notify(), but the notify() corresponding to the emplacement of the new task is lost in ambiguity with the start-signal.
Then the worker thread will wake up and jump to to work_loop() where it will wait again but the notify() which should wake the worker up was already sent and thus the worker will not wake up and the task will not be consumed.

In a 'real' scenario this might not be as apparent, because the worker will only be falsely inactive until the next task is emplaced which will wake the worker up and everything will continue normally.

This PR fixes this bug by eliminating the CondVar-based barrier before work_loop() which is also not required anymore because of recent refactorings.

…tup of worker

fixes worker freeze from two sequential waits when a task is emplaced right before the worker calls `wait()` in the start-barrier
@psychocoderHPC psychocoderHPC merged commit cfe32f0 into ComputationalRadiationPhysics:dev Dec 15, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants