Fix race condition in ThreadJob.waitForRun #659 #661
It would be nice if @jukzi could check this; he was recently involved in Job-related changes.
While it might be an improvement, it does not seem to fix the issue completely.
...ore.tests.resources/src/org/eclipse/core/tests/internal/builders/ParallelBuildChainTest.java
runtime/bundles/org.eclipse.core.jobs/src/org/eclipse/core/internal/jobs/ThreadJob.java
@jukzi Thanks for all the investigation. The issue you found with a large number of jobs seems to be an (already existing) performance issue (see #661 (comment)). While simply changing Still it might make sense to replace the unconditional Do you still have objections to the proposed change?
I wonder why I don't see the performance benefit of checking again over wait(250) in the JUnit test. If there is no measurable impact, then I suggest keeping the code simple rather than adding additional logic.
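For comparison, the timeout-based alternative suggested here can be sketched as follows. This is a hypothetical, self-contained model, not the `org.eclipse.core.jobs` code: `BoundedWait`, `waitBounded`, and the `BooleanSupplier` condition are illustrative names. The blocking condition is still evaluated outside the lock (as before the fix), but each `wait` is bounded, so a notification missed in between costs at most one interval (e.g. 250 ms) before the condition is evaluated again.

```java
import java.util.function.BooleanSupplier;

public final class BoundedWait {

    /**
     * Hypothetical sketch of the timeout alternative: the blocking condition is
     * evaluated outside the lock, but every wait is bounded, so a notification
     * missed in between delays the caller by at most intervalMillis before the
     * condition is evaluated again.
     *
     * @return the number of bounded waits performed
     */
    static int waitBounded(Object lock, BooleanSupplier blocked, long intervalMillis)
            throws InterruptedException {
        int waits = 0;
        while (blocked.getAsBoolean()) {
            synchronized (lock) {
                // No re-check under the lock: if the state changed just before
                // this point, the timeout (not a notification) ends the wait.
                lock.wait(intervalMillis);
            }
            waits++;
        }
        return waits;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        // Simulates a blocker that changes its rule right after the first check
        // without anybody calling notify() afterwards.
        int[] checks = {0};
        long start = System.nanoTime();
        int waits = waitBounded(lock, () -> checks[0]++ == 0, 250);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(waits + " bounded wait(s), about " + elapsedMillis + " ms");
    }
}
```

In this sketch, a missed change costs up to one full interval of waiting, and every waiter wakes up once per interval for as long as the conflict persists.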
You won't see a performance impact in the test for two reasons:
I see the point that using a timeout makes the code simpler while achieving the same result in terms of performance. However, I see three arguments speaking against that solution:
That's why I am not in favor of a "simple" solution here. Still, I improved the code by directly checking the jobs for conflicts instead of their rules in 02fabca.
You could add a note somewhere in the commit message that it is (for example?) yieldRule() that triggered the deadlock.
Good idea. I have adapted the code comment to reflect why this can happen and added an explanation to the commit message in af5395a. I incorporated the insights concerning the reasons for a change of the scheduling rule in #661 (comment).
ThreadJob.waitForRun() suffers from a race condition. Between checking for a blocking job with a conflicting scheduling rule and starting to wait on the state lock of that blocking job (to get notified when the scheduling rule is released), the blocking job may have already changed its state and acquired a different scheduling rule, such that no conflict exists anymore. Since this condition is not reevaluated and waiting on the lock object is started without a timeout, the blocked job unnecessarily waits for the initially blocking job to change its state again, i.e., to finish its execution.

One relevant situation in which the scheduling rule of a job can change is when the job is an internally used ThreadJob. ThreadJobs are started whenever a rule is acquired on the JobManager and are reused for the same thread. When the same thread acquires multiple rules, one after another, the rule of the ThreadJob may effectively change. This happens, for example, during a workspace build operation, which changes the rule twice. Thus, if the blocking job in ThreadJob.waitForRun() is a ThreadJob, it may change its rule, in particular when that job performs a build operation.

With this change, the scheduling rules of the blocking and the blocked job are checked for conflicts again before the blocked job starts waiting on the state lock. This leads to a reevaluation of the blocking condition in case the scheduling rule of the blocking job has changed in between, and prevents the job from unnecessarily blocking until the blocking job finishes.

Fixes eclipse-platform#659.
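A minimal sketch of the re-check-before-wait idea described in the commit message, under stated assumptions: `RecheckBeforeWait`, `BlockingJob`, `setRule`, `conflicts`, and `waitUntilNoConflict` are hypothetical names, rules are modeled as strings that conflict when equal, and this is not the actual `ThreadJob.waitForRun()` implementation. The point it illustrates is that the conflict is evaluated again while holding the blocker's lock, so a rule change that happened after the first check (for example, the blocker moving on to a different rule) is observed and the waiter does not start an unbounded `wait()`.

```java
public final class RecheckBeforeWait {

    /** Hypothetical job whose scheduling rule may change while others wait on it. */
    static final class BlockingJob {
        final Object lock = new Object();
        private String rule; // null means "no rule held"; guarded by lock

        BlockingJob(String rule) {
            this.rule = rule;
        }

        String rule() {
            synchronized (lock) {
                return rule;
            }
        }

        /** Switches to another rule (or releases it with null) and wakes up waiters. */
        void setRule(String newRule) {
            synchronized (lock) {
                rule = newRule;
                lock.notifyAll();
            }
        }
    }

    /** In this sketch, two rules conflict iff both are non-null and equal. */
    static boolean conflicts(String held, String wanted) {
        return held != null && held.equals(wanted);
    }

    /**
     * Blocks until the blocker's rule no longer conflicts with the wanted rule.
     * The condition is re-evaluated under the lock before every wait(), so a
     * change made between the caller's first check and this call is not missed.
     */
    static void waitUntilNoConflict(BlockingJob blocker, String wanted) throws InterruptedException {
        synchronized (blocker.lock) {
            while (conflicts(blocker.rule, wanted)) {
                blocker.lock.wait();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingJob blocker = new BlockingJob("A");
        // First check, outside the lock: the blocker holds a conflicting rule.
        boolean blockedAtFirstCheck = conflicts(blocker.rule(), "A");
        // The blocker switches to a non-conflicting rule before the waiter
        // starts waiting, which is the race described in the commit message.
        blocker.setRule("B");
        // Returns immediately because the conflict is checked again under the lock.
        waitUntilNoConflict(blocker, "A");
        System.out.println("blocked at first check: " + blockedAtFirstCheck + ", waited: no");
    }
}
```

Without the loop condition being checked under the lock, the call after `setRule("B")` would wait until some later notification, which mirrors the unnecessary blocking the commit fixes.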