PXC-4500: When innodb_thread_concurrency is set, the cluster can get stuck during SST #1963
https://perconadev.atlassian.net/browse/PXC-4500
Problem 1:
When innodb_thread_concurrency is set and the node is under a heavy user workload, the node can become stuck when it receives a CC (configuration change) event from Galera.
Cause:
Let's say innodb_thread_concurrency = 2.
The above results in a deadlock:
Solution:
innobase_srv_conc_enter_innodb() used the wrong condition to detect the wsrep applier thread, so the applier was subject to the concurrency limit. The applier thread must always be granted access. The condition is now fixed.
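The intended behavior can be illustrated with a toy admission gate. This is only a sketch under stated assumptions, not the code from this PR: `ThreadRole`, `ConcurrencyGate`, `try_enter`, and `leave` are hypothetical names, the gate is non-blocking, and the real check lives in innobase_srv_conc_enter_innodb() and works on server thread state.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical thread roles, used only for this illustration.
enum class ThreadRole { User, WsrepApplier };

// Toy, single-threaded model of an admission gate that lets at most
// `limit` non-exempt threads be inside at once. It does not model the
// innodb_thread_concurrency value 0 (no limit).
class ConcurrencyGate {
 public:
  explicit ConcurrencyGate(std::size_t limit) : limit_(limit) {}

  // Returns true if the thread may enter. Exempt roles bypass the
  // limit and do not consume a slot.
  bool try_enter(ThreadRole role) {
    if (is_exempt(role)) return true;
    if (active_ >= limit_) return false;  // a real gate would wait here
    ++active_;
    return true;
  }

  // Releases the slot taken by a non-exempt thread.
  void leave(ThreadRole role) {
    if (is_exempt(role)) return;
    assert(active_ > 0);
    --active_;
  }

  std::size_t active() const { return active_; }

 private:
  // The applier must never queue behind user threads.
  static bool is_exempt(ThreadRole role) {
    return role == ThreadRole::WsrepApplier;
  }

  std::size_t limit_;
  std::size_t active_ = 0;
};
```

With a limit of 2, two user threads fill the gate and a third is refused, while the applier is still admitted and leaves the slot count unchanged.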
Problem 2:
Even with Problem 1 fixed, the cluster still got stuck during SST.
Cause:
The SST thread creates an SST user, and to do so it needs to enter InnoDB. This leads to a situation similar to Problem 1: the applier thread holds LocalMonitor and waits for SST to finish, the SST thread waits to enter InnoDB, and user threads occupy the InnoDB concurrency slots while waiting for LocalMonitor.
Solution:
Always allow the SST thread to enter InnoDB.
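The circular wait described under Problem 2, and why exempting the SST thread breaks it, can be checked with a small wait-for graph. This is a minimal sketch, not code from the PR: the node names "applier", "sst", and "user" and the helpers `has_deadlock` and `sst_wait_graph` are hypothetical and only mirror the three waits listed in the Cause.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Wait-for graph: an edge A -> B means "A waits for something B holds".
using WaitGraph = std::map<std::string, std::set<std::string>>;

// Depth-first search; returns true if a cycle is reachable from `node`.
static bool dfs(const WaitGraph& g, const std::string& node,
                std::set<std::string>& on_path,
                std::set<std::string>& done) {
  if (on_path.count(node)) return true;   // back edge: cycle found
  if (done.count(node)) return false;     // already fully explored
  on_path.insert(node);
  auto it = g.find(node);
  if (it != g.end()) {
    for (const auto& next : it->second) {
      if (dfs(g, next, on_path, done)) return true;
    }
  }
  on_path.erase(node);
  done.insert(node);
  return false;
}

// A deadlock exists exactly when the wait-for graph contains a cycle.
bool has_deadlock(const WaitGraph& g) {
  std::set<std::string> on_path, done;
  for (const auto& entry : g) {
    if (dfs(g, entry.first, on_path, done)) return true;
  }
  return false;
}

// Builds the graph for the SST scenario (hypothetical node names).
WaitGraph sst_wait_graph(bool sst_bypasses_innodb_limit) {
  WaitGraph g;
  g["applier"].insert("sst");   // applier holds LocalMonitor, waits for SST
  g["user"].insert("applier");  // user threads wait for LocalMonitor
  if (!sst_bypasses_innodb_limit) {
    g["sst"].insert("user");    // SST waits for InnoDB, occupied by users
  }
  return g;
}
```

Calling `has_deadlock(sst_wait_graph(false))` reports a cycle (applier, sst, user, applier), while `has_deadlock(sst_wait_graph(true))` does not, because the SST thread no longer waits on user threads.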