Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
PXC-4173: PXC node stalls with parallel replication workers executing DDLs via async node

https://perconadev.atlassian.net/browse/PXC-4173

Problem
=======
A PXC replica could enter a deadlock with multi-threaded replication when
--replica-preserve-commit-order is enabled.

The deadlock is more likely to occur on such a replica when the source
server's workload consists of concurrent executions of DMLs and DDLs. In that
scenario, the replica tries to commit them in the same order as the source,
which conflicts with Galera's ordering and results in a deadlock.

MySQL ordering: Commits are serialized as per their commit order on the source server.
Galera ordering: Commits are serialized in a FIFO manner.

Scenario
========
Let's assume we have the following two transactions in the relay log with the
same last_committed.

T1: DML, seqno=10
T2: DDL, seqno=11

+-----------+--------------+-------------------------------------------------------+
| Timestamp | Thread       | Activity                                              |
+-----------+--------------+-------------------------------------------------------+
| t0        | Co-ordinator | Assigns T1 to Applier-1                               |
+-----------+--------------+-------------------------------------------------------+
| t1        | Co-ordinator | Assigns T2 to Applier-2                               |
+-----------+--------------+-------------------------------------------------------+
| t2        | Applier-1    | DML starts with seqno = 10                            |
+-----------+--------------+-------------------------------------------------------+
| t3        | Applier-2    | DDL starts with seqno = 11                            |
+-----------+--------------+-------------------------------------------------------+
| t4        | Applier-2    | DDL calls TOI_begin(), acquires commit monitor        |
+-----------+--------------+-------------------------------------------------------+
| t5        | Applier-2    | DDL executes                                          |
+-----------+--------------+-------------------------------------------------------+
| t5        | Applier-1    | DML executes                                          |
+-----------+--------------+-------------------------------------------------------+
| t6        | Applier-2    | DDL reaches commit_order_manager, finds that          |
|           |              | it needs to wait until Applier-1 is committed         |
+-----------+--------------+-------------------------------------------------------+
| t7        | Applier-1    | DML reaches commit, calls commit_order_enter_local()  |
|           |              | and will wait for Applier-2 to release commit monitor |
+-----------+--------------+-------------------------------------------------------+

In the end it will result in

Applier-1: DML executing Xid_log_event::commit, waiting for the DDL to release the commit monitor
Applier-2: DDL executing ALTER TABLE, waiting for Applier-1 to commit first

Solution
========
This commit introduces an "Async Monitor" at the server layer to ensure that
the ordering in Galera matches the order in the relay log.

The Async Monitor is similar to the Commit Monitor used in Galera: internally it
keeps track of the last_left seqno. A transaction with seqno > last_left waits
until the condition `seqno = last_left + 1` is met, meaning it is its turn to
enter the Commit Monitor in Galera. This ensures that transactions are
registered in Galera in the same sequence as on the source.

Lifetime of the Async Monitor:
The Async Monitor is created on START REPLICA and destroyed on STOP REPLICA or
whenever the applier thread exits. Before it is destroyed, the value of
last_left is stored in the Relay_log_info object, and it is restored on the
subsequent START REPLICA.
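The turn-taking rule from the Solution section can be sketched in isolation. This is a minimal, self-contained illustration and not the code from this commit: `TurnGate`, `pass()`, `run_out_of_order_arrival()` and the starting value 9 are hypothetical names and values chosen for the example. Two threads arrive in reverse order, and the one with seqno 11 is admitted only after seqno 10 has passed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Async Monitor's ordering rule.
class TurnGate {
 public:
  explicit TurnGate(unsigned long long last_left) : m_last_left(last_left) {}

  // Blocks until seqno == last_left + 1, records the pass, then
  // advances last_left so the next seqno can proceed.
  void pass(unsigned long long seqno) {
    std::unique_lock<std::mutex> lock(m_mutex);
    // A seqno at or below last_left could never satisfy the wait below.
    assert(seqno > m_last_left);
    m_cond.wait(lock, [this, seqno] { return seqno == m_last_left + 1; });
    m_order.push_back(seqno);
    m_last_left = seqno;
    m_cond.notify_all();
  }

  // Returns the seqnos in the order they were admitted.
  std::vector<unsigned long long> order() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_order;
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  unsigned long long m_last_left;
  std::vector<unsigned long long> m_order;
};

// Starts the seqno-11 thread before the seqno-10 thread and returns
// the order in which the two passed the gate.
std::vector<unsigned long long> run_out_of_order_arrival() {
  TurnGate gate(9);  // assumed: seqno 9 was the last one to leave
  std::thread ddl([&gate] { gate.pass(11); });  // T2 is started first
  std::thread dml([&gate] { gate.pass(10); });  // T1 is started second
  ddl.join();
  dml.join();
  return gate.order();
}
```

Regardless of which thread the scheduler runs first, the recorded order is 10 followed by 11, because the wait predicate only admits the seqno that immediately follows last_left.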
1 parent d29a325 commit 63a4df8
Showing 11 changed files with 241 additions and 0 deletions.
@@ -0,0 +1,38 @@
#ifdef WITH_WSREP
#include "sql/wsrep_async_monitor.h"
#include <algorithm>
#include <cassert>

void Wsrep_async_monitor::enter(seqno_t seqno) {
  std::unique_lock<std::mutex> lock(m_mutex);

  // Wait for its turn before entering
  m_cond.wait(lock, [this, seqno]() { return seqno == m_last_left + 1; });
  m_last_entered = (seqno > m_last_entered) ? seqno : m_last_entered;
  fprintf(stderr, "Entered the monitor with seqno: %llu\n", seqno);
}

void Wsrep_async_monitor::leave(seqno_t seqno) {
  // Wait for its turn before leaving
  std::unique_lock<std::mutex> lock(m_mutex);
  m_cond.wait(lock, [this, seqno]() { return seqno == m_last_left + 1; });
  m_last_left = seqno;

  fprintf(stderr, "Left the monitor seqno: %llu\n", seqno);
  // Notify all waiting threads
  m_cond.notify_all();
}

void Wsrep_async_monitor::reset(seqno_t seqno) {
  std::unique_lock<std::mutex> lock(m_mutex);

  // We can reset only if the last entered and last left
  // are same, meaning that there is no one who is inside
  // the monitor
  assert(m_last_entered == m_last_left);
  m_last_entered = seqno;
  m_last_left = seqno;
  m_cond.notify_all();
}

#endif /* WITH_WSREP */
@@ -0,0 +1,39 @@
#ifndef WSREP_ASYNC_MONITOR_H
#define WSREP_ASYNC_MONITOR_H

#ifdef WITH_WSREP
#include <iostream>
#include <condition_variable>
#include <mutex>

class Wsrep_async_monitor {
 public:
  using seqno_t = unsigned long long;

  Wsrep_async_monitor() : m_last_entered(0), m_last_left(0) {}

  ~Wsrep_async_monitor() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_last_entered = 0;
    m_last_left = 0;
    m_cond.notify_all();
  }

  void enter(seqno_t seqno);
  void leave(seqno_t seqno);
  void reset(seqno_t seqno);

  seqno_t last_entered() const { return m_last_entered; }
  seqno_t last_left() const { return m_last_left; }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;

  // TODO: Evaluate if we really need m_last_entered
  seqno_t m_last_entered;
  seqno_t m_last_left;
};

#endif /* WITH_WSREP */
#endif /* WSREP_ASYNC_MONITOR_H */
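The lifetime described in the commit message (save last_left before the monitor is destroyed, restore it on the next START REPLICA) could look roughly like the sketch below. The monitor here is a condensed copy of the class in this diff, with the debug output and destructor left out so that the example compiles standalone. `FakeRelayLogInfo`, `start_replica()`, `stop_replica()` and `run_restart_cycle()` are hypothetical stand-ins; they do not reflect the actual Relay_log_info members or the call sites touched by this commit.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>

// Condensed copy of Wsrep_async_monitor from the diff (no fprintf).
class Wsrep_async_monitor {
 public:
  using seqno_t = unsigned long long;

  void enter(seqno_t seqno) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this, seqno] { return seqno == m_last_left + 1; });
    m_last_entered = (seqno > m_last_entered) ? seqno : m_last_entered;
  }

  void leave(seqno_t seqno) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this, seqno] { return seqno == m_last_left + 1; });
    m_last_left = seqno;
    m_cond.notify_all();
  }

  void reset(seqno_t seqno) {
    std::unique_lock<std::mutex> lock(m_mutex);
    assert(m_last_entered == m_last_left);  // nobody is inside the monitor
    m_last_entered = seqno;
    m_last_left = seqno;
    m_cond.notify_all();
  }

  seqno_t last_left() const { return m_last_left; }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  seqno_t m_last_entered = 0;
  seqno_t m_last_left = 0;
};

// Hypothetical stand-in for the state kept in Relay_log_info.
struct FakeRelayLogInfo {
  Wsrep_async_monitor::seqno_t saved_last_left = 0;
  std::unique_ptr<Wsrep_async_monitor> monitor;
};

// Assumed START REPLICA hook: create the monitor and restore last_left.
void start_replica(FakeRelayLogInfo &rli) {
  rli.monitor = std::make_unique<Wsrep_async_monitor>();
  rli.monitor->reset(rli.saved_last_left);
}

// Assumed STOP REPLICA hook: remember last_left, then destroy the monitor.
void stop_replica(FakeRelayLogInfo &rli) {
  rli.saved_last_left = rli.monitor->last_left();
  rli.monitor.reset();
}

// Applies seqnos 1 and 2, restarts the replica, then applies seqno 3.
Wsrep_async_monitor::seqno_t run_restart_cycle() {
  FakeRelayLogInfo rli;
  start_replica(rli);
  for (Wsrep_async_monitor::seqno_t s = 1; s <= 2; ++s) {
    rli.monitor->enter(s);
    rli.monitor->leave(s);
  }
  stop_replica(rli);      // saved_last_left is now 2
  start_replica(rli);     // the new monitor resumes at 2
  rli.monitor->enter(3);  // would block forever without the restore
  rli.monitor->leave(3);
  return rli.monitor->last_left();
}
```

Without the `reset()` call in `start_replica()`, the fresh monitor would start at last_left = 0 and `enter(3)` would wait indefinitely for seqnos 1 and 2, which is why the saved value has to be carried across the restart.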