Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
PXC-4173: PXC node stalls with parallel replication workers executing DDLs via async node

https://perconadev.atlassian.net/browse/PXC-4173

Problem
=======
A PXC replica could enter a deadlock with multi-threaded replication when
--replica-preserve-commit-order is enabled.

The deadlock is more likely to occur on a replica with
--replica-preserve-commit-order enabled when the source server's workload
consists of concurrent executions of DMLs and DDLs. In such a scenario, the
replica tries to commit them in the same order as on the source, which
conflicts with Galera's ordering and results in a deadlock.

MySQL ordering: Commits are serialized as per their commit order on the source server.
Galera ordering: Commits are serialized in a FIFO manner.

Scenario
========
Let's assume we have the following two transactions in the relay log with the
same last_committed.

T1: DML, seqno=10
T2: DDL, seqno=11

+-----------+--------------+-------------------------------------------------------+
| Timestamp | Thread       | Activity                                              |
+-----------+--------------+-------------------------------------------------------+
| t0        | Co-ordinator | Assigns T1 to Applier-1                               |
+-----------+--------------+-------------------------------------------------------+
| t1        | Co-ordinator | Assigns T2 to Applier-2                               |
+-----------+--------------+-------------------------------------------------------+
| t2        | Applier-1    | DML starts with seqno = 10                            |
+-----------+--------------+-------------------------------------------------------+
| t3        | Applier-2    | DDL starts with seqno = 11                            |
+-----------+--------------+-------------------------------------------------------+
| t4        | Applier-2    | DDL calls TOI_begin(), acquires commit monitor        |
+-----------+--------------+-------------------------------------------------------+
| t5        | Applier-2    | DDL executes                                          |
+-----------+--------------+-------------------------------------------------------+
| t5        | Applier-1    | DML executes                                          |
+-----------+--------------+-------------------------------------------------------+
| t6        | Applier-2    | DDL reaches commit_order_manager, finds that          |
|           |              | it needs to wait until Applier-1 is committed         |
+-----------+--------------+-------------------------------------------------------+
| t7        | Applier-1    | DML reaches commit, calls commit_order_enter_local()  |
|           |              | and will wait for Applier-2 to release commit monitor |
+-----------+--------------+-------------------------------------------------------+

In the end it will result in

Applier-1: DML executing Xid_log_event::commit, waiting for the DDL to release the commit monitor
Applier-2: DDL executing ALTER TABLE, waiting for Applier-1 to commit first

Solution
========
This commit introduces an "Async Monitor" at the server layer to ensure that
ordering in Galera follows the order of the relay log.

Async Monitor is similar to the Commit Monitor used in Galera, which internally
keeps track of the last_left seqno. A transaction with seqno > last_left waits
until the condition `seqno = last_left + 1` is met, meaning it is its turn to
enter the Commit Monitor in Galera. This ensures that transactions are
registered in Galera in the same sequence as on the source.

Lifetime of the Async Monitor:
Async Monitor is created on START REPLICA and is destroyed on STOP REPLICA and
whenever the applier thread exits. Before it is destroyed, the value of
last_left is stored in the Relay_log_info object, and the value is restored on
the subsequent START REPLICA.
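The `seqno = last_left + 1` rule in the commit message can be sketched as a small standalone monitor. This is only an illustration of the described condition, not the code in this commit (the files below use a queue of scheduled seqnos instead). The names `LastLeftMonitor` and `run_in_order_demo` are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration of the "seqno == last_left + 1" rule.
class LastLeftMonitor {
 public:
  using seqno_t = unsigned long long;

  explicit LastLeftMonitor(seqno_t last_left) : m_last_left(last_left) {}

  // Block until it is this seqno's turn.
  void enter(seqno_t seqno) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this, seqno] { return seqno == m_last_left + 1; });
  }

  // Advance last_left and wake waiters so the next seqno can enter.
  void leave(seqno_t seqno) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      assert(seqno == m_last_left + 1);
      m_last_left = seqno;
    }
    m_cond.notify_all();
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  seqno_t m_last_left;
};

// Starts the DDL applier (seqno 11) before the DML applier (seqno 10) and
// returns the order in which the two were admitted.
std::vector<unsigned long long> run_in_order_demo() {
  LastLeftMonitor monitor(9);  // assume seqno 9 was the last one to leave
  std::vector<unsigned long long> order;
  auto applier = [&monitor, &order](unsigned long long seqno) {
    monitor.enter(seqno);
    order.push_back(seqno);  // only one thread is inside the monitor at a time
    monitor.leave(seqno);
  };
  std::thread ddl(applier, 11ULL);
  std::thread dml(applier, 10ULL);
  ddl.join();
  dml.join();
  return order;
}
```

Under these assumptions the DDL thread waits even though it starts first, so the DML with seqno 10 is admitted ahead of the DDL with seqno 11, matching relay-log order.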
1 parent 9a4609a, commit eafc945
Showing 15 changed files with 514 additions and 0 deletions.
@@ -0,0 +1,94 @@
/* Copyright (c) 2021 Percona LLC and/or its affiliates. All rights reserved.

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; version 2 of
   the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */

#ifdef WITH_WSREP
#include "sql/wsrep_async_monitor.h"
#include <cassert>
#include <cstdlib>  // exit()

// Method for main thread to add scheduled seqnos
void Wsrep_async_monitor::schedule(seqno_t seqno) {
  std::unique_lock<std::mutex> lock(m_mutex);
  scheduled_seqnos.push(seqno);
}

// Method for both DDL and DML to enter the monitor
void Wsrep_async_monitor::enter(seqno_t seqno) {
  std::unique_lock<std::mutex> lock(m_mutex);

  // Wait until this transaction is at the head of the scheduled queue
  m_cond.wait(lock, [this, seqno] {
    // Remove skipped transactions
    while (!scheduled_seqnos.empty() &&
           skipped_seqnos.count(scheduled_seqnos.front()) > 0) {
      scheduled_seqnos.pop();
    }
    return !scheduled_seqnos.empty() && scheduled_seqnos.front() == seqno;
  });
}

// Method to be called after DDL/DML processing is complete
void Wsrep_async_monitor::leave(seqno_t seqno) {
  std::unique_lock<std::mutex> lock(m_mutex);

  // Check that the sequence number matches the front of the queue.
  // In a correctly functioning monitor a mismatch should not happen,
  // as each transaction should exit in the order it was scheduled
  // and processed.
  if (!scheduled_seqnos.empty() && scheduled_seqnos.front() == seqno) {
    // Remove the seqno from the scheduled queue now that it has completed
    scheduled_seqnos.pop();
  } else {
    // std::cout << "Error: Mismatch in sequence numbers. Expected "
    //           << (scheduled_seqnos.empty()
    //                   ? "none"
    //                   : std::to_string(scheduled_seqnos.front()))
    //           << " but got " << seqno << "." << std::endl;
    assert(false && "Sequence number mismatch in leave()");
    exit(1);
  }

  // Notify waiting threads in case the next scheduled sequence can enter
  m_cond.notify_all();
}

// Method to skip a transaction that will not call enter() and leave()
void Wsrep_async_monitor::skip(unsigned long seqno) {
  std::unique_lock<std::mutex> lock(m_mutex);

  // Check if the seqno is already marked as skipped
  if (skipped_seqnos.count(seqno) > 0) {
    return;  // Already skipped, so do nothing
  }

  // Mark the seqno as skipped
  skipped_seqnos.insert(seqno);

  // Remove it from the scheduled queue if it is at the front
  if (!scheduled_seqnos.empty() && scheduled_seqnos.front() == seqno) {
    scheduled_seqnos.pop();
  }

  // Notify in case other transactions are waiting to enter
  m_cond.notify_all();
}

// Method to return if the monitor is empty, used by the unittests
bool Wsrep_async_monitor::is_empty() {
  std::unique_lock<std::mutex> lock(m_mutex);
  return scheduled_seqnos.empty();
}
#endif /* WITH_WSREP */
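
To see how `schedule()`, `skip()`, `enter()` and `leave()` interact, here is a minimal standalone sketch. It uses a compact stand-in class that mirrors the queue-based logic above so that it compiles outside the server tree. `AsyncMonitorSketch`, `DemoResult` and `run_schedule_demo` are hypothetical names, and the three-transaction workload is an assumption made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <vector>

// Compact stand-in mirroring the monitor in this diff (hypothetical name).
class AsyncMonitorSketch {
 public:
  using seqno_t = unsigned long long;

  void schedule(seqno_t seqno) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_scheduled.push(seqno);
  }

  void enter(seqno_t seqno) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this, seqno] {
      // Drop skipped seqnos that have reached the head of the queue.
      while (!m_scheduled.empty() &&
             m_skipped.count(m_scheduled.front()) > 0) {
        m_scheduled.pop();
      }
      return !m_scheduled.empty() && m_scheduled.front() == seqno;
    });
  }

  void leave(seqno_t seqno) {
    std::lock_guard<std::mutex> lock(m_mutex);
    assert(!m_scheduled.empty() && m_scheduled.front() == seqno);
    if (!m_scheduled.empty() && m_scheduled.front() == seqno) {
      m_scheduled.pop();
    }
    m_cond.notify_all();
  }

  void skip(seqno_t seqno) {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (!m_skipped.insert(seqno).second) return;  // already skipped
    if (!m_scheduled.empty() && m_scheduled.front() == seqno) {
      m_scheduled.pop();
    }
    m_cond.notify_all();
  }

  bool is_empty() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_scheduled.empty();
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::set<seqno_t> m_skipped;
  std::queue<seqno_t> m_scheduled;
};

struct DemoResult {
  std::vector<unsigned long long> order;  // seqnos in admission order
  bool drained;                           // monitor empty at the end
};

// The coordinator schedules 10, 11, 12; seqno 11 never enters and is skipped.
DemoResult run_schedule_demo() {
  AsyncMonitorSketch monitor;
  monitor.schedule(10);
  monitor.schedule(11);
  monitor.schedule(12);
  monitor.skip(11);

  std::vector<unsigned long long> order;
  auto applier = [&monitor, &order](unsigned long long seqno) {
    monitor.enter(seqno);
    order.push_back(seqno);  // serialized by the monitor
    monitor.leave(seqno);
  };
  std::thread late(applier, 12ULL);   // started first, admitted second
  std::thread early(applier, 10ULL);
  late.join();
  early.join();
  return DemoResult{order, monitor.is_empty()};
}
```

In this sketch the skipped seqno 11 is pruned inside the wait predicate of `enter(12)`, so the queue drains even though no thread ever calls `enter(11)`.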
@@ -0,0 +1,54 @@
/* Copyright (c) 2024 Percona LLC and/or its affiliates. All rights reserved.

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; version 2 of
   the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */

#ifndef WSREP_ASYNC_MONITOR_H
#define WSREP_ASYNC_MONITOR_H

#ifdef WITH_WSREP
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <set>

class Wsrep_async_monitor {
 public:
  using seqno_t = unsigned long long;

  // Method for main thread to add scheduled seqnos
  void schedule(seqno_t seqno);

  // Method for both DDL and DML to enter the monitor
  void enter(seqno_t seqno);

  // Method to be called after DDL/DML processing is complete
  void leave(seqno_t seqno);

  // Method to skip a transaction that will not call enter() and leave()
  void skip(unsigned long seqno);

  // Method to return if the monitor is empty, used by the unittests
  bool is_empty();

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::set<seqno_t> skipped_seqnos;      // Tracks skipped sequence numbers
  std::queue<seqno_t> scheduled_seqnos;  // Queue to track scheduled seqnos
};

#endif /* WITH_WSREP */
#endif /* WSREP_ASYNC_MONITOR_H */
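
Because every `enter()` must be paired with a `leave()` for the next seqno to make progress, a caller with early-return paths might wrap the pair in a scope guard. The following is a sketch of that idea only; nothing in this commit defines such a helper. `AsyncMonitorGuard`, `RecordingMonitor`, `apply_event` and `run_guard_demo` are hypothetical names, and `RecordingMonitor` is a single-threaded stub that just records calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scope guard: enters on construction, leaves on destruction.
template <typename Monitor>
class AsyncMonitorGuard {
 public:
  AsyncMonitorGuard(Monitor &monitor, unsigned long long seqno)
      : m_monitor(monitor), m_seqno(seqno) {
    m_monitor.enter(m_seqno);
  }
  ~AsyncMonitorGuard() { m_monitor.leave(m_seqno); }

  AsyncMonitorGuard(const AsyncMonitorGuard &) = delete;
  AsyncMonitorGuard &operator=(const AsyncMonitorGuard &) = delete;

 private:
  Monitor &m_monitor;
  unsigned long long m_seqno;
};

// Stub with the same enter()/leave() shape as the monitor; it only records
// the calls so the pairing can be inspected.
struct RecordingMonitor {
  std::vector<std::string> calls;
  void enter(unsigned long long seqno) {
    calls.push_back("enter " + std::to_string(seqno));
  }
  void leave(unsigned long long seqno) {
    calls.push_back("leave " + std::to_string(seqno));
  }
};

// Simulated apply step with an early-return failure path.
bool apply_event(RecordingMonitor &monitor, unsigned long long seqno,
                 bool fail) {
  AsyncMonitorGuard<RecordingMonitor> guard(monitor, seqno);
  if (fail) return false;  // leave() still runs when guard goes out of scope
  return true;
}

std::vector<std::string> run_guard_demo() {
  RecordingMonitor monitor;
  bool ok = apply_event(monitor, 10, /*fail=*/true);
  assert(!ok);
  (void)ok;  // silence unused-variable warnings when NDEBUG is defined
  return monitor.calls;
}
```

The guard is templated on the monitor type so the same pattern could be tried against any class exposing `enter(seqno)` and `leave(seqno)`.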