PXC-4173: PXC node stalls with parallel replication workers executing DDLs via async node #1927
base: 8.0
Conversation
… DDLs via async node

https://perconadev.atlassian.net/browse/PXC-4173

Problem
=======
A PXC replica can enter a deadlock with multi-threaded replication when --replica-preserve-commit-order is enabled. The deadlock is more likely to occur when the source server's workload consists of concurrent executions of DMLs and DDLs. In such a scenario, the replica tries to commit transactions in the same order as the source, which conflicts with Galera's ordering, resulting in a deadlock.

MySQL ordering: commits are serialized as per their commit order on the source server.
Galera ordering: commits are serialized in a FIFO manner.

Scenario
========
Let's assume we have the following two transactions in the relay log with the same last_committed:

T1: DML, seqno=10
T2: DDL, seqno=11

+-----------+--------------+-------------------------------------------------------+
| Timestamp | Thread       | Activity                                              |
+-----------+--------------+-------------------------------------------------------+
| t0        | Co-ordinator | Assigns T1 to Applier-1                               |
+-----------+--------------+-------------------------------------------------------+
| t1        | Co-ordinator | Assigns T2 to Applier-2                               |
+-----------+--------------+-------------------------------------------------------+
| t2        | Applier-1    | DML starts with seqno = 10                            |
+-----------+--------------+-------------------------------------------------------+
| t3        | Applier-2    | DDL starts with seqno = 11                            |
+-----------+--------------+-------------------------------------------------------+
| t4        | Applier-2    | DDL calls TOI_begin(), acquires commit monitor        |
+-----------+--------------+-------------------------------------------------------+
| t5        | Applier-2    | DDL executes                                          |
+-----------+--------------+-------------------------------------------------------+
| t5        | Applier-1    | DML executes                                          |
+-----------+--------------+-------------------------------------------------------+
| t6        | Applier-2    | DDL reaches commit_order_manager, finds that          |
|           |              | it needs to wait until Applier-1 is committed         |
+-----------+--------------+-------------------------------------------------------+
| t7        | Applier-1    | DML reaches commit, calls commit_order_enter_local()  |
|           |              | and will wait for Applier-2 to release commit monitor |
+-----------+--------------+-------------------------------------------------------+

In the end it results in:

Applier-1: DML executing Xid_log_event::commit, waiting for the DDL to release commit order
Applier-2: DDL executing ALTER TABLE, waiting for Applier-1 to commit first

Solution
========
This commit introduces an "Async Monitor" at the server layer to ensure that the ordering in Galera matches the order in the relay log. The Async Monitor is similar to the Commit Monitor used in Galera, but differs in how seqnos are tracked within the monitor. The Async Monitor has the following methods:

1. schedule() - Schedules the transactions as per their order in the relay log.
2. enter()    - Ensures that threads are serialized as per their order.
3. leave()    - Ensures that threads are removed from the monitor in the serialized order.
4. skip()     - Gives the monitor some flexibility to skip transactions. This is required for handling skipped and filtered transactions.

Lifetime of the Async Monitor: the Async Monitor is created on START REPLICA and destroyed on STOP REPLICA, or whenever the applier thread exits.
if (wsrep_async_monitor) {
  auto seqno = gtid_log_ev->sequence_number;
  wsrep_async_monitor->schedule(seqno);
  fprintf(stderr, "Scheduled the seqno: %llu in async monitor\n", seqno);
fprintf
multiple occurrences
@@ -37,6 +37,9 @@
#include "sql/mysqld.h" // connection_events_loop_aborted
#include "sql/rpl_gtid.h"
#include "sql/rpl_rli.h" // Relay_log_info
#ifdef WITH_WSREP
Can we keep WSREP-dependent includes after the standard ones, like we do in other cases?
@@ -63,6 +63,9 @@
#include "sql/sql_class.h" // THD
#include "sql/system_variables.h"
#include "sql/table.h"
#ifdef WITH_WSREP
#include "sql/wsrep_async_monitor.h"
It looks like a forward type declaration is enough, and the header can be included in the *.cc file.
std::unique_lock<std::mutex> lock(m_mutex);

// Check if the sequence number matches the front of the queue.
// In a correctly functioning monitor this should not happen
In a correctly functioning monitor this should not happen ->
In a correctly functioning monitor this should always be true
}

// Mark the seqno as skipped
skipped_seqnos.insert(seqno);
I can't see any place where seqnos are removed from skipped_seqnos
if (opt_replica_preserve_commit_order && opt_mts_replica_parallel_workers > 1) {
  WSREP_ERROR("Async replica thread %u is not monitored by wsrep async monitor",
              thd->thread_id());
  exit(1);
exit?
maybe unireg_abort(), or assert and handle it somehow?
Wsrep_async_monitor *wsrep_async_monitor {rli->get_wsrep_async_monitor()};
if (wsrep_async_monitor) {
  auto seqno = gtid_log_ev->sequence_number;
  wsrep_async_monitor->schedule(seqno);
Is my understanding correct that at this point transactions appear in the order they were committed on the source node?
We create the queue of ordered transactions, and will use the Async Monitor to enforce this order of commit in Galera, right?
// Method for main thread to add scheduled seqnos
void Wsrep_async_monitor::schedule(seqno_t seqno) {
  std::unique_lock<std::mutex> lock(m_mutex);
  scheduled_seqnos.push(seqno);
Seqnos should be growing monotonically, right?
Can we add an assert like seqno == prev + 1?
If they are not strictly consecutive (why?), at least let's add an assert seqno > prev.
A good code comment would be appreciated.
// Wait until this transaction is at the head of the scheduled queue
m_cond.wait(lock, [this, seqno] {
  // Remove skipped transactions
When can this happen? Could you add a code comment?
I suspect this is the case when A, B, C have been scheduled, then B is skipped, but skip() removes only from the front of the scheduled_seqnos queue.
The commit message talks about ordering transactions based on their seqnos in a next = prev + 1 fashion, but the code does it in a different way (I can understand that logically it is still the same). It would be good to add more details about how the current implementation actually works.
https://perconadev.atlassian.net/browse/PXC-4173