Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. #7229
Force-pushed from 4af9b4d to f36ea7b
Codecov Report

@@            Coverage Diff             @@
##             main    #7229      +/-   ##
==========================================
- Coverage   93.23%   93.22%   -0.01%
==========================================
  Files         275      275
  Lines       59486    59494       +8
==========================================
+ Hits        55461    55463       +2
- Misses       4025     4031       +6
#if (PG_VERSION_NUM >= PG_VERSION_15)
	OutputPluginUpdateProgress(ctx, skipped_xact);
#else
Is this still required?
Postgres has refactored this to be called from ReorderBufferProcessTXN
in postgres/postgres@8c58624
We don't change that newly introduced callback in Postgres 16, right? I was expecting that we would only need to call OutputPluginUpdateProgress for versions before Postgres 16, since that change was not backported because it adds a new callback (an ABI change).
Please correct me if I am wrong.
You are right. We do not need to send keepalive in pg16.
Force-pushed from f36ea7b to fa6fe46
static void
update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact)
{
#if (PG_VERSION_NUM <= PG_VERSION_15)
As discussed via chat, let's #ifdef this whole function and its callsite, to make it very easy to understand in a couple of years what needs to be removed when PG15 support is dropped.
Reorganized ifdefs. Now I think it is more obvious that the changes are not needed on PG16 and later.
#if (PG_VERSION_NUM >= PG_VERSION_15)
	OutputPluginUpdateProgress(ctx, skipped_xact);
#else
	OutputPluginUpdateProgress(ctx);
#endif
This will never render the pg16 branch anymore, given the wider ifdef on the function body:

Suggested change:
-#if (PG_VERSION_NUM >= PG_VERSION_15)
-	OutputPluginUpdateProgress(ctx, skipped_xact);
-#else
-	OutputPluginUpdateProgress(ctx);
-#endif
+	OutputPluginUpdateProgress(ctx);
Needs backporting to 11.1, which is where I believe shard splitting got introduced. Thanks for the diligent work on this 🎉
	 */
	if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
	{
#if (PG_VERSION_NUM == PG_VERSION_15)
This check seems incorrect. It will only match 15.0, not 15.1. To match all PG15 versions we should use this instead:
Suggested change:
-#if (PG_VERSION_NUM == PG_VERSION_15)
+#if (PG_VERSION_NUM >= PG_VERSION_15)
Right, it looks like I did not push the final changes from my local branch. Please take a look again.
Oh, great catch!
The good news is that it would have been a compile error here; but if this had gone wrong when inserting this code, we could have had a silent regression.
EDIT: Actually, this was a silent regression with the version check around the function and callsite.
Force-pushed from e8af96a to 081fb94
…ceiver timeouts during large shard splits. (#7229)

DESCRIPTION: Send keepalive messages during the logical replication phase of large shard splits to avoid timeouts.

During the logical replication part of the shard split process, the split decoder filters out the WAL records produced by the initial copy. If the number of WAL records is large, the split decoder ends up processing for a long time before sending out any WAL records through pgoutput. Hence the WAL receiver may time out and restart repeatedly, causing the catch-up logic in our split driver code to fail.

Notes:
1. If wal_receiver_timeout is set to a very small value, e.g. 600ms, it may time out before receiving the keepalives. My tests show that this code works best when wal_receiver_timeout is set to 1 minute, which is the default value.
2. Once a logical replication worker times out, a new one gets launched. The new logical replication worker resets the pg_stat_subscription columns to their initial values; e.g. latest_end_lsn is set to 0. Our driver logic in WaitForGroupedLogicalRepTargetsToCatchUp cannot handle the LSN value going backwards. This is the main reason it gets stuck in an infinite loop.