Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. #7229

Merged: 4 commits merged into main from FixShardSplitStall-SendKeepAlive on Oct 9, 2023

Conversation

@emelsimsek (Contributor) commented Sep 26, 2023

DESCRIPTION: Send keepalive messages during the logical replication phase of large shard splits to avoid timeouts.

During the logical replication part of the shard split process, the split decoder filters out the WAL records produced by the initial copy. If the number of WAL records is large, the split decoder ends up processing for a long time before sending any WAL records through pgoutput. Hence the WAL receiver may time out and restart repeatedly, causing the catch-up logic in our split driver code to fail. The fix is to emit periodic keepalives while filtering, as sketched after the notes below.

Notes:

  1. If the wal_receiver_timeout is set to a very small value, e.g. 600ms, it may time out before receiving the keepalives. My tests show that this code works best when the `wal_receiver_timeout` is set to 1 minute, which is the default value.

  2. Once a logical replication worker times out, a new one gets launched. The new logical replication worker resets the pg_stat_subscription columns to their initial values, e.g. latest_end_lsn is set to 0. Our driver logic in `WaitForGroupedLogicalRepTargetsToCatchUp` cannot handle the LSN value going backwards, which is the main reason it gets stuck in an infinite loop.
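
To make the mechanism concrete, here is a minimal sketch of the keepalive helper, pieced together from the code excerpts quoted in this review. The CHANGES_THRESHOLD value, the changes_count declaration, and the include paths are illustrative assumptions; see the merged diff for the exact code.

/*
 * Sketch only: reassembled from the excerpts in this PR thread.
 * CHANGES_THRESHOLD = 100 and the include paths are assumptions.
 */
#include "postgres.h"
#include "replication/logical.h"
#include "pg_version_constants.h"   /* Citus' PG_VERSION_1x macros (assumed path) */

#define CHANGES_THRESHOLD 100

static uint32 changes_count = 0;

/*
 * Only needed before PG16: PG16 drives the update-progress callback
 * from ReorderBufferProcessTXN itself (postgres/postgres@8c58624).
 */
#if (PG_VERSION_NUM < PG_VERSION_16)
static void
update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact)
{
	/*
	 * While the decoder is skipping the WAL generated by the initial
	 * copy, nothing reaches pgoutput and the WAL receiver sees silence.
	 * Calling OutputPluginUpdateProgress() at transaction end, or after
	 * every CHANGES_THRESHOLD filtered changes, sends a keepalive and
	 * keeps wal_receiver_timeout from expiring.
	 */
	if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
	{
#if (PG_VERSION_NUM >= PG_VERSION_15)
		OutputPluginUpdateProgress(ctx, skipped_xact);   /* PG15 signature */
#else
		OutputPluginUpdateProgress(ctx);                 /* pre-PG15 signature */
#endif
		changes_count = 0;
	}
}
#endif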

@emelsimsek emelsimsek changed the title Fix shard split stall send keep alive Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. Sep 26, 2023
@emelsimsek emelsimsek force-pushed the FixShardSplitStall-SendKeepAlive branch from 4af9b4d to f36ea7b on October 2, 2023 08:02
codecov bot commented Oct 2, 2023

Codecov Report

Merging #7229 (081fb94) into main (76fdfa3) will decrease coverage by 0.01%.
The diff coverage is 57.14%.

@@            Coverage Diff             @@
##             main    #7229      +/-   ##
==========================================
- Coverage   93.23%   93.22%   -0.01%     
==========================================
  Files         275      275              
  Lines       59486    59494       +8     
==========================================
+ Hits        55461    55463       +2     
- Misses       4025     4031       +6     

@emelsimsek emelsimsek marked this pull request as ready for review October 2, 2023 08:17
Comment on lines +117 to +123
#if (PG_VERSION_NUM >= PG_VERSION_15)
OutputPluginUpdateProgress(ctx, skipped_xact);
#else
A Member commented:

Is this still required?
Postgres has refactored this to be called from ReorderBufferProcessTXN in postgres/postgres@8c58624.

We don't change that newly introduced callback in Postgres 16, right? I was expecting that we would only need to call OutputPluginUpdateProgress for versions before Postgres 16, as they didn't backport this change due to the addition of a new callback (ABI changes).

Please correct me if I am wrong.

The Contributor Author replied:

You are right. We do not need to send keepalives in PG16.

@emelsimsek emelsimsek force-pushed the FixShardSplitStall-SendKeepAlive branch from f36ea7b to fa6fe46 on October 6, 2023 07:08
static void
update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact)
{
#if (PG_VERSION_NUM <= PG_VERSION_15)
A Member commented:

As discussed via chat, let's ifdef this whole function and the call site, to make it very easy to see in a couple of years what needs to be removed when PG15 support is dropped.

@emelsimsek (Contributor Author) replied Oct 9, 2023:

Reorganized ifdefs. Now I think it is more obvious that the changes are not needed on PG16 and later.
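
To illustrate the reorganization, here is a hypothetical sketch of gating both the helper and its call site behind one version check; the callback name split_change_cb and its body are assumptions for illustration, not the actual Citus code:

#include "postgres.h"
#include "replication/logical.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"

#if (PG_VERSION_NUM < PG_VERSION_16)
static void update_replication_progress(LogicalDecodingContext *ctx,
										bool skipped_xact);
#endif

/* Hypothetical decoder change callback; the real one lives in the
 * shard split decoder. */
static void
split_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				Relation relation, ReorderBufferChange *change)
{
#if (PG_VERSION_NUM < PG_VERSION_16)
	/* Before PG16 the plugin must report progress itself; on PG16+,
	 * ReorderBufferProcessTXN invokes the update-progress callback,
	 * so both the helper and this call compile away. */
	update_replication_progress(ctx, false);
#endif

	/* ... filter out initial-copy WAL or forward the change ... */
}

With a single `#if (PG_VERSION_NUM < PG_VERSION_16)` around both the helper and the call, dropping PG15 support later means deleting everything inside the guard.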

Comment on lines +121 to +125
#if (PG_VERSION_NUM >= PG_VERSION_15)
OutputPluginUpdateProgress(ctx, skipped_xact);
#else
OutputPluginUpdateProgress(ctx);
#endif
A Member commented:

This will never reach the PG16 branch anymore, given the wider ifdef around the function body:

Suggested change:
-#if (PG_VERSION_NUM >= PG_VERSION_15)
-OutputPluginUpdateProgress(ctx, skipped_xact);
-#else
-OutputPluginUpdateProgress(ctx);
-#endif
+OutputPluginUpdateProgress(ctx);

@thanodnl (Member) commented Oct 9, 2023

Needs backporting to 11.1, which is, I believe, when shard splitting got introduced.

Thanks for the diligent work on this 🎉

*/
if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
{
#if (PG_VERSION_NUM == PG_VERSION_15)
@JelteF (Contributor) commented Oct 9, 2023:

This check seems incorrect. It will only match 15.0, not 15.1. To match all PG15 versions we should use this instead:

Suggested change:
-#if (PG_VERSION_NUM == PG_VERSION_15)
+#if (PG_VERSION_NUM >= PG_VERSION_15)
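
For background: PG_VERSION_NUM encodes the server version as major * 10000 + minor, so an equality check matches only the .0 release. A quick illustration, assuming PG_VERSION_15 is defined as 150000 (as in Citus' version-constants header):

/* PG_VERSION_NUM: 15.0 -> 150000, 15.1 -> 150001, 15.4 -> 150004 ... */
#if (PG_VERSION_NUM == PG_VERSION_15)   /* true only on 15.0 */
#endif
#if (PG_VERSION_NUM >= PG_VERSION_15)   /* true on every 15.x and later */
#endif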

The Contributor Author replied:

Right, looks like I did not push the final changes from my local. Please take a look again.

@thanodnl (Member) replied Oct 9, 2023:

Oh, great catch!

Good thing is that it would be a compile error; but if inserting this code had gone wrong, we could have had a silent regression.

EDIT: Actually, this was a silent regression, given the version check around the function and call site.

@emelsimsek emelsimsek requested a review from JelteF October 9, 2023 12:22
@emelsimsek emelsimsek enabled auto-merge (squash) October 9, 2023 19:00
@emelsimsek emelsimsek force-pushed the FixShardSplitStall-SendKeepAlive branch from e8af96a to 081fb94 on October 9, 2023 19:10
@emelsimsek emelsimsek merged commit e9035f6 into main Oct 9, 2023
108 of 109 checks passed
@emelsimsek emelsimsek deleted the FixShardSplitStall-SendKeepAlive branch October 9, 2023 19:33
francisjodi pushed a commit that referenced this pull request Nov 13, 2023
Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. (#7229)
