Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. #7229

Merged: 4 commits merged into main from FixShardSplitStall-SendKeepAlive on Oct 9, 2023

Conversation

@emelsimsek (Contributor) commented Sep 26, 2023

DESCRIPTION: Send keepalive messages during the logical replication phase of large shard splits to avoid timeouts.

During the logical replication part of the shard split process, the split decoder filters out the WAL records produced by the initial copy. If the number of WAL records is large, the split decoder ends up processing for a long time before sending any WAL records through pgoutput. Hence the WAL receiver may time out and restart repeatedly, causing the catch-up logic in our split driver code to fail. The fix is to emit periodic keepalives while filtering, as sketched after the notes below.

Notes:

  1. If the wal_receiver_timeout is set to a very small value, e.g. 600ms, it may time out before receiving the keepalives. My tests show that this code works best when the `wal_receiver_timeout` is set to 1 minute, which is the default value.

  2. Once a logical replication worker times out, a new one gets launched. The new logical replication worker resets the pg_stat_subscription columns to their initial values, e.g. latest_end_lsn is set to 0. Our driver logic in `WaitForGroupedLogicalRepTargetsToCatchUp` cannot handle the LSN value going backwards, which is the main reason it gets stuck in an infinite loop.
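
To make the mechanism concrete, here is a minimal sketch of the keepalive helper, pieced together from the code excerpts quoted in this review. The CHANGES_THRESHOLD value, the changes_count declaration, and the include paths are illustrative assumptions; see the merged diff for the exact code.

/*
 * Sketch only: reassembled from the excerpts in this PR thread.
 * CHANGES_THRESHOLD = 100 and the include paths are assumptions.
 */
#include "postgres.h"
#include "replication/logical.h"
#include "pg_version_constants.h"   /* Citus' PG_VERSION_1x macros (assumed path) */

#define CHANGES_THRESHOLD 100

static uint32 changes_count = 0;

/*
 * Only needed before PG16: PG16 drives the update-progress callback
 * from ReorderBufferProcessTXN itself (postgres/postgres@8c58624).
 */
#if (PG_VERSION_NUM < PG_VERSION_16)
static void
update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact)
{
	/*
	 * While the decoder is skipping the WAL generated by the initial
	 * copy, nothing reaches pgoutput and the WAL receiver sees silence.
	 * Calling OutputPluginUpdateProgress() at transaction end, or after
	 * every CHANGES_THRESHOLD filtered changes, sends a keepalive and
	 * keeps wal_receiver_timeout from expiring.
	 */
	if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
	{
#if (PG_VERSION_NUM >= PG_VERSION_15)
		OutputPluginUpdateProgress(ctx, skipped_xact);   /* PG15 signature */
#else
		OutputPluginUpdateProgress(ctx);                 /* pre-PG15 signature */
#endif
		changes_count = 0;
	}
}
#endif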

@emelsimsek emelsimsek changed the title Fix shard split stall send keep alive Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. Sep 26, 2023
@emelsimsek emelsimsek force-pushed the FixShardSplitStall-SendKeepAlive branch from 4af9b4d to f36ea7b on October 2, 2023 08:02
codecov bot commented Oct 2, 2023

Codecov Report

Merging #7229 (081fb94) into main (76fdfa3) will decrease coverage by 0.01%.
The diff coverage is 57.14%.

@@            Coverage Diff             @@
##             main    #7229      +/-   ##
==========================================
- Coverage   93.23%   93.22%   -0.01%     
==========================================
  Files         275      275              
  Lines       59486    59494       +8     
==========================================
+ Hits        55461    55463       +2     
- Misses       4025     4031       +6     

@emelsimsek emelsimsek marked this pull request as ready for review October 2, 2023 08:17
Comment on lines +117 to +123
#if (PG_VERSION_NUM >= PG_VERSION_15)
OutputPluginUpdateProgress(ctx, skipped_xact);
#else
A Member commented:

Is this still required?
Postgres has refactored this to be called from ReorderBufferProcessTXN in postgres/postgres@8c58624.

We don't change that newly introduced callback in Postgres 16, right? I was expecting that we would only need to call OutputPluginUpdateProgress for versions before Postgres 16, as they didn't backport this change due to the addition of a new callback (ABI changes).

Please correct me if I am wrong.

The Contributor Author replied:

You are right. We do not need to send keepalives in PG16.

@emelsimsek emelsimsek force-pushed the FixShardSplitStall-SendKeepAlive branch from f36ea7b to fa6fe46 on October 6, 2023 07:08
static void
update_replication_progress(LogicalDecodingContext *ctx, bool skipped_xact)
{
#if (PG_VERSION_NUM <= PG_VERSION_15)
A Member commented:

As discussed via chat, let's ifdef this whole function and the call site, to make it very easy to see in a couple of years what needs to be removed when PG15 support is dropped.

@emelsimsek (Contributor Author) replied Oct 9, 2023:

Reorganized ifdefs. Now I think it is more obvious that the changes are not needed on PG16 and later.
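
To illustrate the reorganization, here is a hypothetical sketch of gating both the helper and its call site behind one version check; the callback name split_change_cb and its body are assumptions for illustration, not the actual Citus code:

#include "postgres.h"
#include "replication/logical.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"

#if (PG_VERSION_NUM < PG_VERSION_16)
static void update_replication_progress(LogicalDecodingContext *ctx,
										bool skipped_xact);
#endif

/* Hypothetical decoder change callback; the real one lives in the
 * shard split decoder. */
static void
split_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				Relation relation, ReorderBufferChange *change)
{
#if (PG_VERSION_NUM < PG_VERSION_16)
	/* Before PG16 the plugin must report progress itself; on PG16+,
	 * ReorderBufferProcessTXN invokes the update-progress callback,
	 * so both the helper and this call compile away. */
	update_replication_progress(ctx, false);
#endif

	/* ... filter out initial-copy WAL or forward the change ... */
}

With a single `#if (PG_VERSION_NUM < PG_VERSION_16)` around both the helper and the call, dropping PG15 support later means deleting everything inside the guard.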

Comment on lines +121 to +125
#if (PG_VERSION_NUM >= PG_VERSION_15)
OutputPluginUpdateProgress(ctx, skipped_xact);
#else
OutputPluginUpdateProgress(ctx);
#endif
A Member commented:

This will never reach the PG16 branch anymore, given the wider ifdef around the function body:

Suggested change:
-#if (PG_VERSION_NUM >= PG_VERSION_15)
-OutputPluginUpdateProgress(ctx, skipped_xact);
-#else
-OutputPluginUpdateProgress(ctx);
-#endif
+OutputPluginUpdateProgress(ctx);

@thanodnl (Member) commented Oct 9, 2023

Needs backporting to 11.1, which is, I believe, when shard splitting got introduced.

Thanks for the diligent work on this 🎉

*/
if (ctx->end_xact || ++changes_count >= CHANGES_THRESHOLD)
{
#if (PG_VERSION_NUM == PG_VERSION_15)
@JelteF (Contributor) commented Oct 9, 2023:

This check seems incorrect. It will only match 15.0, not 15.1. To match all PG15 versions we should use this instead:

Suggested change:
-#if (PG_VERSION_NUM == PG_VERSION_15)
+#if (PG_VERSION_NUM >= PG_VERSION_15)
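
For background: PG_VERSION_NUM encodes the server version as major * 10000 + minor, so an equality check matches only the .0 release. A quick illustration, assuming PG_VERSION_15 is defined as 150000 (as in Citus' version-constants header):

/* PG_VERSION_NUM: 15.0 -> 150000, 15.1 -> 150001, 15.4 -> 150004 ... */
#if (PG_VERSION_NUM == PG_VERSION_15)   /* true only on 15.0 */
#endif
#if (PG_VERSION_NUM >= PG_VERSION_15)   /* true on every 15.x and later */
#endif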

The Contributor Author replied:

Right, looks like I did not push the final changes from my local. Please take a look again.

@thanodnl (Member) replied Oct 9, 2023:

Oh, great catch!

Good thing is that it would be a compile error; but if inserting this code had gone wrong, we could have had a silent regression.

EDIT: Actually, this was a silent regression, given the version check around the function and call site.

@emelsimsek emelsimsek requested a review from JelteF October 9, 2023 12:22
@emelsimsek emelsimsek enabled auto-merge (squash) October 9, 2023 19:00
@emelsimsek emelsimsek force-pushed the FixShardSplitStall-SendKeepAlive branch from e8af96a to 081fb94 on October 9, 2023 19:10
@emelsimsek emelsimsek merged commit e9035f6 into main Oct 9, 2023
108 of 109 checks passed
@emelsimsek emelsimsek deleted the FixShardSplitStall-SendKeepAlive branch October 9, 2023 19:33
francisjodi pushed a commit that referenced this pull request Nov 13, 2023
Send keepalive messages in split decoder periodically to avoid wal receiver timeouts during large shard splits. (#7229)
