Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: ReceiveJournalEvent Flag for VStream #16644

Open
twthorn opened this issue Aug 23, 2024 · 0 comments · May be fixed by #16737
Open

Feature Request: ReceiveJournalEvent Flag for VStream #16644

twthorn opened this issue Aug 23, 2024 · 0 comments · May be fixed by #16737

Comments

@twthorn
Copy link
Contributor

twthorn commented Aug 23, 2024

Feature Description

The only way to receive journal events is with StopOnReshard flag. As documented in #16621 the behavior is non-deterministic and depends on a race condition. From the investigation of this issue it's a limitation of grpc and not something we could fix for StopOnReshard to behave consisently. Since there is no way to consistently receive a Journal event with the StopOnReshard flag, we request a flag ReceiveJournalEvent that will send Journal events and continue streaming on new shards after a reshard. Then the client can decide if it wants to terminate the stream or not after it receives the Journal Event.

Use Case(s)

Debezium vitess connector would use this for two scenarios:

  1. Rebalance Vitess shards across tasks - without receiving journal events, tasks with shard subsets transparently receive more shards after a split. This can lead to one task being overloaded. By being able to receive a journal event, we can stop the connector and then redistribute the shards. Just receiving a VGTID with new shards is not sufficient since it's local to the task's shards, and doesn't represent the global reshard operation. Thus we may restart and not correctly distribute shards.
  2. Shard Epoch inheritance - Debezium has the ability to encode additional order metadata into each change record to allow for correct primary key ordering to be determined independent of the order the records are produced to a downstream system (e.g., kafka, so it can handle partition change transparently, no need to restream a copy of the table to the new partition scheme). In order to properly calculate this order metadata, we need to know the complete reshard that happened (all affected shards). This allows us to update metadata stored for each shard. Receiving VGTID with new shards is not sufficient since it's only local to each client's shard set and may not be complete (we can't guarantee all vstream clients of Debezium receive all latest VGTIDs, the only way to create a global picture of the reshard). We need to receive a journal event which describes the reshard globally (not just local to each client's shard set/VGTID).
@twthorn twthorn added Needs Triage This issue needs to be correctly labelled and triaged Type: Feature Request labels Aug 23, 2024
@GuptaManan100 GuptaManan100 added Component: VReplication and removed Needs Triage This issue needs to be correctly labelled and triaged labels Aug 26, 2024
@mattlord mattlord self-assigned this Sep 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Prioritized
Development

Successfully merging a pull request may close this issue.

3 participants