Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VReplication: Estimate lag when workflow fully throttled #16577

Merged
merged 12 commits into from
Aug 15, 2024

Conversation

mattlord
Copy link
Contributor

@mattlord mattlord commented Aug 10, 2024

Description

We did not properly calculate the vreplication lag for a workflow when it was fully throttled — on the source/vstreamer side or the target/vplayer side.

This PR ensures that we account for these scenarios and estimate the vreplication lag for the workflow when they occur.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

@github-actions github-actions bot added this to the v21.0.0 milestone Aug 10, 2024
@mattlord mattlord added the Component: Observability Pull requests that touch tracing/metrics/monitoring label Aug 10, 2024
Copy link

codecov bot commented Aug 10, 2024

Codecov Report

Attention: Patch coverage is 86.95652% with 3 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@f2d5d1c). Learn more about missing BASE report.
Report is 6 commits behind head on main.

Files Patch % Lines
go/vt/vttablet/tabletserver/vstreamer/vstreamer.go 71.42% 2 Missing ⚠️
.../vt/vttablet/tabletmanager/vreplication/vplayer.go 93.75% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #16577   +/-   ##
=======================================
  Coverage        ?   68.79%           
=======================================
  Files           ?     1557           
  Lines           ?   199895           
  Branches        ?        0           
=======================================
  Hits            ?   137515           
  Misses          ?    62380           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this PR is still in Draft, it looks good to me. Thank you!

Signed-off-by: Matt Lord <mattalord@gmail.com>
This did not have a practical impact but was obviously
wrong

Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord mattlord requested a review from deepthi August 12, 2024 17:38
@mattlord mattlord changed the title VReplication: Estimate lag when vplayer throttled VReplication: Estimate lag when workflow throttled Aug 13, 2024
@mattlord mattlord changed the title VReplication: Estimate lag when workflow throttled VReplication: Estimate lag when workflow fully throttled Aug 13, 2024
@mattlord mattlord removed the Backport to: release-20.0 Needs to be backport to release-20.0 label Aug 13, 2024
Revert "Fix vitessio#16565"

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Comment on lines -321 to -323
throttledHeartbeatsRateLimiter.Do(func() error {
return injectHeartbeat(true, checkResult.Summary())
})
Copy link
Contributor Author

@mattlord mattlord Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that we were getting double the heartbeats as we injected them here and via the timer. I consolidated them into the main loop with the timer. Otherwise we needed to do twice the work as the other heartbeats would get interspersed with these and those would indicate that we weren't throttled when in fact we were — so we needed to check the throttler there as well and reset the heartbeat timer in injectHeartbeat() to limit the extra heartbeats and even then some extras still went through as it was a race.

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand some of the changes in the last force-push, asking for clarifications inline.

go/vt/vttablet/tabletmanager/vreplication/vplayer.go Outdated Show resolved Hide resolved
})
// we won't process events, until we're no longer throttling
logger.Infof("throttled.")
logger.Infof("vstreamer throttled%s: %s.", wfNameLog, checkResult.Summary())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this spam the logs when throttled?

Copy link
Contributor Author

@mattlord mattlord Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a throttled logger that logs the message at most once every 5 minutes. It was added in #14936. I thought about removing these altogether as the value was not clear to me, but instead I at least made this one much more useful IMO as it would state the component and the workflow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK cool!

@@ -394,7 +391,8 @@ func (vs *vstreamer) parseEvents(ctx context.Context, events <-chan mysql.Binlog
case <-ctx.Done():
return nil
case <-hbTimer.C:
if err := injectHeartbeat(false, ""); err != nil {
checkResult, ok := vs.vse.throttlerClient.ThrottleCheckOKOrWaitAppName(ctx, vs.throttlerApp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain what this throttler check is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added as part of consolidating the two heartbeat injectors: #16577 (comment)

We had two goroutines doing heartbeat injections. One that was sent only when we were throttled (removed here), and this one that would be sent both when we were throttled AND when we were not. This one would always say that we weren't throttled — even when in fact we were — so it prevented us from being able to estimate the lag in the vplayer when the vstreamer was fully throttled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I encourage you to use vs.vse.throttlerClient.ThrottleCheckOK as opposed to vs.vse.throttlerClient.ThrottleCheckOKOrWaitAppName. You're not interested in the extra sleep.

This makes sense now, thank you for clarifying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Thank you! 🙇 That makes total sense — no need for the sleep here as it's not done in a tight loop. I should have at least checked the function comments. 🙂 I've made this change.

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch and fix.

@mattlord mattlord merged commit 95c77e9 into vitessio:main Aug 15, 2024
129 checks passed
@mattlord mattlord deleted the vplayer_throttled_lag branch August 15, 2024 12:50
venkatraju pushed a commit to slackhq/vitess that referenced this pull request Aug 29, 2024
)

Signed-off-by: Matt Lord <mattalord@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Observability Pull requests that touch tracing/metrics/monitoring Component: VReplication Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug Report: VReplication lag is not updated when vplayer is throttled
4 participants