feat(crons): Fan out check_missing task to each monitor_environment
#55924
Codecov Report
@@ Coverage Diff @@
## master #55924 +/- ##
========================================
Coverage 79.94% 79.95%
========================================
Files 5061 5061
Lines 217361 217468 +107
Branches 36786 36806 +20
========================================
+ Hits 173778 173874 +96
- Misses 38258 38267 +9
- Partials 5325 5327 +2
logger.info(
    "monitor.missed-checkin", extra={"monitor_environment_id": monitor_environment.id}
)
check_missing_environment.delay(monitor_environment.id)
not a celery expert but what's the overhead of a task here? would it make sense at all to batch them?
is the overall goal of this PR simply to increase parallelism?
Right now we're only at a few hundred but could always switch to chunks later
https://docs.celeryq.dev/en/latest/userguide/canvas.html#chunks
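The batching idea raised above can be sketched without Celery at all. This is only an illustration of the trade-off, not what the PR does: `chunked`, `dispatch_in_chunks`, and the `enqueue` callback are hypothetical names, and `enqueue` stands in for whatever would actually send a batch of IDs to a worker (for example Celery's `chunks` primitive from the link above).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def dispatch_in_chunks(
    ids: Iterable[int], size: int, enqueue: Callable[[List[int]], None]
) -> int:
    """Send one message per batch instead of one per id.

    `enqueue` is a hypothetical stand-in for the real task dispatch.
    Returns the number of messages sent.
    """
    sent = 0
    for batch in chunked(ids, size):
        enqueue(batch)
        sent += 1
    return sent
```

With a few hundred IDs per run and a batch size of 100, this sends a handful of messages rather than hundreds, at the cost of less parallelism and a larger blast radius if one batch hits its time limit.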
src/sentry/monitors/tasks.py
Outdated
monitor_environment = MonitorEnvironment.objects.get(id=monitor_environment_id)
monitor = monitor_environment.monitor
Let's add select_related('monitor') so we don't have to make a second query here.
This reverts commit 2e0a9a9.
Let's give it a shot, but if possible let's keep an eye on when the tasks are being run.
PR reverted: c738ff5
Currently this task has a time limit of 15s. This makes it very sensitive to database issues: when things "slow down", the task hits its time limit.
This task currently works by finding all monitors that are past their expected check-in time and marking each one as having missed a check-in. We can improve performance here by fanning out a task for each monitor that needs to be marked as missed.
Currently we average ~250 missed monitors per minute. This number will grow linearly as product usage grows. Fanning out would mean an extra ~250 tasks to process each minute.
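The fan-out shape described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this PR: the `MonitorEnvironment` dataclass here is a stand-in for the Django model, the `next_checkin` and `missed` fields and the `store` dictionary are assumptions made for the example, and `enqueue` stands in for `check_missing_environment.delay`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable


@dataclass
class MonitorEnvironment:
    """Stand-in for the real model; field names are assumptions."""
    id: int
    next_checkin: datetime
    missed: bool = False


def check_missing(
    environments: Iterable[MonitorEnvironment],
    now: datetime,
    enqueue: Callable[[int], None],
) -> int:
    """Parent task: only find overdue environments and enqueue one child each.

    It does no per-monitor updates itself, so its run time stays short
    even when individual updates are slow.
    """
    dispatched = 0
    for env in environments:
        if env.next_checkin < now:
            enqueue(env.id)
            dispatched += 1
    return dispatched


def check_missing_environment(
    store: Dict[int, MonitorEnvironment], env_id: int, now: datetime
) -> bool:
    """Child task: mark a single environment as missed.

    Re-checks the deadline, because a check-in may have arrived
    between enqueue and execution.
    """
    env = store[env_id]
    if env.next_checkin >= now:
        return False
    env.missed = True
    return True
```

Each child task then has its own time limit, so one slow update can no longer cause the whole batch of missed monitors to be skipped.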