ref(crons): Correct logic for mark_ok #57730

Merged

evanpurkhiser merged 1 commit into master from evanpurkhiser/ref-crons-correct-logic-for-mark-ok

Oct 10, 2023

Member

evanpurkhiser commented Oct 6, 2023

Previously, when incidents were enabled (a recovery threshold was set), a
monitor would not have its next_checkin or next_checkin_latest
updated until it recovered.

evanpurkhiser requested a review from a team as a code owner

October 6, 2023 21:52

evanpurkhiser requested a review from davidenwang

October 6, 2023 21:52

github-actions bot added the Scope: Backend label

evanpurkhiser mentioned this pull request

ref(crons): Make test_mark_ok more unit-test-y #57702

Closed

evanpurkhiser commented

tests/sentry/monitors/logic/test_mark_ok.py

Comment on lines -22 to -23

 @with_feature("organizations:issue-platform")

 @patch("sentry.issues.producer.produce_occurrence_to_kafka")

Member Author

evanpurkhiser Oct 6, 2023

I've made this more unit-test like by removing the dependency on mark_failed from this test.

vercel bot deployed to Preview

October 6, 2023 21:55

davidenwang reviewed

src/sentry/monitors/logic/mark_ok.py Outdated

 "-date_added"

 )[:recovery_threshold]

 # Incident recovers when ew have successive threshold check-ins

Contributor

davidenwang Oct 6, 2023

nit: we*

src/sentry/monitors/logic/mark_ok.py Outdated

 # is not recovering

 allow_status_update = True

 # Resolve the incident if we have met the recovery recovery_threshold

Contributor

davidenwang Oct 6, 2023

should this just be if we have met the recovery_threshold?

Member Author

evanpurkhiser Oct 6, 2023

yeah

src/sentry/monitors/logic/mark_ok.py Outdated

 )

 # Resolve the incident

 if incident_recovering and monitor_env.status != MonitorStatus.OK:

Contributor

davidenwang Oct 6, 2023 (edited)

could we simplify the logic/execution overall here? Right now it seems like the order of conditions is:

1. Check if we have a recovery threshold (if not, just update the status)
2. If we have one, fetch the N most recent check-ins and check that they are all OK
3. If they are all OK, check whether our monitor is in an incident (!= MonitorStatus.OK)
4. If true, resolve the incident, and then allow a status update on the monitor

But would it be better to instead:

1. Check monitor_env.status != MonitorStatus.OK first; if it IS OK then we can do nothing
2. If not OK, then check for a recovery threshold and proceed with the previous steps 2 and then 4

Only mentioning this because hopefully, the majority of the time, users' monitor statuses should be OK, which means we shouldn't check the N most recent check-ins every time they send an OK check-in (assuming they have a recovery threshold)
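
A rough sketch of the reordered check suggested above, not the actual `mark_ok` code. `Status`, `should_recover`, and `fetch_recent` are hypothetical stand-ins for illustration, and the length check on the fetched check-ins is an assumption rather than something stated in this thread:

```python
from enum import Enum
from typing import Callable, Sequence


class Status(Enum):
    # Hypothetical stand-in for the monitor / check-in status values
    OK = "ok"
    ERROR = "error"


def should_recover(
    monitor_status: Status,
    recovery_threshold: int,
    fetch_recent: Callable[[int], Sequence[Status]],
) -> bool:
    """Return True when the monitor environment should move back to OK."""
    # Cheap check first: a monitor that is already OK has nothing to recover,
    # so the recent check-ins are never fetched on this path.
    if monitor_status == Status.OK:
        return False

    # No threshold configured: a single OK check-in is enough.
    if not recovery_threshold:
        return True

    # Only now pay for the lookup of the N most recent check-ins.
    recent = fetch_recent(recovery_threshold)
    # Assumption: require a full window of N check-ins, all of them OK.
    return len(recent) == recovery_threshold and all(
        status == Status.OK for status in recent
    )
```

With this ordering, `fetch_recent` is only called for monitors that are both not OK and have a threshold set, which is the saving the comment is after.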

evanpurkhiser force-pushed the evanpurkhiser/ref-crons-correct-logic-for-mark-ok branch from 481f494 to 63fc2e1 Compare

October 6, 2023 22:16

vercel bot deployed to Preview

October 6, 2023 22:18

evanpurkhiser force-pushed the evanpurkhiser/ref-crons-correct-logic-for-mark-ok branch from 63fc2e1 to 57ebe7e Compare

October 10, 2023 16:59

vercel bot deployed to Preview

October 10, 2023 17:01


ref(crons): Correct logic for mark_ok

Previously, when incidents were enabled (a recovery threshold was set), a
monitor would not have its `next_checkin` or `next_checkin_latest`
updated until it recovered.

evanpurkhiser force-pushed the evanpurkhiser/ref-crons-correct-logic-for-mark-ok branch from 57ebe7e to 5893654 Compare

October 10, 2023 18:36

vercel bot deployed to Preview

October 10, 2023 18:38

rjo100 reviewed

src/sentry/monitors/logic/mark_ok.py

+ )
+ return
+ recovery_threshold = monitor_env.monitor.config.get("recovery_threshold", 0)

Contributor

rjo100 Oct 10, 2023

this should be 1 as the default? or idk maybe a comment. the min value is 1

Member Author

evanpurkhiser Oct 10, 2023

I think you have the default here as 0 since if it's not set we don't want to enable incidents yet

src/sentry/monitors/logic/mark_ok.py

+ return
+ recovery_threshold = monitor_env.monitor.config.get("recovery_threshold", 0)
+ using_incidents = bool(recovery_threshold)

Contributor

rjo100 Oct 10, 2023

maybe just change this to recovery_threshold > 1 which is functionally the same thing for now

Member Author

evanpurkhiser Oct 10, 2023

Yeah, I basically just duplicated what the logic was before: `if recovery_threshold:`
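
For reference, a small sketch of how the two predicates mentioned in this thread evaluate for an unset threshold and for thresholds of 1 and 2. The helper name `incident_flags` and the plain-dict config are hypothetical; only the `config.get("recovery_threshold", 0)` lookup and the `bool(...)` check come from the diff above:

```python
def incident_flags(config: dict) -> tuple[int, bool, bool]:
    """Return (threshold, truthiness check, greater-than-one check)."""
    # Same lookup as in the diff: a missing key falls back to 0.
    recovery_threshold = config.get("recovery_threshold", 0)
    return (
        recovery_threshold,
        bool(recovery_threshold),  # check used in the diff
        recovery_threshold > 1,    # alternative suggested in review
    )


for cfg in ({}, {"recovery_threshold": 1}, {"recovery_threshold": 2}):
    print(cfg, incident_flags(cfg))
```

The two checks agree when the key is unset and for a threshold of 2, and differ only at a threshold of exactly 1. Whether that difference matters depends on how a threshold of 1 is meant to be treated, which this thread does not settle.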

Member Author

evanpurkhiser commented Oct 10, 2023

@rjo100 mind just approving as is and we can clean up after?

evanpurkhiser enabled auto-merge (squash)

October 10, 2023 19:51

rjo100 approved these changes

evanpurkhiser merged commit dcd6da4 into master

49 checks passed

evanpurkhiser deleted the evanpurkhiser/ref-crons-correct-logic-for-mark-ok branch

October 10, 2023 20:00

sentry-io bot commented Oct 10, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

‼️ OperationalError: QueryCanceled('canceling statement due to user request\n') monitors.monitor_consumer View Issue

evanpurkhiser added the Trigger: Revert label

Contributor

getsentry-bot commented Oct 10, 2023

PR reverted: 8aa99c7

getsentry-bot added a commit that referenced this pull request


Revert "ref(crons): Correct logic for mark_ok (#57730)"

8aa99c7

This reverts commit dcd6da4.

Co-authored-by: evanpurkhiser <1421724+evanpurkhiser@users.noreply.github.com>

rjo100 mentioned this pull request

fix(crons): Properly update monitorenvs in mark_ok with thresholds #57955

Merged

github-actions bot locked and limited conversation to collaborators
