
Process delayed alert conditions in batches of 10,000 #75302

Merged
merged 20 commits into from
Jul 31, 2024

Conversation

saponifi3d (Contributor) commented Jul 30, 2024

Description

Some orgs are sending 100k+ events per minute, and the processing is taking too long for a single task.

This PR will look at the size of the hash and determine if it needs to be batched.

There are some restrictions around the celery task / redis; the details are outlined in a code comment here: https://github.com/getsentry/sentry/pull/75302/files#diff-f906e75a0e4419db4870fa45ca5a1608ca79beaa052c8bc50b4805607a665d27R482-R486
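The batching step the description alludes to can be sketched roughly as follows. This is illustrative only, not Sentry's actual implementation; `iter_batches` is a hypothetical helper. Each UUID key would then be written to redis and handed to the celery task in place of the full payload.

```python
# Sketch (assumed names): split the rulegroup-to-event hash into chunks of
# at most `batch_size` entries, each keyed by a fresh UUID.
import uuid
from itertools import islice
from typing import Iterator


def iter_batches(
    event_data: dict[str, str], batch_size: int = 10_000
) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (batch_key, chunk) pairs, each chunk holding at most batch_size items."""
    items = iter(event_data.items())
    while chunk := dict(islice(items, batch_size)):
        yield str(uuid.uuid4()), chunk
```

Keying each chunk by a UUID (rather than the project_id) keeps the batches unique across projects in the centralized redis buffer, which is the constraint the linked code comment describes.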

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 30, 2024
codecov bot commented Jul 30, 2024

Codecov Report

Attention: Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.

Project coverage is 78.21%. Comparing base (b51c4e2) to head (c29c5e3).
Report is 19 commits behind head on master.

Files Patch % Lines
src/sentry/rules/processing/delayed_processing.py 92.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #75302    +/-   ##
========================================
  Coverage   78.21%   78.21%            
========================================
  Files        6786     6787     +1     
  Lines      302382   302486   +104     
  Branches    52035    52050    +15     
========================================
+ Hits       236496   236600   +104     
+ Misses      59523    59521     -2     
- Partials     6363     6365     +2     
Files Coverage Δ
src/sentry/buffer/base.py 86.00% <100.00%> (+0.58%) ⬆️
src/sentry/buffer/redis.py 88.49% <100.00%> (+0.30%) ⬆️
src/sentry/options/defaults.py 100.00% <100.00%> (ø)
src/sentry/rules/processing/delayed_processing.py 92.24% <92.00%> (+0.81%) ⬆️

... and 38 files with indirect coverage changes

wedamija (Member) left a comment:

Nice fix, I was concerned that this batching might end up pretty complex, but this is easy to reason about

Comment on lines 484 to 485
uniqueness across all of them for the centralized redis buffer. The batches are stored in redis because
we shouldn't pass complex objects in the celery task arguments, and we can't send a page of data in the
wedamija (Member):
JFYI: mostly we don't like to pass things that would have to be pickled, like Django models. That said, this approach is probably better, because I'm not sure if there are downsides to storing 10k ints per task in RabbitMQ.

src/sentry/rules/processing/delayed_processing.py (outdated, resolved)
@@ -47,7 +47,7 @@
logger = logging.getLogger("sentry.rules.delayed_processing")
EVENT_LIMIT = 100
COMPARISON_INTERVALS_VALUES = {k: v[1] for k, v in COMPARISON_INTERVALS.items()}
CHUNK_BATCH_SIZE = 10000
CHUNK_BATCH_SIZE = options.get("delayed_processing.batch_size")
wedamija (Member):
We can't set this as a module variable; it'll never update (and might cause problems on load). We just need to perform the check inside the function.

saponifi3d (Contributor, author):
heh, ty, i was just about to ask you to check if i did that right. thanks!

saponifi3d (Contributor, author):
okay, just updated again. anything else i'm missing with the options? (I was looking at the docs at https://develop.sentry.dev/backend/options/ to add it.) ps, thanks for the help!

Comment on lines 2639 to 2642
register(
"delayed_processing.batch_size",
default=10000,
)
wedamija (Member):
You just need to add `flags=FLAG_AUTOMATOR_MODIFIABLE` and then you'll be good to go.

schew2381 (Member) left a comment:
One thing: in apply_delayed, I think we need to edit cleanup_redis_buffer b/c it uses the project_id when deleting the hash values rather than the generated uuid.

@@ -86,8 +86,10 @@ class RedisOperation(Enum):
SORTED_SET_GET_RANGE = "zrangebyscore"
SORTED_SET_DELETE_RANGE = "zremrangebyscore"
HASH_ADD = "hset"
HASH_ADD_BULK = "hmset"
schew2381 (Member):
According to https://redis.io/docs/latest/commands/hmset/, this command is deprecated as of Redis 4.0 and can be replaced with hset, which can take multiple field/value pairs.
I'm not sure what version of redis we use, though.
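The equivalence is easiest to see at the command level. This sketch builds the raw argument lists without touching a server, so no redis client is needed; the helper names are illustrative:

```python
# Since Redis 4.0, HSET accepts multiple field/value pairs, making HMSET
# redundant (and deprecated). Both commands take the same arguments.
def hmset_command(key: str, mapping: dict[str, str]) -> list[str]:
    args = ["HMSET", key]
    for field, value in mapping.items():
        args += [field, value]
    return args


def hset_command(key: str, mapping: dict[str, str]) -> list[str]:
    # Modern form: identical argument layout, different command name.
    args = ["HSET", key]
    for field, value in mapping.items():
        args += [field, value]
    return args
```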

saponifi3d (Contributor, author):
yeah, i had initially tried the hset-with-mapping syntax and it threw errors, so we'll need to use hmset for now.

schew2381 (Member):
ah ok, could we add a comment somewhere then referencing the future deprecation?

src/sentry/rules/processing/delayed_processing.py (outdated, resolved)
def fetch_rulegroup_to_event_data(project_id: int) -> dict[str, str]:
return buffer.backend.get_hash(model=Project, field={"project_id": project_id})
def fetch_rulegroup_to_event_data(project_id: int, batch_key: str | None = None) -> dict[str, str]:
field: dict[str, models.Model | int | str] = {
schew2381 (Member):
Where does this models.Model typing come from on the key 🤔

saponifi3d (Contributor, author):
It's a weird nested definition on the function call to get_hash for the field variable.

Ideally I could leave this untyped and mypy would evaluate the field data type as dict[str, int | str], which adheres to dict[str, models.Model | int | str] (at least, that's how TypeScript works). Unfortunately, it was throwing errors, and typing it explicitly threw the same errors. Since the dict does adhere to the type definition with models.Model, I just added it to appease the mypy overlords.

Any recommendations on cleanup or better ways to appease mypy?
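The behavior described above has a concrete cause: unlike TypeScript's structural checking, mypy treats `dict` as invariant in its value type, so a `dict[str, int | str]` is not accepted where a `dict[str, Model | int | str]` is expected. Annotating the variable with the wider union up front, as the PR does, satisfies mypy. In this sketch `Model` and `get_hash` are stand-ins, not Sentry's real classes:

```python
# dict is invariant in its value type: a dict[str, int | str] could have a
# Model written into it through the wider alias, so mypy rejects it.
from __future__ import annotations


class Model:  # stand-in for django.db.models.Model
    pass


def get_hash(field: dict[str, Model | int | str]) -> dict[str, str]:
    # A real implementation would query redis; stringify here so the
    # sketch has observable output.
    return {k: str(v) for k, v in field.items()}


# Annotating with the full union up front makes the call type-check.
field: dict[str, Model | int | str] = {"project_id": 123}
batch_key: str | None = "batch-uuid"

if batch_key is not None:
    field["batch_key"] = batch_key
```

An alternative cleanup would be for the callee to accept `Mapping[str, Model | int | str]` instead: `Mapping` is read-only and covariant in its value type, so the narrower dict would be accepted without the extra annotation.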

schew2381 (Member):
That makes sense. I would think the implicit typing would work, but I guess not, lol.
mypy is just a mystery sometimes 🔍

saponifi3d (Contributor, author):
One thing is in apply_delayed, I think we need to edit cleanup_redis_buffer b/c it uses the project_id when deleting the hash values rather than the generated uuid

@schew2381 🙏 💯 thanks for catching that. i'll update to pass the batch_key as well!

schew2381 (Member) left a comment:
overall lgtm!

Comment on lines 1347 to 1351
mock_delayed = Mock()
mock_apply_delayed.delayed = mock_delayed
process_rulegroups_in_batches(self.project.id)

mock_delayed.assert_called_once_with(self.project.id)
schew2381 (Member):
nit: I think any calls on mocks generate more mocks, so you can do:

Suggested change:
- mock_delayed = Mock()
- mock_apply_delayed.delayed = mock_delayed
- process_rulegroups_in_batches(self.project.id)
- mock_delayed.assert_called_once_with(self.project.id)
+ process_rulegroups_in_batches(self.project.id)
+ mock_apply_delayed.delayed.assert_called_once_with(self.project.id)
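The nit rests on a real `unittest.mock` behavior: attribute access on a `Mock` auto-creates a child mock, so the hand-built `Mock()` for `.delayed` is unnecessary. A minimal standalone illustration:

```python
# Accessing an attribute on a Mock creates a child Mock on the fly, and
# the same child is returned on every subsequent access.
from unittest.mock import Mock

mock_apply_delayed = Mock()
mock_apply_delayed.delayed(123)  # `.delayed` springs into existence here
mock_apply_delayed.delayed.assert_called_once_with(123)
```

The flip side of auto-creation is that attribute typos pass silently, which is exactly how the `.delayed` vs `.delay` mixup mentioned after the revert slipped through testing; `unittest.mock`'s spec/autospec features exist to catch that.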

@saponifi3d saponifi3d enabled auto-merge (squash) July 31, 2024 23:26
@saponifi3d saponifi3d merged commit 215491d into master Jul 31, 2024
49 checks passed
@saponifi3d saponifi3d deleted the jcallender/delayed_processor_10k_groups branch July 31, 2024 23:56
@saponifi3d saponifi3d added the Trigger: Revert add to a merged PR to revert it (skips CI) label Aug 1, 2024
getsentry-bot (Contributor):

PR reverted: efe481e

getsentry-bot added a commit that referenced this pull request Aug 1, 2024
This reverts commit 215491d.

Co-authored-by: saponifi3d <1569818+saponifi3d@users.noreply.github.com>
saponifi3d added a commit that referenced this pull request Aug 1, 2024
saponifi3d added a commit that referenced this pull request Aug 1, 2024
# Description
Revert of Revert for: #75302 

`.delayed` vs `.delay` 🤦‍♂️ all pertinent changes are in:
482439a
@github-actions github-actions bot locked and limited conversation to collaborators Aug 16, 2024