
Process delayed alert conditions in batches of 10,000 #75302

Merged
merged 20 commits into from
Jul 31, 2024

Conversation

saponifi3d (Contributor) commented Jul 30, 2024

Description

Some orgs are sending 100k+ events per minute, and the processing is taking too long for a single task.

This PR will look at the size of the hash and determine if it needs to be batched.

There are some restrictions around the celery task / redis; the details are outlined in a code comment here: https://github.com/getsentry/sentry/pull/75302/files#diff-f906e75a0e4419db4870fa45ca5a1608ca79beaa052c8bc50b4805607a665d27R482-R486
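The batching step the description alludes to can be sketched roughly as follows. This is illustrative only, not Sentry's actual implementation; `iter_batches` is a hypothetical helper. Each UUID key would then be written to redis and handed to the celery task in place of the full payload.

```python
# Sketch (assumed names): split the rulegroup-to-event hash into chunks of
# at most `batch_size` entries, each keyed by a fresh UUID.
import uuid
from itertools import islice
from typing import Iterator


def iter_batches(
    event_data: dict[str, str], batch_size: int = 10_000
) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (batch_key, chunk) pairs, each chunk holding at most batch_size items."""
    items = iter(event_data.items())
    while chunk := dict(islice(items, batch_size)):
        yield str(uuid.uuid4()), chunk
```

Keying each chunk by a UUID (rather than the project_id) keeps the batches unique across projects in the centralized redis buffer, which is the constraint the linked code comment describes.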

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 30, 2024
codecov bot commented Jul 30, 2024

Codecov Report

Attention: Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.

Project coverage is 78.21%. Comparing base (b51c4e2) to head (c29c5e3).
Report is 19 commits behind head on master.

Files Patch % Lines
src/sentry/rules/processing/delayed_processing.py 92.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #75302    +/-   ##
========================================
  Coverage   78.21%   78.21%            
========================================
  Files        6786     6787     +1     
  Lines      302382   302486   +104     
  Branches    52035    52050    +15     
========================================
+ Hits       236496   236600   +104     
+ Misses      59523    59521     -2     
- Partials     6363     6365     +2     
Files Coverage Δ
src/sentry/buffer/base.py 86.00% <100.00%> (+0.58%) ⬆️
src/sentry/buffer/redis.py 88.49% <100.00%> (+0.30%) ⬆️
src/sentry/options/defaults.py 100.00% <100.00%> (ø)
src/sentry/rules/processing/delayed_processing.py 92.24% <92.00%> (+0.81%) ⬆️

... and 38 files with indirect coverage changes

wedamija (Member) left a comment:

Nice fix, I was concerned that this batching might end up pretty complex, but this is easy to reason about

Comment on lines 484 to 485
uniqueness across all of them for the centralized redis buffer. The batches are stored in redis because
we shouldn't pass complex objects in the celery task arguments, and we can't send a page of data in the
wedamija (Member):
JFYI: mostly we don't like to pass things that would have to be pickled, like Django models. That said, this approach is probably better, because I'm not sure if there are downsides to storing 10k ints per task in RabbitMQ.

src/sentry/rules/processing/delayed_processing.py (outdated, resolved)
@@ -47,7 +47,7 @@
logger = logging.getLogger("sentry.rules.delayed_processing")
EVENT_LIMIT = 100
COMPARISON_INTERVALS_VALUES = {k: v[1] for k, v in COMPARISON_INTERVALS.items()}
CHUNK_BATCH_SIZE = 10000
CHUNK_BATCH_SIZE = options.get("delayed_processing.batch_size")
wedamija (Member):
We can't set this as a module variable; it'll never update (and might cause problems on load). We just need to perform the check inside the function.

saponifi3d (Contributor, author):
heh, ty, i was just about to ask you to check if i did that right. thanks!

saponifi3d (Contributor, author):
okay, just updated again. anything else i'm missing with the options? (I was looking at the docs at https://develop.sentry.dev/backend/options/ to add it.) ps, thanks for the help!

Comment on lines 2639 to 2642
register(
"delayed_processing.batch_size",
default=10000,
)
wedamija (Member):
You just need to add `flags=FLAG_AUTOMATOR_MODIFIABLE` and then you'll be good to go.

schew2381 (Member) left a comment:
One thing: in apply_delayed, I think we need to edit cleanup_redis_buffer b/c it uses the project_id when deleting the hash values rather than the generated uuid.

@@ -86,8 +86,10 @@ class RedisOperation(Enum):
SORTED_SET_GET_RANGE = "zrangebyscore"
SORTED_SET_DELETE_RANGE = "zremrangebyscore"
HASH_ADD = "hset"
HASH_ADD_BULK = "hmset"
schew2381 (Member):
According to https://redis.io/docs/latest/commands/hmset/, this command is deprecated as of Redis 4.0 and can be replaced with hset, which can take multiple field/value pairs.
I'm not sure what version of redis we use, though.
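The equivalence is easiest to see at the command level. This sketch builds the raw argument lists without touching a server, so no redis client is needed; the helper names are illustrative:

```python
# Since Redis 4.0, HSET accepts multiple field/value pairs, making HMSET
# redundant (and deprecated). Both commands take the same arguments.
def hmset_command(key: str, mapping: dict[str, str]) -> list[str]:
    args = ["HMSET", key]
    for field, value in mapping.items():
        args += [field, value]
    return args


def hset_command(key: str, mapping: dict[str, str]) -> list[str]:
    # Modern form: identical argument layout, different command name.
    args = ["HSET", key]
    for field, value in mapping.items():
        args += [field, value]
    return args
```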

saponifi3d (Contributor, author):
yeah, i had initially tried the hset-with-mapping syntax and it threw errors, so we'll need to use hmset for now.

schew2381 (Member):
ah ok, could we add a comment somewhere then referencing the future deprecation?

src/sentry/rules/processing/delayed_processing.py (outdated, resolved)
def fetch_rulegroup_to_event_data(project_id: int) -> dict[str, str]:
return buffer.backend.get_hash(model=Project, field={"project_id": project_id})
def fetch_rulegroup_to_event_data(project_id: int, batch_key: str | None = None) -> dict[str, str]:
field: dict[str, models.Model | int | str] = {
schew2381 (Member):
Where does this models.Model typing come from on the key 🤔

saponifi3d (Contributor, author):
It's a weird nested definition on the function call to get_hash for the field variable.

Ideally I could leave this untyped and mypy would evaluate the field data type as dict[str, int | str], which adheres to dict[str, models.Model | int | str] (at least, that's how TypeScript works). Unfortunately, it was throwing errors, and typing it explicitly threw the same errors. Since the dict does adhere to the type definition with models.Model, I just added it to appease the mypy overlords.

Any recommendations on cleanup or better ways to appease mypy?
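The behavior described above has a concrete cause: unlike TypeScript's structural checking, mypy treats `dict` as invariant in its value type, so a `dict[str, int | str]` is not accepted where a `dict[str, Model | int | str]` is expected. Annotating the variable with the wider union up front, as the PR does, satisfies mypy. In this sketch `Model` and `get_hash` are stand-ins, not Sentry's real classes:

```python
# dict is invariant in its value type: a dict[str, int | str] could have a
# Model written into it through the wider alias, so mypy rejects it.
from __future__ import annotations


class Model:  # stand-in for django.db.models.Model
    pass


def get_hash(field: dict[str, Model | int | str]) -> dict[str, str]:
    # A real implementation would query redis; stringify here so the
    # sketch has observable output.
    return {k: str(v) for k, v in field.items()}


# Annotating with the full union up front makes the call type-check.
field: dict[str, Model | int | str] = {"project_id": 123}
batch_key: str | None = "batch-uuid"

if batch_key is not None:
    field["batch_key"] = batch_key
```

An alternative cleanup would be for the callee to accept `Mapping[str, Model | int | str]` instead: `Mapping` is read-only and covariant in its value type, so the narrower dict would be accepted without the extra annotation.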

schew2381 (Member):
That makes sense. I would think the implicit typing would work, but I guess not, lol.
mypy is just a mystery sometimes 🔍

saponifi3d (Contributor, author):
One thing is in apply_delayed, I think we need to edit cleanup_redis_buffer b/c it uses the project_id when deleting the hash values rather than the generated uuid

@schew2381 🙏 💯 thanks for catching that. i'll update to pass the batch_key as well!

schew2381 (Member) left a comment:
overall lgtm!

Comment on lines 1347 to 1351
mock_delayed = Mock()
mock_apply_delayed.delayed = mock_delayed
process_rulegroups_in_batches(self.project.id)

mock_delayed.assert_called_once_with(self.project.id)
schew2381 (Member):
nit: I think any calls on mocks generate more mocks, so you can do:

Suggested change:
- mock_delayed = Mock()
- mock_apply_delayed.delayed = mock_delayed
- process_rulegroups_in_batches(self.project.id)
- mock_delayed.assert_called_once_with(self.project.id)
+ process_rulegroups_in_batches(self.project.id)
+ mock_apply_delayed.delayed.assert_called_once_with(self.project.id)
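The nit rests on a real `unittest.mock` behavior: attribute access on a `Mock` auto-creates a child mock, so the hand-built `Mock()` for `.delayed` is unnecessary. A minimal standalone illustration:

```python
# Accessing an attribute on a Mock creates a child Mock on the fly, and
# the same child is returned on every subsequent access.
from unittest.mock import Mock

mock_apply_delayed = Mock()
mock_apply_delayed.delayed(123)  # `.delayed` springs into existence here
mock_apply_delayed.delayed.assert_called_once_with(123)
```

The flip side of auto-creation is that attribute typos pass silently, which is exactly how the `.delayed` vs `.delay` mixup mentioned after the revert slipped through testing; `unittest.mock`'s spec/autospec features exist to catch that.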

@saponifi3d saponifi3d enabled auto-merge (squash) July 31, 2024 23:26
@saponifi3d saponifi3d merged commit 215491d into master Jul 31, 2024
49 checks passed
@saponifi3d saponifi3d deleted the jcallender/delayed_processor_10k_groups branch July 31, 2024 23:56
@saponifi3d saponifi3d added the Trigger: Revert add to a merged PR to revert it (skips CI) label Aug 1, 2024
getsentry-bot (Contributor):

PR reverted: efe481e

getsentry-bot added a commit that referenced this pull request Aug 1, 2024
This reverts commit 215491d.

Co-authored-by: saponifi3d <1569818+saponifi3d@users.noreply.github.com>
saponifi3d added a commit that referenced this pull request Aug 1, 2024
saponifi3d added a commit that referenced this pull request Aug 1, 2024
# Description
Revert of Revert for: #75302 

`.delayed` vs `.delay` 🤦‍♂️ all pertinent changes are in:
482439a
@github-actions github-actions bot locked and limited conversation to collaborators Aug 16, 2024