feat(spans): Batch insert span data to spanstore #55980
Codecov Report
@@ Coverage Diff @@
## master #55980 +/- ##
===========================================
- Coverage 79.95% 61.98% -17.97%
===========================================
Files 5060 5050 -10
Lines 217568 217346 -222
Branches 36831 36800 -31
===========================================
- Hits 173952 134718 -39234
- Misses 38287 78240 +39953
+ Partials 5329 4388 -941
@@ -2051,6 +2051,10 @@ def SOCIAL_AUTH_DEFAULT_USERNAME() -> str:
SENTRY_TAGSTORE = os.environ.get("SENTRY_TAGSTORE", "sentry.tagstore.snuba.SnubaTagStorage")
SENTRY_TAGSTORE_OPTIONS: dict[str, Any] = {}

# Node storage backend used for ArtifactBundle indexing (aka FlatFileIndex aka BundleIndex)
Nit: Copy/paste error
bigtable_project_id: str,
bigtable_instance_id: str,
bigtable_table_id: str,
We should probably not provide these parameters here. Instead, we can rely on the SENTRY_SPANSTORE setting and the corresponding SENTRY_SPANSTORE_OPTIONS to populate spanstore-specific configuration. That way the solution works for self-hosted as well.
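The settings-driven approach suggested above can be sketched as follows, assuming the spanstore is configured like SENTRY_TAGSTORE: a dotted import path for the backend class plus a dict of constructor keyword arguments. The helper names (import_string, build_spanstore) and the option names (project, instance, table_name) are hypothetical; types.SimpleNamespace stands in for a real Bigtable backend class so the sketch runs anywhere.

```python
import importlib
from typing import Any


def import_string(path: str) -> Any:
    """Resolve a dotted path such as "package.module.ClassName" to the object it names."""
    module_path, _, attr = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def build_spanstore(settings: dict[str, Any]) -> Any:
    """Instantiate the backend named by SENTRY_SPANSTORE with SENTRY_SPANSTORE_OPTIONS.

    Nothing Bigtable-specific is hard-coded here, so self-hosted installs can
    point the same setting at a different backend.
    """
    backend_cls = import_string(settings["SENTRY_SPANSTORE"])
    options = settings.get("SENTRY_SPANSTORE_OPTIONS", {})
    return backend_cls(**options)


# Hypothetical configuration; a real deployment would name the Bigtable backend class.
settings = {
    "SENTRY_SPANSTORE": "types.SimpleNamespace",
    "SENTRY_SPANSTORE_OPTIONS": {
        "project": "my-gcp-project",
        "instance": "sentry",
        "table_name": "spanstore",
    },
}

store = build_spanstore(settings)
print(store.project, store.instance, store.table_name)
```

The constructor never sees Bigtable-specific parameters directly; they only travel through the options dict, which is the point of the review comment.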
if batcher_options := self.client_options.get("batcher"):
    self.__batcher = self.__table.mutations_batcher(**batcher_options)
    self._mutate_rows = self.__batcher.mutate_rows
    self._mutate = self.__batcher.mutate
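The snippet above swaps the write methods for the batcher's methods when "batcher" options are present. A minimal sketch of that dispatch pattern, using hypothetical FakeTable and FakeBatcher stand-ins rather than the real google-cloud-bigtable classes, shows the behavioural difference: direct writes land immediately, while batched writes stay in memory until a flush.

```python
from typing import Any, Callable, Optional


class FakeTable:
    """Hypothetical stand-in for a Bigtable table: records every row written to it."""

    def __init__(self) -> None:
        self.written: list[str] = []

    def mutate_rows(self, rows: list[str]) -> None:
        self.written.extend(rows)

    def mutations_batcher(self, **options: Any) -> "FakeBatcher":
        return FakeBatcher(self, **options)


class FakeBatcher:
    """Hypothetical stand-in for a mutations batcher: buffers rows until flush()."""

    def __init__(self, table: FakeTable, **options: Any) -> None:
        self.table = table
        self.options = options  # flush thresholds would live here; unused in this sketch
        self.pending: list[str] = []

    def mutate_rows(self, rows: list[str]) -> None:
        self.pending.extend(rows)

    def flush(self) -> None:
        self.table.mutate_rows(self.pending)
        self.pending = []


class SpanStore:
    """Hypothetical store reproducing the optional-batcher dispatch from the diff."""

    def __init__(self, table: FakeTable, client_options: dict[str, Any]) -> None:
        self.batcher: Optional[FakeBatcher] = None
        # Without batcher options, writes go straight to the table.
        self._mutate_rows: Callable[[list[str]], None] = table.mutate_rows
        # With them, the same call is routed through the buffering batcher.
        if batcher_options := client_options.get("batcher"):
            self.batcher = table.mutations_batcher(**batcher_options)
            self._mutate_rows = self.batcher.mutate_rows

    def write(self, rows: list[str]) -> None:
        self._mutate_rows(rows)


direct_table, batched_table = FakeTable(), FakeTable()
direct = SpanStore(direct_table, client_options={})
batched = SpanStore(batched_table, client_options={"batcher": {"flush_count": 100}})

direct.write(["span-1"])
batched.write(["span-1"])
print(direct_table.written, batched_table.written)  # ['span-1'] []
batched.batcher.flush()
print(batched_table.written)  # ['span-1']
```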
When does the batcher get flushed? The default configuration seems to suggest 1 second.
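To make the question concrete, here is a sketch of an implicit, size-or-interval flush policy. TimedBatcher and its parameters are hypothetical and not the google-cloud-bigtable implementation; the clock is injectable, and poll() stands in for the background timer a real batcher would run. The sketch shows the window in which a row exists only in process memory.

```python
from typing import Callable


class TimedBatcher:
    """Buffers rows and flushes when flush_count rows are pending or when
    flush_interval seconds have passed since the last flush."""

    def __init__(
        self,
        write: Callable[[list[str]], None],
        flush_count: int,
        flush_interval: float,
        clock: Callable[[], float],
    ) -> None:
        self.write = write
        self.flush_count = flush_count
        self.flush_interval = flush_interval
        self.clock = clock
        self.pending: list[str] = []
        self.last_flush = clock()

    def mutate(self, row: str) -> None:
        self.pending.append(row)
        self._maybe_flush()

    def poll(self) -> None:
        """Stands in for the periodic timer of a real batcher."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        now = self.clock()
        interval_elapsed = now - self.last_flush >= self.flush_interval
        if self.pending and (len(self.pending) >= self.flush_count or interval_elapsed):
            self.write(list(self.pending))
            self.pending.clear()
            self.last_flush = now


now = [0.0]
stored = []
batcher = TimedBatcher(stored.append, flush_count=1000, flush_interval=1.0, clock=lambda: now[0])

now[0] = 0.2
batcher.mutate("span-1")
# Not flushed yet: the row only exists in process memory and is lost if the process dies now.
print(stored)  # []
now[0] = 1.1
batcher.poll()
print(stored)  # [['span-1']]
```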
One option worth exploring is making the batch flush explicit rather than implicit: add a batcher step right after the RunTaskWithMultiprocessing step. The batcher would explicitly fill Bigtable's batcher up to a threshold and, once the batch is complete, flush it to Bigtable. The step after the batcher would unfold the batch and produce each individual message to the snuba-spans topic.
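A minimal sketch of the proposed batch, flush, unfold sequence, written as a plain Python generator rather than as an Arroyo processing step. batch_flush_unfold, write_batch and max_batch_size are hypothetical names; what matters is the ordering: each batch is written synchronously before any of its messages are passed on to be produced.

```python
from typing import Callable, Iterable, Iterator


def batch_flush_unfold(
    messages: Iterable[dict],
    write_batch: Callable[[list[dict]], None],
    max_batch_size: int,
) -> Iterator[dict]:
    """Group messages into batches, write each batch synchronously, then
    yield the batch's messages one by one to the next step."""
    batch: list[dict] = []
    for message in messages:
        batch.append(message)
        if len(batch) >= max_batch_size:
            write_batch(batch)  # explicit flush: returns only once the batch is stored
            yield from batch    # unfold only after the write succeeded
            batch = []
    if batch:
        # Flush the partial tail batch when the input ends.
        write_batch(batch)
        yield from batch


events = []


def write_batch(batch: list[dict]) -> None:
    events.append(("write", [m["span_id"] for m in batch]))


for message in batch_flush_unfold(({"span_id": i} for i in range(3)), write_batch, max_batch_size=2):
    events.append(("produce", message["span_id"]))  # e.g. produce to snuba-spans

print(events)
```

Every ("produce", id) event appears after the ("write", ...) event of its batch, which is the property the explicit flush is meant to guarantee.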
This approach has a few advantages:
- Explicit writes instead of implicit, time-based writes, which give stronger data durability guarantees.
- If a consumer crashes, we can no longer end up with indexed span data in ClickHouse while the corresponding context/tags data never reached Bigtable. With the current asynchronous writes, this can happen.
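The crash argument in the second bullet can be checked with a toy model. process_async and process_explicit are hypothetical simplifications of the two flows; the explicit one assumes offsets are committed only after producing, so a crash between flush and produce leads to reprocessing (an idempotent rewrite of the same rows) rather than orphaned spans.

```python
class ConsumerCrash(Exception):
    """Simulated process crash."""


def process_async(span_ids, crash_before_flush):
    """Simplified current flow: spans are produced right away while the
    Bigtable write waits in an in-memory buffer for a timed flush."""
    bigtable, produced, buffer = set(), [], []
    try:
        for span_id in span_ids:
            buffer.append(span_id)
            produced.append(span_id)
        if crash_before_flush:
            raise ConsumerCrash()
        bigtable.update(buffer)
    except ConsumerCrash:
        pass  # whatever was still buffered is gone
    return bigtable, produced


def process_explicit(span_ids, crash_before_produce):
    """Proposed flow: flush the batch to Bigtable first, then produce."""
    bigtable, produced = set(), []
    try:
        bigtable.update(span_ids)
        if crash_before_produce:
            raise ConsumerCrash()
        produced.extend(span_ids)
    except ConsumerCrash:
        pass  # assumed: offsets not committed, so the batch is reprocessed on restart
    return bigtable, produced


def orphaned(bigtable, produced):
    """Spans indexed downstream whose context/tags never reached Bigtable."""
    return [s for s in produced if s not in bigtable]


print(orphaned(*process_async([1, 2, 3], crash_before_flush=True)))       # [1, 2, 3]
print(orphaned(*process_explicit([1, 2, 3], crash_before_produce=True)))  # []
```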
We want to store some span data in a cheaper datastore. Given the access patterns, Bigtable was chosen.
This PR will:
- spanstore based on the nodestore implementation
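A spanstore modeled on nodestore is, at its core, a small key/value interface. The sketch below assumes get/set/delete keyed by span ID, batched helpers, and an optional TTL, with a dict-backed implementation. The class and method names are illustrative assumptions, not the interface added in this PR; a Bigtable backend would implement the same methods against a table.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional


class SpanStore(ABC):
    """Hypothetical key/value interface for span data, keyed by span ID."""

    @abstractmethod
    def get(self, span_id: str) -> Optional[dict]:
        ...

    @abstractmethod
    def set(self, span_id: str, data: dict, ttl: Optional[float] = None) -> None:
        ...

    @abstractmethod
    def delete(self, span_id: str) -> None:
        ...

    def get_multi(self, span_ids: list[str]) -> dict[str, Optional[dict]]:
        # Default batched read built on get(); a Bigtable backend could
        # override this with a single multi-row read.
        return {span_id: self.get(span_id) for span_id in span_ids}

    def set_multi(self, items: dict[str, dict]) -> None:
        for span_id, data in items.items():
            self.set(span_id, data)


class InMemorySpanStore(SpanStore):
    """Dict-backed implementation for tests; expired rows are dropped on read."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._rows: dict[str, tuple[dict, Optional[float]]] = {}

    def get(self, span_id):
        row = self._rows.get(span_id)
        if row is None:
            return None
        data, expires_at = row
        if expires_at is not None and self._clock() >= expires_at:
            del self._rows[span_id]
            return None
        return data

    def set(self, span_id, data, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._rows[span_id] = (data, expires_at)

    def delete(self, span_id):
        self._rows.pop(span_id, None)


store = InMemorySpanStore()
store.set_multi({"a1": {"op": "db.query"}, "b2": {"op": "http.client"}})
store.delete("b2")
print(store.get_multi(["a1", "b2"]))
```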