feat(spans): Batch insert span data to spanstore #55980

phacops · 2023-09-11T10:13:40Z

We want to store some span data in a cheaper datastore. Given the access patterns, Bigtable was chosen.

This PR will:

add support for our Bigtable KV store to batch inserts
create a new store we'll call spanstore based on the nodestore implementation
store span data in the span consumer

codecov · 2023-09-11T13:09:20Z

Codecov Report

Merging #55980 (ffe53dc) into master (9679974) will decrease coverage by 17.97%.
Report is 4 commits behind head on master.
The diff coverage is 31.25%.

@@             Coverage Diff             @@
##           master   #55980       +/-   ##
===========================================
- Coverage   79.95%   61.98%   -17.97%     
===========================================
  Files        5060     5050       -10     
  Lines      217568   217346      -222     
  Branches    36831    36800       -31     
===========================================
- Hits       173952   134718    -39234     
- Misses      38287    78240    +39953     
+ Partials     5329     4388      -941

Files Changed	Coverage
src/sentry/utils/kvstore/bigtable.py	`15.78%`
src/sentry/spans/consumers/process/processor.py	`27.55%`
src/sentry/spans/consumers/process/factory.py	`75.00%`
src/sentry/conf/server.py	`100.00%`
src/sentry/spans/store/__init__.py	`100.00%`

nikhars · 2023-09-11T18:18:31Z

src/sentry/conf/server.py

@@ -2051,6 +2051,10 @@ def SOCIAL_AUTH_DEFAULT_USERNAME() -> str:
 SENTRY_TAGSTORE = os.environ.get("SENTRY_TAGSTORE", "sentry.tagstore.snuba.SnubaTagStorage")
 SENTRY_TAGSTORE_OPTIONS: dict[str, Any] = {}

+# Node storage backend used for ArtifactBundle indexing (aka FlatFileIndex aka BundleIndex)


Nit: Copy/paste error

nikhars · 2023-09-11T21:07:58Z

src/sentry/spans/consumers/process/processor.py

+        bigtable_project_id: str,
+        bigtable_instance_id: str,
+        bigtable_table_id: str,


We should probably not provide these parameters here. Instead we can rely on the SENTRY_SPANSTORE setting and the corresponding SENTRY_SPANSTORE_OPTIONS to populate spanstore specific configurations. That way the solution works for self hosted as well.

nikhars · 2023-09-11T21:52:00Z

src/sentry/utils/kvstore/bigtable.py

+        if batcher_options := self.client_options.get("batcher"):
+            self.__batcher = self.__table.mutations_batcher(**batcher_options)
+            self._mutate_rows = self.__batcher.mutate_rows
+            self._mutate = self.__batcher.mutate


When does the batcher gets flushed? The default configuration seems to suggest 1 second.

One possible idea to explore is making the batch flush explicit rather than implicit. So there could be a batcher step as the next step of the RunTaskWithMultiprocessing. The batcher would explicitly fill in bigtable's batcher until a threshold. Once the batch finishes, it will perform a flush to bigtable. The next step after batcher would be unfold which would produce each individual message on the snuba-spans topic.

There are a few advantages of the above approach.

Explicit writes instead of implicit time based writes which gives stronger data durability guarantees.

In case of the consumers crashing, there would not be a possibility of losing context/tags data from bigtable and having the indexed spans data in clickhouse. This could currently happen because of asynchronous writes.

getsantry · 2023-10-19T07:00:52Z

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

feat(spans): Batch insert span data to SpanStore

51bea2a

phacops requested review from jjbayer and nikhars September 11, 2023 10:13

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Sep 11, 2023

Fix typo

f5143b4

vercel bot deployed to Preview September 11, 2023 12:33 View deployment

Remove set_many/set_multi

9ef7cef

vercel bot deployed to Preview September 11, 2023 12:44 View deployment

Fix typing issues

ffe53dc

vercel bot deployed to Preview September 11, 2023 12:48 View deployment

nikhars reviewed Sep 11, 2023

View reviewed changes

getsantry bot added the Stale label Oct 19, 2023

getsantry bot closed this Oct 27, 2023

github-actions bot locked and limited conversation to collaborators Nov 11, 2023

phacops reopened this Nov 27, 2023

phacops closed this Nov 27, 2023

phacops deleted the pierre/spans-batch-insert-to-nodestore branch January 10, 2024 20:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(spans): Batch insert span data to spanstore #55980

feat(spans): Batch insert span data to spanstore #55980

phacops commented Sep 11, 2023

codecov bot commented Sep 11, 2023 •

edited

Loading

nikhars Sep 11, 2023

nikhars Sep 11, 2023

nikhars Sep 11, 2023

getsantry bot commented Oct 19, 2023

feat(spans): Batch insert span data to spanstore #55980

feat(spans): Batch insert span data to spanstore #55980

Conversation

phacops commented Sep 11, 2023

codecov bot commented Sep 11, 2023 • edited Loading

Codecov Report

nikhars Sep 11, 2023

Choose a reason for hiding this comment

nikhars Sep 11, 2023

Choose a reason for hiding this comment

nikhars Sep 11, 2023

Choose a reason for hiding this comment

getsantry bot commented Oct 19, 2023

codecov bot commented Sep 11, 2023 •

edited

Loading