feat(profiles): Add a dataset to store profile chunks #5895
Conversation
This PR has a migration; here is the generated SQL:

-- start migrations
-- forward migration profile_chunks : 0001_create_profile_chunks_table
Local op:
CREATE TABLE IF NOT EXISTS profile_chunks_local (
    project_id UInt64,
    profiler_id UUID,
    chunk_id UUID,
    start_timestamp DateTime64(6) CODEC (DoubleDelta),
    end_timestamp DateTime64(6) CODEC (DoubleDelta),
    retention_days UInt16,
    partition UInt16,
    offset UInt64
)
ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/profile_chunks/{shard}/default/profile_chunks_local', '{replica}')
ORDER BY (project_id, profiler_id, start_timestamp, cityHash64(chunk_id))
PARTITION BY (retention_days, toStartOfDay(start_timestamp))
SAMPLE BY cityHash64(chunk_id)
TTL toDateTime(end_timestamp) + toIntervalDay(retention_days)
SETTINGS index_granularity=8192;

Distributed op:
CREATE TABLE IF NOT EXISTS profile_chunks_dist (
    project_id UInt64,
    profiler_id UUID,
    chunk_id UUID,
    start_timestamp DateTime64(6) CODEC (DoubleDelta),
    end_timestamp DateTime64(6) CODEC (DoubleDelta),
    retention_days UInt16,
    partition UInt16,
    offset UInt64
)
ENGINE Distributed(`cluster_one_sh`, default, profile_chunks_local, cityHash64(profiler_id));
-- end forward migration profile_chunks : 0001_create_profile_chunks_table

-- backward migration profile_chunks : 0001_create_profile_chunks_table
Distributed op: DROP TABLE IF EXISTS profile_chunks_dist;
Local op: DROP TABLE IF EXISTS profile_chunks_local;
-- end backward migration profile_chunks : 0001_create_profile_chunks_table
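For illustration, here is a hypothetical row matching this schema (all values invented; partition and offset presumably record the ingest consumer's position in the source topic):

INSERT INTO profile_chunks_dist
    (project_id, profiler_id, chunk_id, start_timestamp, end_timestamp, retention_days, partition, offset)
VALUES
    (42, generateUUIDv4(), generateUUIDv4(),
     toDateTime64('2024-04-22 10:00:00.000000', 6),
     toDateTime64('2024-04-22 10:00:05.250000', 6),
     90, 0, 1234);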
Review comments on snuba/snuba_migrations/profile_chunks/0001_create_profile_chunks_table.py:
This looks good to me 👍
A few questions
Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time. ❌ Failed Test Results: Completed 286 tests with failures. View the full list of failed tests.
# Excerpt: registering the new migration group and its storage set.
MigrationGroup.PROFILE_CHUNKS: _MigrationGroup(
    loader=ProfileChunksLoader(),
    storage_sets_keys={StorageSetKey.PROFILE_CHUNKS},
    readiness_state=ReadinessState.PARTIAL,
)
Does the ClickHouse cluster already exist in SaaS (US + DE) and S4S? If the table is supposed to live on the profiles cluster, has the storage_set been added there? If this isn't done yet, we should set the readiness state to ReadinessState.LIMITED for now so that GoCD doesn't try to run these migrations on a cluster that doesn't exist.
The ClickHouse cluster we'd use exists in all environments (us, de, s4s, all STs). I also have https://github.com/getsentry/ops/pull/10526 to configure it for all environments.
+1
Which cluster are you using, @phacops?
The cluster dedicated to profiling.
PR reverted: 64c044e
This reverts commit 6dfa3a7. Co-authored-by: volokluev <3169433+volokluev@users.noreply.github.com>
This will add a new dataset for profile chunks in order to support our continuous profiling feature (receiving profiles chunk by chunk, not one profile per transaction).

The SDK will start a profiler session, identified by profiler_id, will profile, and will send chunks containing this profiler ID. It will also tag the spans it collects with this profiler ID. Later on, to fetch a profile for a span, we'll receive a profiler ID and start and stop timestamps for the span and, using this dataset, we'll query the chunk IDs necessary to assemble the profile for that span, with a query looking like this:
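(A sketch reconstructed from the schema above, not the literal query; the placeholders use ClickHouse's {name: Type} parameter syntax. A chunk overlaps the span if it starts before the span ends and ends after the span starts.)

SELECT chunk_id
FROM profile_chunks_dist
WHERE project_id = {project_id: UInt64}
  AND profiler_id = {profiler_id: UUID}
  AND start_timestamp <= {span_end: DateTime64(6)}
  AND end_timestamp >= {span_start: DateTime64(6)}
ORDER BY start_timestamp;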
Since all our queries will contain a date range on both start_timestamp and end_timestamp, I think it's useful to have them in the sort key. chunk_id appears there in order to make it work with a ReplacingMergeTree, since this guarantees uniqueness of rows, even though having 2 chunks for the same profiler session and timestamps would be a bug.

We're using the DateTime64 type to be able to store sub-millisecond precision, and we now have 2 different timestamp fields. I added support for that type in #5896.

This PR (https://github.com/getsentry/ops/pull/10526) is related, as it adds the necessary config for StorageSetKey.PROFILE_CHUNKS to the profiling cluster in every environment.
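A note on the ReplacingMergeTree choice: it deduplicates rows sharing a sort key only when parts merge, so a duplicate insert (e.g. from a consumer retry) can remain visible for a while. A reader that needs deduplicated results can force it with FINAL; a sketch reusing the lookup above:

SELECT chunk_id
FROM profile_chunks_dist FINAL
WHERE project_id = {project_id: UInt64}
  AND profiler_id = {profiler_id: UUID}
  AND start_timestamp <= {span_end: DateTime64(6)}
  AND end_timestamp >= {span_start: DateTime64(6)}
ORDER BY start_timestamp;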