feat(eap): rewrite the meta tables to have lower cardinality #6316

Merged
merged 3 commits into master from span_attr_table_v3 on Sep 18, 2024

Conversation

@colin-sentry colin-sentry (Member) commented Sep 17, 2024

We tried to improve the performance of other queries by using a subselect, but the results were disappointing.
This change:

  • Drops trace_id from both tables
  • Removes attr_value from the number tables (we'd never want to autocomplete numbers)
  • Adds the normalized columns (name, segment_name, service, duration_ms, exclusive_time_ms) to both meta tables (a sample query is sketched just below)
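
To make the intent concrete, here is a sketch of the kind of autocomplete query the string meta table serves (table and column names come from the generated migration below; the filter values are hypothetical):

```sql
-- Sketch: suggest values for a given attribute key, most common first.
-- With trace_id gone, the AggregatingMergeTree collapses rows down to one
-- per (org, key, value, project, day), so this scan stays small.
SELECT
    attr_value,
    sum(count) AS times_seen
FROM spans_str_attrs_2_dist
WHERE organization_id = 1        -- hypothetical org
  AND attr_key = 'service'
  AND attr_value LIKE 'check%'   -- the user's autocomplete prefix
GROUP BY attr_value
ORDER BY times_seen DESC
LIMIT 10;
```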

github-actions bot commented Sep 17, 2024

This PR has a migration; here is the generated SQL:

-- start migrations

-- forward migration events_analytics_platform : 0014_span_attribute_table_smaller
Local op: DROP TABLE IF EXISTS spans_str_attrs_mv;
Local op: DROP TABLE IF EXISTS spans_num_attrs_mv;
Distributed op: DROP TABLE IF EXISTS spans_str_attrs_dist;
Distributed op: DROP TABLE IF EXISTS spans_num_attrs_dist;
Local op: DROP TABLE IF EXISTS spans_str_attrs_local;
Local op: DROP TABLE IF EXISTS spans_num_attrs_local;
Local op: CREATE TABLE IF NOT EXISTS spans_str_attrs_2_local (organization_id UInt64, project_id UInt64, attr_key String CODEC (ZSTD(1)), attr_value String CODEC (ZSTD(1)), timestamp DateTime CODEC (ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE ReplicatedAggregatingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/spans_str_attrs_2_local', '{replica}') PRIMARY KEY (organization_id, attr_key) ORDER BY (organization_id, attr_key, attr_value, project_id, timestamp, retention_days) PARTITION BY toMonday(timestamp) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS spans_str_attrs_2_dist (organization_id UInt64, project_id UInt64, attr_key String CODEC (ZSTD(1)), attr_value String CODEC (ZSTD(1)), timestamp DateTime CODEC (ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE Distributed(`cluster_one_sh`, default, spans_str_attrs_2_local);
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS spans_str_attrs_2_mv TO spans_str_attrs_2_local (organization_id UInt64, project_id UInt64, attr_key String CODEC (ZSTD(1)), attr_value String CODEC (ZSTD(1)), timestamp DateTime CODEC (ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) AS 
SELECT
    organization_id,
    project_id,
    attrs.1 as attr_key,
    attrs.2 as attr_value,
    toStartOfDay(_sort_timestamp) AS timestamp,
    retention_days,
    1 AS count
FROM eap_spans_local
LEFT ARRAY JOIN
    arrayConcat(
        CAST(attr_str_0, 'Array(Tuple(String, String))'), CAST(attr_str_1, 'Array(Tuple(String, String))'), CAST(attr_str_2, 'Array(Tuple(String, String))'), CAST(attr_str_3, 'Array(Tuple(String, String))'), CAST(attr_str_4, 'Array(Tuple(String, String))'), CAST(attr_str_5, 'Array(Tuple(String, String))'), CAST(attr_str_6, 'Array(Tuple(String, String))'), CAST(attr_str_7, 'Array(Tuple(String, String))'), CAST(attr_str_8, 'Array(Tuple(String, String))'), CAST(attr_str_9, 'Array(Tuple(String, String))'), CAST(attr_str_10, 'Array(Tuple(String, String))'), CAST(attr_str_11, 'Array(Tuple(String, String))'), CAST(attr_str_12, 'Array(Tuple(String, String))'), CAST(attr_str_13, 'Array(Tuple(String, String))'), CAST(attr_str_14, 'Array(Tuple(String, String))'), CAST(attr_str_15, 'Array(Tuple(String, String))'), CAST(attr_str_16, 'Array(Tuple(String, String))'), CAST(attr_str_17, 'Array(Tuple(String, String))'), CAST(attr_str_18, 'Array(Tuple(String, String))'), CAST(attr_str_19, 'Array(Tuple(String, String))'),
        array(
            tuple('service', `service`),
            tuple('segment_name', `segment_name`),
            tuple('name', `name`)
        )
    ) AS attrs
GROUP BY
    organization_id,
    project_id,
    attr_key,
    attr_value,
    timestamp,
    retention_days
;
Local op: CREATE TABLE IF NOT EXISTS spans_num_attrs_2_local (organization_id UInt64, project_id UInt64, attr_key String, attr_min_value SimpleAggregateFunction(min, Float64), attr_max_value SimpleAggregateFunction(max, Float64), timestamp DateTime CODEC (ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE ReplicatedAggregatingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/spans_num_attrs_2_local', '{replica}') PRIMARY KEY (organization_id, attr_key) ORDER BY (organization_id, attr_key, attr_min_value, attr_max_value, timestamp, project_id, retention_days) PARTITION BY toMonday(timestamp) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS spans_num_attrs_2_dist (organization_id UInt64, project_id UInt64, attr_key String, attr_min_value SimpleAggregateFunction(min, Float64), attr_max_value SimpleAggregateFunction(max, Float64), timestamp DateTime CODEC (ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) ENGINE Distributed(`cluster_one_sh`, default, spans_num_attrs_2_local);
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS spans_num_attrs_2_mv TO spans_num_attrs_2_local (organization_id UInt64, project_id UInt64, attr_key String, attr_min_value SimpleAggregateFunction(min, Float64), attr_max_value SimpleAggregateFunction(max, Float64), timestamp DateTime CODEC (ZSTD(1)), retention_days UInt16, count SimpleAggregateFunction(sum, UInt64)) AS 
SELECT
    organization_id,
    project_id,
    attrs.1 as attr_key,
    attrs.2 as attr_min_value,
    attrs.2 as attr_max_value,
    toStartOfDay(_sort_timestamp) AS timestamp,
    retention_days,
    1 AS count
FROM eap_spans_local
LEFT ARRAY JOIN
    arrayConcat(
        CAST(attr_num_0, 'Array(Tuple(String, Float64))'),CAST(attr_num_1, 'Array(Tuple(String, Float64))'),CAST(attr_num_2, 'Array(Tuple(String, Float64))'),CAST(attr_num_3, 'Array(Tuple(String, Float64))'),CAST(attr_num_4, 'Array(Tuple(String, Float64))'),CAST(attr_num_5, 'Array(Tuple(String, Float64))'),CAST(attr_num_6, 'Array(Tuple(String, Float64))'),CAST(attr_num_7, 'Array(Tuple(String, Float64))'),CAST(attr_num_8, 'Array(Tuple(String, Float64))'),CAST(attr_num_9, 'Array(Tuple(String, Float64))'),CAST(attr_num_10, 'Array(Tuple(String, Float64))'),CAST(attr_num_11, 'Array(Tuple(String, Float64))'),CAST(attr_num_12, 'Array(Tuple(String, Float64))'),CAST(attr_num_13, 'Array(Tuple(String, Float64))'),CAST(attr_num_14, 'Array(Tuple(String, Float64))'),CAST(attr_num_15, 'Array(Tuple(String, Float64))'),CAST(attr_num_16, 'Array(Tuple(String, Float64))'),CAST(attr_num_17, 'Array(Tuple(String, Float64))'),CAST(attr_num_18, 'Array(Tuple(String, Float64))'),CAST(attr_num_19, 'Array(Tuple(String, Float64))'),
        array(
            tuple('duration_ms', `duration_ms`::Float64)
        )
    ) AS attrs
GROUP BY
    organization_id,
    project_id,
    attrs.1,
    attrs.2,
    timestamp,
    retention_days
;
-- end forward migration events_analytics_platform : 0014_span_attribute_table_smaller




-- backward migration events_analytics_platform : 0014_span_attribute_table_smaller
Local op: DROP TABLE IF EXISTS spans_str_attrs_2_mv;
Local op: DROP TABLE IF EXISTS spans_str_attrs_2_local;
Distributed op: DROP TABLE IF EXISTS spans_str_attrs_2_dist;
Local op: DROP TABLE IF EXISTS spans_num_attrs_2_mv;
Local op: DROP TABLE IF EXISTS spans_num_attrs_2_local;
Distributed op: DROP TABLE IF EXISTS spans_num_attrs_2_dist;
-- end backward migration events_analytics_platform : 0014_span_attribute_table_smaller
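
As a companion to the string example above, here is a sketch of how the numeric meta table would be read. Since attr_value was dropped for numbers, only per-key ranges and counts remain (names from the DDL above; filters hypothetical):

```sql
-- Sketch: numeric attributes expose a min/max range per key rather than
-- distinct values, which is what keeps this table's cardinality low.
SELECT
    attr_key,
    min(attr_min_value) AS lowest,
    max(attr_max_value) AS highest,
    sum(count)          AS times_seen
FROM spans_num_attrs_2_dist
WHERE organization_id = 1                  -- hypothetical org
  AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY attr_key
ORDER BY times_seen DESC;
```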

@nikhars nikhars (Member) left a comment

The migration as it's currently written would fail in CI/CD because there is a limit on how large a table can be before ClickHouse allows it to be dropped. The setting is max_table_size_to_drop: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#max_table_size_to_drop

The setting in the ops repo is here: https://github.com/getsentry/ops/blob/master/salt/states/clickhouse/files/config.xml#L16

You will need to work with an SRE to manually perform the drop/truncate of the tables before merging this PR.
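
For reference, a sketch of how one could check whether the old tables are over that limit before attempting the drop (system.parts is standard ClickHouse; the table names are the ones being dropped above):

```sql
-- Sketch: measure on-disk size per table and compare it by hand against
-- the server's configured max_table_size_to_drop.
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
  AND table IN ('spans_str_attrs_local', 'spans_num_attrs_local')
GROUP BY table;
```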

@colin-sentry colin-sentry (Member Author) commented

> The migration as it's currently written would fail in CI/CD because there is a limit on how large a table can be before ClickHouse allows it to be dropped. The setting is max_table_size_to_drop: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#max_table_size_to_drop
>
> The setting in the ops repo is here: https://github.com/getsentry/ops/blob/master/salt/states/clickhouse/files/config.xml#L16
>
> You will need to work with an SRE to manually perform the drop/truncate of the tables before merging this PR.

There is an explicit truncate before dropping, is that not enough?

nikhars commented Sep 17, 2024

> > The migration as it's currently written would fail in CI/CD because there is a limit on how large a table can be before ClickHouse allows it to be dropped. The setting is max_table_size_to_drop: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#max_table_size_to_drop
> > The setting in the ops repo is here: https://github.com/getsentry/ops/blob/master/salt/states/clickhouse/files/config.xml#L16
> > You will need to work with an SRE to manually perform the drop/truncate of the tables before merging this PR.
>
> There is an explicit truncate before dropping, is that not enough?

Nope, it won't work. From the ClickHouse docs linked above:

> If the size of a MergeTree table exceeds max_table_size_to_drop (in bytes), you can’t delete it using a DROP query or TRUNCATE query.
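
For context, the same docs page describes the operator-side escape hatch: creating a flag file on the server host lets the next DROP through. A sketch, assuming the default clickhouse-path of /var/lib/clickhouse:

```sql
-- Per the ClickHouse docs for max_table_size_to_drop: create the
-- force_drop_table flag file on the server, then re-run the query.
--   $ sudo touch /var/lib/clickhouse/flags/force_drop_table
--   $ sudo chmod 666 /var/lib/clickhouse/flags/force_drop_table
DROP TABLE IF EXISTS spans_str_attrs_local SYNC;
```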

@colin-sentry colin-sentry (Member Author) commented Sep 17, 2024

Opened ops request: https://getsentry.atlassian.net/servicedesk/customer/portal/18/INF-274

codecov bot commented Sep 17, 2024

❌ 1 test failed:

| Tests completed | Failed | Passed | Skipped |
| --------------- | ------ | ------ | ------- |
| 1246            | 1      | 1245   | 2       |
View the top 1 failed tests by shortest run time
tests.migrations.test_runner test_no_schema_differences
Stack Traces | 7.54s run time
Traceback (most recent call last):
  File ".../tests/migrations/test_runner.py", line 478, in test_no_schema_differences
    assert (
AssertionError: Schema mismatch: spans_num_attrs_2_local does not match schema
assert ["Column 'attr_min_value' type differs between local ClickHouse and schema! "\n "(expected: schemas.SimpleAggregateFunction('min', schemas.Float(64, "\n 'modifiers=None), modifiers=None), is: FlattenedColumn(None, '\n "'attr_min_value', schemas.Float(64, modifiers=None)))",\n "Column 'attr_max_value' type differs between local ClickHouse and schema! "\n "(expected: schemas.SimpleAggregateFunction('max', schemas.Float(64, "\n 'modifiers=None), modifiers=None), is: FlattenedColumn(None, '\n "'attr_max_value', schemas.Float(64, modifiers=None)))"] == []
  Left contains 2 more items, first extra item: "Column 'attr_min_value' type differs between local ClickHouse and schema! (expected: schemas.SimpleAggregateFunction(...(64, modifiers=None), modifiers=None), is: FlattenedColumn(None, 'attr_min_value', schemas.Float(64, modifiers=None)))"
  Full diff:
    [
  -  ,
  +  "Column 'attr_min_value' type differs between local ClickHouse and schema! "
  +  "(expected: schemas.SimpleAggregateFunction('min', schemas.Float(64, "
  +  'modifiers=None), modifiers=None), is: FlattenedColumn(None, '
  +  "'attr_min_value', schemas.Float(64, modifiers=None)))",
  +  "Column 'attr_max_value' type differs between local ClickHouse and schema! "
  +  "(expected: schemas.SimpleAggregateFunction('max', schemas.Float(64, "
  +  'modifiers=None), modifiers=None), is: FlattenedColumn(None, '
  +  "'attr_max_value', schemas.Float(64, modifiers=None)))",
    ]

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard
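
Reading the failure: the local ClickHouse table has plain Float64 columns where the schema (and the generated DDL above) declares SimpleAggregateFunction wrappers. A hypothetical minimal table showing the declaration the test expects, for illustration only:

```sql
-- Hypothetical repro table (not part of the migration): without the
-- SimpleAggregateFunction wrappers, merges would stop folding the
-- per-key min/max values together.
CREATE TABLE spans_num_attrs_2_repro
(
    attr_key       String,
    attr_min_value SimpleAggregateFunction(min, Float64),
    attr_max_value SimpleAggregateFunction(max, Float64)
)
ENGINE = AggregatingMergeTree
ORDER BY attr_key;
```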

nikhars commented Sep 17, 2024

> Opened ops request: https://getsentry.atlassian.net/servicedesk/customer/portal/18/INF-274

Thank you. Please respond on this PR when the ops request has been completed. I will remove my blocking review then.

@nikhars nikhars dismissed their stale review September 17, 2024 23:44

My concern has been resolved. I'll let someone familiar with the product provide an appropriate review.

@colin-sentry colin-sentry merged commit 157bd8e into master Sep 18, 2024
30 checks passed
@colin-sentry colin-sentry deleted the span_attr_table_v3 branch September 18, 2024 03:48