Add experimental support for delta temporality #121

Merged: 7 commits merged into main from delta-temp on Jul 13, 2023


srikanthccv
Member

@srikanthccv srikanthccv commented Mar 25, 2023

  • Add a new column temporality with a Set(3) index (Unspecified, Delta, Cumulative), and add the __temporality__ label, since each temporality needs its own fingerprint.
  • Create a new table with an updated ORDER BY.
  • Build a map of metric name to temporality to use when writing to the DB.
  • Update the span processor to set monotonic=true; otherwise Prometheus (receiver) converts them to gauges.
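
The first and third bullets can be sketched as follows. This is only an illustration under stated assumptions: the `fingerprint` helper here is a hypothetical stand-in (a truncated SHA-256 over sorted label pairs), not the exporter's actual hash, and `record_series` and `metric_temporality` are names invented for the example. The point it shows is that adding the __temporality__ label gives the same label set a different fingerprint per temporality.

```python
import hashlib
from enum import Enum


class Temporality(Enum):
    UNSPECIFIED = "Unspecified"
    DELTA = "Delta"
    CUMULATIVE = "Cumulative"


def fingerprint(labels: dict) -> int:
    """Hypothetical stand-in: a stable 64-bit hash over sorted label pairs."""
    h = hashlib.sha256()
    for name in sorted(labels):
        h.update(name.encode())
        h.update(b"\xff")  # separator so ("ab", "c") differs from ("a", "bc")
        h.update(labels[name].encode())
        h.update(b"\xff")
    return int.from_bytes(h.digest()[:8], "big")


def series_fingerprint(labels: dict, temporality: Temporality) -> int:
    """Fingerprint the label set with the __temporality__ label added."""
    labelled = dict(labels)
    labelled["__temporality__"] = temporality.value
    return fingerprint(labelled)


# Map of metric name -> temporality, consulted when writing samples.
metric_temporality = {}


def record_series(metric_name: str, labels: dict, temporality: Temporality) -> int:
    """Remember the metric's temporality and return the series fingerprint."""
    metric_temporality[metric_name] = temporality
    return series_fingerprint({"__name__": metric_name, **labels}, temporality)


labels = {"service_name": "frontend"}
delta_fp = record_series("signoz_calls_total", labels, Temporality.DELTA)
cumulative_fp = series_fingerprint(
    {"__name__": "signoz_calls_total", **labels}, Temporality.CUMULATIVE
)
print(delta_fp != cumulative_fp)
```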

@srikanthccv
Member Author

@ankitnayan, here is the modified query for the rate with delta temporality. I will do more testing, but I believe we don't need fingerprint in the GROUP BY clause.

SELECT
    service_name,
    toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
    sum(value) / 60 AS value
FROM signoz_metrics.distributed_samples_v2
GLOBAL INNER JOIN
(
    SELECT
        JSONExtractString(labels, 'service_name') AS service_name,
        fingerprint
    FROM signoz_metrics.distributed_time_series_v2
    WHERE (metric_name = 'signoz_calls_total') AND (temporality = 'Delta') AND (timestamp_ms >= (toUnixTimestamp(now() - toIntervalMinute(30)) * 1000)) AND (timestamp_ms <= (toUnixTimestamp(now()) * 1000))
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'signoz_calls_total')
GROUP BY
    service_name,
    ts
ORDER BY
    service_name ASC,
    ts ASC
Output of the query (same as final result of cumulative query/traces)
┌─service_name─┬──────────────────ts─┬─value─┐
│ customer     │ 2023-07-04 10:29:00 │     1 │
│ customer     │ 2023-07-04 10:30:00 │  0.87 │
│ customer     │ 2023-07-04 10:31:00 │   0.9 │
│ customer     │ 2023-07-04 10:32:00 │  0.93 │
│ customer     │ 2023-07-04 10:33:00 │  0.93 │
│ customer     │ 2023-07-04 10:34:00 │   0.9 │
│ customer     │ 2023-07-04 10:35:00 │   0.9 │
│ customer     │ 2023-07-04 10:36:00 │  0.95 │
│ customer     │ 2023-07-04 10:37:00 │  0.93 │
│ customer     │ 2023-07-04 10:38:00 │  0.93 │
│ customer     │ 2023-07-04 10:39:00 │  0.88 │
│ customer     │ 2023-07-04 10:40:00 │  0.92 │
│ customer     │ 2023-07-04 10:41:00 │  0.97 │
│ customer     │ 2023-07-04 10:42:00 │  0.92 │
│ customer     │ 2023-07-04 10:43:00 │  0.95 │
│ customer     │ 2023-07-04 10:44:00 │  0.93 │
│ customer     │ 2023-07-04 10:45:00 │  0.95 │
│ customer     │ 2023-07-04 10:46:00 │  0.92 │
│ customer     │ 2023-07-04 10:47:00 │     1 │
│ customer     │ 2023-07-04 10:48:00 │  0.88 │
│ customer     │ 2023-07-04 10:49:00 │  0.97 │
│ customer     │ 2023-07-04 10:50:00 │  0.97 │
│ customer     │ 2023-07-04 10:51:00 │  0.92 │
│ customer     │ 2023-07-04 10:52:00 │  0.95 │
│ customer     │ 2023-07-04 10:53:00 │  0.95 │
│ customer     │ 2023-07-04 10:54:00 │  0.93 │
│ customer     │ 2023-07-04 10:55:00 │  0.93 │
│ customer     │ 2023-07-04 10:56:00 │  0.92 │
│ customer     │ 2023-07-04 10:57:00 │   0.9 │
│ driver       │ 2023-07-04 10:29:00 │  1.02 │
│ driver       │ 2023-07-04 10:30:00 │  0.87 │
│ driver       │ 2023-07-04 10:31:00 │   0.9 │
│ driver       │ 2023-07-04 10:32:00 │  0.93 │
│ driver       │ 2023-07-04 10:33:00 │  0.93 │
│ driver       │ 2023-07-04 10:34:00 │   0.9 │
│ driver       │ 2023-07-04 10:35:00 │   0.9 │
│ driver       │ 2023-07-04 10:36:00 │  0.95 │
│ driver       │ 2023-07-04 10:37:00 │  0.93 │
│ driver       │ 2023-07-04 10:38:00 │  0.92 │
│ driver       │ 2023-07-04 10:39:00 │   0.9 │
│ driver       │ 2023-07-04 10:40:00 │  0.92 │
│ driver       │ 2023-07-04 10:41:00 │  0.97 │
│ driver       │ 2023-07-04 10:42:00 │  0.92 │
│ driver       │ 2023-07-04 10:43:00 │  0.95 │
│ driver       │ 2023-07-04 10:44:00 │  0.92 │
│ driver       │ 2023-07-04 10:45:00 │  0.97 │
│ driver       │ 2023-07-04 10:46:00 │  0.92 │
│ driver       │ 2023-07-04 10:47:00 │     1 │
│ driver       │ 2023-07-04 10:48:00 │  0.88 │
│ driver       │ 2023-07-04 10:49:00 │  0.97 │
│ driver       │ 2023-07-04 10:50:00 │  0.97 │
│ driver       │ 2023-07-04 10:51:00 │  0.92 │
│ driver       │ 2023-07-04 10:52:00 │  0.95 │
│ driver       │ 2023-07-04 10:53:00 │  0.95 │
│ driver       │ 2023-07-04 10:54:00 │  0.93 │
│ driver       │ 2023-07-04 10:55:00 │  0.93 │
│ driver       │ 2023-07-04 10:56:00 │  0.92 │
│ driver       │ 2023-07-04 10:57:00 │   0.9 │
│ frontend     │ 2023-07-04 10:29:00 │ 24.37 │
│ frontend     │ 2023-07-04 10:30:00 │  20.8 │
│ frontend     │ 2023-07-04 10:31:00 │  21.6 │
│ frontend     │ 2023-07-04 10:32:00 │  22.4 │
│ frontend     │ 2023-07-04 10:33:00 │  22.4 │
│ frontend     │ 2023-07-04 10:34:00 │  21.6 │
│ frontend     │ 2023-07-04 10:35:00 │ 21.48 │
│ frontend     │ 2023-07-04 10:36:00 │ 22.92 │
│ frontend     │ 2023-07-04 10:37:00 │  22.4 │
│ frontend     │ 2023-07-04 10:38:00 │ 22.03 │
│ frontend     │ 2023-07-04 10:39:00 │ 21.57 │
│ frontend     │ 2023-07-04 10:40:00 │    22 │
│ frontend     │ 2023-07-04 10:41:00 │ 23.08 │
│ frontend     │ 2023-07-04 10:42:00 │ 22.12 │
│ frontend     │ 2023-07-04 10:43:00 │  22.8 │
│ frontend     │ 2023-07-04 10:44:00 │ 22.03 │
│ frontend     │ 2023-07-04 10:45:00 │ 23.17 │
│ frontend     │ 2023-07-04 10:46:00 │    22 │
│ frontend     │ 2023-07-04 10:47:00 │ 23.85 │
│ frontend     │ 2023-07-04 10:48:00 │ 21.35 │
│ frontend     │ 2023-07-04 10:49:00 │  23.2 │
│ frontend     │ 2023-07-04 10:50:00 │ 23.12 │
│ frontend     │ 2023-07-04 10:51:00 │ 21.93 │
│ frontend     │ 2023-07-04 10:52:00 │ 22.95 │
│ frontend     │ 2023-07-04 10:53:00 │  22.8 │
│ frontend     │ 2023-07-04 10:54:00 │  22.4 │
│ frontend     │ 2023-07-04 10:55:00 │ 22.15 │
│ frontend     │ 2023-07-04 10:56:00 │ 22.25 │
│ frontend     │ 2023-07-04 10:57:00 │ 21.25 │
│ mysql        │ 2023-07-04 10:29:00 │     1 │
│ mysql        │ 2023-07-04 10:30:00 │  0.87 │
│ mysql        │ 2023-07-04 10:31:00 │   0.9 │
│ mysql        │ 2023-07-04 10:32:00 │  0.93 │
│ mysql        │ 2023-07-04 10:33:00 │  0.95 │
│ mysql        │ 2023-07-04 10:34:00 │  0.88 │
│ mysql        │ 2023-07-04 10:35:00 │   0.9 │
│ mysql        │ 2023-07-04 10:36:00 │  0.95 │
│ mysql        │ 2023-07-04 10:37:00 │  0.93 │
│ mysql        │ 2023-07-04 10:38:00 │  0.93 │
│ mysql        │ 2023-07-04 10:39:00 │  0.88 │
│ mysql        │ 2023-07-04 10:40:00 │  0.92 │
│ mysql        │ 2023-07-04 10:41:00 │  0.97 │
│ mysql        │ 2023-07-04 10:42:00 │  0.92 │
│ mysql        │ 2023-07-04 10:43:00 │  0.95 │
│ mysql        │ 2023-07-04 10:44:00 │  0.93 │
│ mysql        │ 2023-07-04 10:45:00 │  0.95 │
│ mysql        │ 2023-07-04 10:46:00 │  0.92 │
│ mysql        │ 2023-07-04 10:47:00 │     1 │
│ mysql        │ 2023-07-04 10:48:00 │  0.88 │
│ mysql        │ 2023-07-04 10:49:00 │  0.97 │
│ mysql        │ 2023-07-04 10:50:00 │  0.97 │
│ mysql        │ 2023-07-04 10:51:00 │  0.92 │
│ mysql        │ 2023-07-04 10:52:00 │  0.95 │
│ mysql        │ 2023-07-04 10:53:00 │  0.97 │
│ mysql        │ 2023-07-04 10:54:00 │  0.92 │
│ mysql        │ 2023-07-04 10:55:00 │  0.95 │
│ mysql        │ 2023-07-04 10:56:00 │   0.9 │
│ mysql        │ 2023-07-04 10:57:00 │  0.92 │
│ redis        │ 2023-07-04 10:29:00 │  13.5 │
│ redis        │ 2023-07-04 10:30:00 │  11.7 │
│ redis        │ 2023-07-04 10:31:00 │ 12.15 │
│ redis        │ 2023-07-04 10:32:00 │  12.6 │
│ redis        │ 2023-07-04 10:33:00 │  12.7 │
│ redis        │ 2023-07-04 10:34:00 │ 12.05 │
│ redis        │ 2023-07-04 10:35:00 │ 12.15 │
│ redis        │ 2023-07-04 10:36:00 │ 12.83 │
│ redis        │ 2023-07-04 10:37:00 │  12.6 │
│ redis        │ 2023-07-04 10:38:00 │  12.6 │
│ redis        │ 2023-07-04 10:39:00 │ 11.92 │
│ redis        │ 2023-07-04 10:40:00 │ 12.38 │
│ redis        │ 2023-07-04 10:41:00 │ 13.05 │
│ redis        │ 2023-07-04 10:42:00 │ 12.37 │
│ redis        │ 2023-07-04 10:43:00 │ 12.83 │
│ redis        │ 2023-07-04 10:44:00 │  12.5 │
│ redis        │ 2023-07-04 10:45:00 │ 12.92 │
│ redis        │ 2023-07-04 10:46:00 │ 12.38 │
│ redis        │ 2023-07-04 10:47:00 │  13.5 │
│ redis        │ 2023-07-04 10:48:00 │ 11.92 │
│ redis        │ 2023-07-04 10:49:00 │ 13.05 │
│ redis        │ 2023-07-04 10:50:00 │ 13.05 │
│ redis        │ 2023-07-04 10:51:00 │ 12.38 │
│ redis        │ 2023-07-04 10:52:00 │ 12.82 │
│ redis        │ 2023-07-04 10:53:00 │ 12.88 │
│ redis        │ 2023-07-04 10:54:00 │ 12.55 │
│ redis        │ 2023-07-04 10:55:00 │ 12.65 │
│ redis        │ 2023-07-04 10:56:00 │ 12.32 │
│ redis        │ 2023-07-04 10:57:00 │ 12.18 │
│ route        │ 2023-07-04 10:29:00 │ 10.17 │
│ route        │ 2023-07-04 10:30:00 │  8.67 │
│ route        │ 2023-07-04 10:31:00 │     9 │
│ route        │ 2023-07-04 10:32:00 │  9.33 │
│ route        │ 2023-07-04 10:33:00 │  9.33 │
│ route        │ 2023-07-04 10:34:00 │     9 │
│ route        │ 2023-07-04 10:35:00 │  8.95 │
│ route        │ 2023-07-04 10:36:00 │  9.55 │
│ route        │ 2023-07-04 10:37:00 │  9.33 │
│ route        │ 2023-07-04 10:38:00 │  9.17 │
│ route        │ 2023-07-04 10:39:00 │     9 │
│ route        │ 2023-07-04 10:40:00 │  9.17 │
│ route        │ 2023-07-04 10:41:00 │  9.62 │
│ route        │ 2023-07-04 10:42:00 │  9.22 │
│ route        │ 2023-07-04 10:43:00 │   9.5 │
│ route        │ 2023-07-04 10:44:00 │  9.17 │
│ route        │ 2023-07-04 10:45:00 │  9.67 │
│ route        │ 2023-07-04 10:46:00 │  9.17 │
│ route        │ 2023-07-04 10:47:00 │  9.93 │
│ route        │ 2023-07-04 10:48:00 │   8.9 │
│ route        │ 2023-07-04 10:49:00 │  9.67 │
│ route        │ 2023-07-04 10:50:00 │  9.63 │
│ route        │ 2023-07-04 10:51:00 │  9.13 │
│ route        │ 2023-07-04 10:52:00 │  9.57 │
│ route        │ 2023-07-04 10:53:00 │   9.5 │
│ route        │ 2023-07-04 10:54:00 │  9.33 │
│ route        │ 2023-07-04 10:55:00 │  9.22 │
│ route        │ 2023-07-04 10:56:00 │  9.28 │
│ route        │ 2023-07-04 10:57:00 │  8.83 │
└──────────────┴─────────────────────┴───────┘
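
To illustrate why fingerprint can be left out of the GROUP BY, here is a rough Python equivalent of the delta rate query. It is a sketch with made-up sample data: `delta_rate`, `series_labels`, and the fingerprints 101 and 102 are hypothetical. Because each delta sample already holds the increase for its own interval, samples from different series of the same service can be summed directly per time bucket and divided by the step.

```python
from collections import defaultdict


def delta_rate(samples, series_labels, step_seconds=60):
    """Per-second rate per (service_name, bucket start) from delta samples.

    samples: iterable of (fingerprint, timestamp_ms, value)
    series_labels: fingerprint -> service_name (the joined time series side)
    """
    totals = defaultdict(float)
    for fingerprint, timestamp_ms, value in samples:
        service_name = series_labels.get(fingerprint)
        if service_name is None:
            continue  # an inner join drops samples with no matching series
        # Same bucketing as toStartOfInterval(..., toIntervalSecond(step)).
        ts = (timestamp_ms // 1000) // step_seconds * step_seconds
        totals[(service_name, ts)] += value
    # Mirrors sum(value) / 60 in the query.
    return {key: total / step_seconds for key, total in sorted(totals.items())}


# Two series (different fingerprints) that belong to the same service.
series_labels = {101: "frontend", 102: "frontend"}
samples = [
    (101, 0, 30.0), (101, 30_000, 30.0), (101, 60_000, 60.0), (101, 90_000, 60.0),
    (102, 0, 6.0), (102, 30_000, 6.0), (102, 60_000, 12.0), (102, 90_000, 12.0),
]
rates = delta_rate(samples, series_labels)
for (service_name, ts), value in rates.items():
    print(service_name, ts, value)
```

With cumulative samples the same shortcut would not work, since the increase has to be computed per series before anything can be summed across series.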

@ankitnayan
Contributor

@srikanthccv what would a percentile query look like?

@srikanthccv
Member Author

SELECT
    service_name,
    ts,
    histogramQuantile(arrayMap(x -> toFloat64(x), groupArray(le)), groupArray(value), 0.95) AS value
FROM
(
    SELECT
        service_name,
        le,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
        sum(value) / 60 AS value
    FROM signoz_metrics.distributed_samples_v2
    GLOBAL INNER JOIN
    (
        SELECT
            JSONExtractString(labels, 'service_name') AS service_name,
            JSONExtractString(labels, 'le') AS le,
            fingerprint
        FROM signoz_metrics.distributed_time_series_v2
        WHERE (metric_name = 'signoz_latency_bucket') AND (temporality = 'Delta')
    ) AS filtered_time_series USING (fingerprint)
    WHERE metric_name = 'signoz_latency_bucket'
    GROUP BY
        service_name,
        le,
        ts
    ORDER BY
        service_name ASC,
        le ASC,
        ts ASC
)
GROUP BY
    service_name,
    ts
ORDER BY
    service_name ASC,
    ts ASC
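
For readers unfamiliar with the outer step, the sketch below shows one common way a quantile is estimated from `le` buckets: Prometheus-style linear interpolation inside the bucket that contains the target rank. That histogramQuantile behaves this way is an assumption; the thread does not confirm it. The function name and bucket values are made up for illustration. Note that the per-bucket values are still cumulative across `le` even though the samples are delta over time.

```python
import math


def histogram_quantile(q, le_values):
    """Estimate the q-quantile from (le, value) pairs.

    le_values: pairs of (upper bound as string, value), where each value
    counts everything <= le and the last bound is +Inf.
    """
    buckets = sorted((float(le), value) for le, value in le_values)
    if not buckets or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative value reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: fall back to the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower, prev_count = (0.0, 0.0) if index == 0 else buckets[index - 1]
    if count == prev_count:
        return upper
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


# Made-up per-second rates per bucket for one (service_name, ts) group.
buckets = [("100", 50.0), ("200", 90.0), ("500", 99.0), ("+Inf", 100.0)]
p95 = histogram_quantile(0.95, buckets)
p50 = histogram_quantile(0.5, buckets)
print(p95, p50)
```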

@srikanthccv
Member Author

srikanthccv commented Jul 6, 2023

Test setup: 1 shard, 4 vCPUs, and 16 GB. Here is how delta compares with cumulative on the devrev data. The cumulative values are taken from an earlier exercise here: https://github.com/SigNoz/engineering-pod/issues/903#issuecomment-1597998417

| p90 | Cumulative | Delta | Reduction |
| --- | --- | --- | --- |
| (1 service) 1 day - 5 min | 24.139 sec | 16.774 sec | ~30% |
| (1 service) 7 days - 1 hr | 45.870 sec | 43.238 sec | negligible |
| (all services) 1 day - 5 min | 96.607 sec | 40.875 sec | ~55% |
| (all services) 7 days - 1 hr | 152.261 sec | 96.222 sec | ~36% |

Queries for delta:

(1 service) 1 day - 5 min

query
SELECT
    destination_service,
    ts,
    histogramQuantile(arrayMap(x -> toFloat64(x), groupArray(le)), groupArray(value), 0.9) AS value
FROM
(
    SELECT
        destination_service,
        le,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(300)) AS ts,
        sum(value) / 300 AS value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT
            JSONExtractString(labels, 'destination_service') AS destination_service,
            JSONExtractString(labels, 'le') AS le,
            fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE (metric_name = 'istio_request_bytes_bucket') AND (labels LIKE '%gateway%') AND (JSONExtractString(labels, 'destination_service') = 'gateway.gateway.svc.cluster.local')
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'istio_request_bytes_bucket') AND (timestamp_ms >= 1679500800000) AND (timestamp_ms <= 1679587200000)
    GROUP BY
        destination_service,
        le,
        ts
    ORDER BY
        destination_service ASC,
        le ASC,
        ts ASC
)
GROUP BY
    destination_service,
    ts
ORDER BY
    destination_service ASC,
    ts ASC

288 rows in set. Elapsed: 16.774 sec. Processed 189.05 million rows, 13.10 GB (11.27 million rows/s., 780.87 MB/s.)

(1 service) 7 days - 1 hr

query
SELECT
    destination_service,
    ts,
    histogramQuantile(arrayMap(x -> toFloat64(x), groupArray(le)), groupArray(value), 0.9) AS value
FROM
(
    SELECT
        destination_service,
        le,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(3600)) AS ts,
        sum(value) / 3600 AS value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT
            JSONExtractString(labels, 'destination_service') AS destination_service,
            JSONExtractString(labels, 'le') AS le,
            fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE (metric_name = 'istio_request_bytes_bucket') AND (labels LIKE '%gateway%') AND (JSONExtractString(labels, 'destination_service') = 'gateway.gateway.svc.cluster.local')
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'istio_request_bytes_bucket') AND (timestamp_ms >= 1679500800000) AND (timestamp_ms <= 1680105600000)
    GROUP BY
        destination_service,
        le,
        ts
    ORDER BY
        destination_service ASC,
        le ASC,
        ts ASC
)
GROUP BY
    destination_service,
    ts
ORDER BY
    destination_service ASC,
    ts ASC

137 rows in set. Elapsed: 43.238 sec. Processed 744.89 million rows, 27.15 GB (17.23 million rows/s., 627.90 MB/s.)

(all services) 1 day - 5 min

query
SELECT
    destination_service,
    ts,
    histogramQuantile(arrayMap(x -> toFloat64(x), groupArray(le)), groupArray(value), 0.9) AS value
FROM
(
    SELECT
        destination_service,
        le,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(300)) AS ts,
        sum(value) / 300 AS value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT
            JSONExtractString(labels, 'destination_service') AS destination_service,
            JSONExtractString(labels, 'le') AS le,
            fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE (metric_name = 'istio_request_bytes_bucket')
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'istio_request_bytes_bucket') AND (timestamp_ms >= 1679500800000) AND (timestamp_ms <= 1679587200000)
    GROUP BY
        destination_service,
        le,
        ts
    ORDER BY
        destination_service ASC,
        le ASC,
        ts ASC
)
GROUP BY
    destination_service,
    ts
ORDER BY
    destination_service ASC,
    ts ASC

17414 rows in set. Elapsed: 40.875 sec. Processed 189.05 million rows, 13.10 GB (4.62 million rows/s., 320.54 MB/s.)

(all services) 7 days - 1 hr

query
SELECT
    destination_service,
    ts,
    histogramQuantile(arrayMap(x -> toFloat64(x), groupArray(le)), groupArray(value), 0.9) AS value
FROM
(
    SELECT
        destination_service,
        le,
        toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(3600)) AS ts,
        sum(value)/3600 AS value
    FROM signoz_metrics.distributed_samples_v2
    INNER JOIN
    (
        SELECT
            JSONExtractString(labels, 'destination_service') AS destination_service,
            JSONExtractString(labels, 'le') AS le,
            fingerprint
        FROM signoz_metrics.time_series_v2
        WHERE metric_name = 'istio_request_bytes_bucket'
    ) AS filtered_time_series USING (fingerprint)
    WHERE (metric_name = 'istio_request_bytes_bucket') AND (timestamp_ms >= 1679500800000) AND (timestamp_ms <= 1680105600000)
    GROUP BY
        destination_service,
        le,
        ts
    ORDER BY
        destination_service ASC,
        le ASC,
        ts ASC
)
GROUP BY
    destination_service,
    ts
ORDER BY
    destination_service ASC,
    ts ASC

9417 rows in set. Elapsed: 96.222 sec. Processed 744.89 million rows, 27.15 GB (7.74 million rows/s., 282.19 MB/s.)

It's not clear why the 2nd query (1 service, 7 days - 1 hr) shows almost no improvement, but the difference is noticeable in the other cases.

@ankitnayan
Contributor

I expected more of a perf improvement. Maybe histogramQuantile is still the bottleneck?

Can you try other common queries and measure performance? E.g., RPS or avg duration with service_name as group_by, and RPS or avg duration without any group_by on the custom field?

@srikanthccv
Member Author

srikanthccv commented Jul 7, 2023

histogramQuantile is almost never the bottleneck. You can take the inner query and use FORMAT Null to see how long it takes. How much were you expecting?

> Can you try other common queries and measure performance? E.g., RPS or avg duration with service_name as group_by, and RPS or avg duration without any group_by on the custom field?

I think there should be at least service_name in the group by. Here is the RPS table:

| RPS | Cumulative | Delta |
| --- | --- | --- |
| (1 service) 1 day - 5 min | 2.754 sec | 1.365 sec |
| (1 service) 7 days - 1 hr | 3.521 sec | 2.798 sec |
| (all services) 1 day - 5 min | 9.280 sec | 1.890 sec |
| (all services) 7 days - 1 hr | 6.295 sec | 4.455 sec |
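
The RPS table has no percentage column, so here is a small sketch that derives the relative reduction in elapsed time from the numbers above, presumably the same kind of figure as the % column in the p90 table. The function and variable names are made up for the example.

```python
def reduction_percent(cumulative_sec, delta_sec):
    """Relative reduction in elapsed time when moving from cumulative to delta."""
    return (cumulative_sec - delta_sec) / cumulative_sec * 100


# (cumulative seconds, delta seconds), copied from the RPS table.
rps_timings = {
    "(1 service) 1 day - 5 min": (2.754, 1.365),
    "(1 service) 7 days - 1 hr": (3.521, 2.798),
    "(all services) 1 day - 5 min": (9.280, 1.890),
    "(all services) 7 days - 1 hr": (6.295, 4.455),
}

reductions = {
    case: reduction_percent(cumulative_sec, delta_sec)
    for case, (cumulative_sec, delta_sec) in rps_timings.items()
}
for case, percent in reductions.items():
    print(f"{case}: {percent:.1f}% less time with delta")
```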

@srikanthccv
Member Author

@ankitnayan please review

@srikanthccv srikanthccv merged commit 8d74730 into main Jul 13, 2023
3 checks passed
@srikanthccv srikanthccv deleted the delta-temp branch July 13, 2023 12:38