-
We talked about aging of data. Would this not also solve the problem? I think this is something we need to implement anyway, as the data will become huge once more people start using us.
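As a rough sketch of what aging could mean here, assuming PostgreSQL and the current column names (the 3-month window is just an example, not an agreed retention policy):

```sql
-- Periodically drop measurement rows older than a retention window.
-- Note: if created_at is removed as proposed further down, this would
-- have to go through runs.created_at instead.
DELETE FROM measurements
WHERE created_at < now() - interval '3 months';
```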
-
Alright, the profiling is in.

### Table space

First of all, the gains in table space.

**Current table design**

Size: (screenshot not reproduced here)

**New table design**

Size: (screenshot not reproduced here)

**Intermediate summary**

What is really dragging us down in both tables, size-wise, is the unique index involving time. It cannot be used very much as the cardinality is extremely high. Given the pre-selection of metric, detail_name and unit, the cardinality is identical by design. However, there are so many rows in the table that the index is larger than the data itself :/ Removing the index does not slow down any operation, as it cannot be used anyway. But it does guarantee uniqueness for us in case of an error.

=> My optimization proposal would be to move the uniqueness check into the GMT.
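(Assuming PostgreSQL and the current table name, the "index is larger than the data itself" point can be double-checked with the built-in size functions; this is an illustrative sketch, not output from the profiling above.)

```sql
-- Compare heap (data) size with the combined size of all indexes
-- on the measurements table.
SELECT pg_size_pretty(pg_relation_size('measurements'))       AS table_data,
       pg_size_pretty(pg_indexes_size('measurements'))        AS all_indexes,
       pg_size_pretty(pg_total_relation_size('measurements')) AS total;
```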
### Performance

**Current table design**

Query:

```sql
EXPLAIN ANALYZE
SELECT metric, detail_name, unit, time
FROM measurements
WHERE run_id = 'd9413c1d-a99e-4838-880c-85e9131a65e5';
```

Result: (plan output not reproduced here)

**New table design**

Query:

```sql
EXPLAIN ANALYZE
SELECT mm.detail_name, mv.time, mm.metric, mv.value, mm.unit
FROM measurement_values AS mv
JOIN measurement_metrics AS mm ON mv.measurement_metric_id = mm.id
WHERE mm.run_id = '70d181da-db12-4b36-8cf1-12d1193fdb84'
ORDER BY mm.metric ASC, mm.detail_name ASC, mv.time ASC;
```

Result: (plan output not reproduced here)
**Additional infos**

At first glance it looks like the query is 2-6 times slower. I have seen the old design work in 20 ms while the new design takes 120 ms, or seen the old design take 100 ms while the new design takes 600 ms. These results however only stabilize after the first run. So the profiling tool with […]

When I run these queries normally I see that the new design is almost constant in performance, while the old design enormously profits from the query cache. Example: (timings not reproduced here)
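If it helps to separate cache effects from real I/O here, one option (assuming PostgreSQL; not part of the measurements above) is to request buffer statistics in the plan:

```sql
-- BUFFERS shows how many blocks were served from shared buffers
-- ("shared hit") versus read from disk ("read"), which makes the
-- warm-vs-cold difference between runs visible.
EXPLAIN (ANALYZE, BUFFERS)
SELECT metric, detail_name, unit, time
FROM measurements
WHERE run_id = 'd9413c1d-a99e-4838-880c-85e9131a65e5';
```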
The issue with that is that I believe details on a measurement are very rarely pulled in a repeated fashion. At least not so repeatedly that the query cache can be re-used.

**Summary**

I would vote for the new table design, dropping the unique key and also doing the uniqueness checks in the GMT. @ribalba Would love your thoughts on this. Also please give my table design a look, in case it is not optimal to begin with.
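For reference, a minimal sketch of the two-table layout behind the query above. Column names are taken from the join; types, keys and indexes are assumptions and may differ from the actual schema:

```sql
-- Sketch only: types, identity/primary keys and indexes are assumptions.
CREATE TABLE measurement_metrics (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    run_id      uuid NOT NULL,   -- assumed to reference runs
    metric      text NOT NULL,
    detail_name text NOT NULL,
    unit        text NOT NULL
);

CREATE TABLE measurement_values (
    measurement_metric_id bigint NOT NULL REFERENCES measurement_metrics(id),
    time                  bigint NOT NULL,  -- type is an assumption
    value                 bigint NOT NULL   -- type is an assumption
);

-- Indexes to support the run_id filter and the join above.
CREATE INDEX ON measurement_metrics (run_id);
CREATE INDEX ON measurement_values (measurement_metric_id);

-- Deliberately no UNIQUE (measurement_metric_id, time) index here:
-- per the proposal, that uniqueness guarantee would be checked in the GMT.
```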
-
@ribalba Needing feedback
-
Currently the measurements table looks like this: (schema not reproduced here)

We are seeing the database reaching 100 GB per 3 months atm, and it is mostly due to the `measurements` table.

The idea is to increase normalization on the table and:

- Drop the `created_at` / `updated_at` columns. They provide no inherent value, as `measurements.created_at` is always extremely close to `runs.created_at`. Values are now designed to never be updated, so `updated_at` makes no sense anymore.

Benefits:

- `created_at` / `updated_at` gone, which would hopefully save 20%
- `detail_name`, `metric`, `unit` normalized out, thus saving another 30%

@ribalba RFC
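A rough way to sanity-check those percentages, assuming PostgreSQL and the current column names (note that the aggregates scan the whole table, so this is slow on 100 GB):

```sql
-- Lower-bound estimate: pg_column_size ignores row headers, padding and
-- index entries, so the real savings can only be larger.
SELECT
    pg_size_pretty(pg_total_relation_size('measurements'))                       AS total_incl_indexes,
    pg_size_pretty(sum(pg_column_size(created_at) + pg_column_size(updated_at))) AS created_updated_at_bytes,
    pg_size_pretty(sum(pg_column_size(detail_name) + pg_column_size(metric)
                       + pg_column_size(unit)))                                  AS detail_metric_unit_bytes
FROM measurements;
```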