-
We talked about aging of data. Would this not also solve the problem? I think this is something we need to implement anyway, as the data will become huge once more people start using us.
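As a rough sketch of what aging could mean here, assuming PostgreSQL and the current column names (the 3-month window is just an example, not an agreed retention policy):

```sql
-- Periodically drop measurement rows older than a retention window.
-- Note: if created_at is removed as proposed further down, this would
-- have to go through runs.created_at instead.
DELETE FROM measurements
WHERE created_at < now() - interval '3 months';
```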
-
Alright, the profiling is in.

### Table space

First of all, the gains in table space.

**Current table design**

Size: (screenshot not reproduced here)

**New table design**

Size: (screenshot not reproduced here)

**Intermediate summary**

What is really dragging us down in both tables, size-wise, is the unique index involving time. It cannot be used very much as the cardinality is extremely high. Given the pre-selection of metric, detail_name and unit, the cardinality is identical by design. However, there are so many rows in the table that the index is larger than the data itself :/ Removing the index does not slow down any operation, as it cannot be used anyway. But it does guarantee uniqueness for us in case of an error.

=> My optimization proposal would be to move the uniqueness check into the GMT.
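(Assuming PostgreSQL and the current table name, the "index is larger than the data itself" point can be double-checked with the built-in size functions; this is an illustrative sketch, not output from the profiling above.)

```sql
-- Compare heap (data) size with the combined size of all indexes
-- on the measurements table.
SELECT pg_size_pretty(pg_relation_size('measurements'))       AS table_data,
       pg_size_pretty(pg_indexes_size('measurements'))        AS all_indexes,
       pg_size_pretty(pg_total_relation_size('measurements')) AS total;
```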
### Performance

**Current table design**

Query:

```sql
EXPLAIN ANALYZE
SELECT metric, detail_name, unit, time
FROM measurements
WHERE run_id = 'd9413c1d-a99e-4838-880c-85e9131a65e5';
```

Result: (plan output not reproduced here)

**New table design**

Query:

```sql
EXPLAIN ANALYZE
SELECT mm.detail_name, mv.time, mm.metric, mv.value, mm.unit
FROM measurement_values AS mv
JOIN measurement_metrics AS mm ON mv.measurement_metric_id = mm.id
WHERE mm.run_id = '70d181da-db12-4b36-8cf1-12d1193fdb84'
ORDER BY mm.metric ASC, mm.detail_name ASC, mv.time ASC;
```

Result: (plan output not reproduced here)
**Additional infos**

At first glance it looks like the query is 2-6 times slower. I have seen the old design work in 20 ms while the new design takes 120 ms, or seen the old design take 100 ms while the new design takes 600 ms. These results however only stabilize after the first run. So the profiling tool with […]

When I run these queries normally I see that the new design is almost constant in performance, while the old design enormously profits from the query cache. Example: (timings not reproduced here)
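If it helps to separate cache effects from real I/O here, one option (assuming PostgreSQL; not part of the measurements above) is to request buffer statistics in the plan:

```sql
-- BUFFERS shows how many blocks were served from shared buffers
-- ("shared hit") versus read from disk ("read"), which makes the
-- warm-vs-cold difference between runs visible.
EXPLAIN (ANALYZE, BUFFERS)
SELECT metric, detail_name, unit, time
FROM measurements
WHERE run_id = 'd9413c1d-a99e-4838-880c-85e9131a65e5';
```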
The issue with that is that I believe details on a measurement are very rarely pulled in a repeated fashion. At least not so repeatedly that the query cache can be re-used.

**Summary**

I would vote for the new table design, dropping the unique key and also doing the uniqueness checks in the GMT. @ribalba Would love your thoughts on this. Also please give my table design a look, in case it is not optimal to begin with.
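For reference, a minimal sketch of the two-table layout behind the query above. Column names are taken from the join; types, keys and indexes are assumptions and may differ from the actual schema:

```sql
-- Sketch only: types, identity/primary keys and indexes are assumptions.
CREATE TABLE measurement_metrics (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    run_id      uuid NOT NULL,   -- assumed to reference runs
    metric      text NOT NULL,
    detail_name text NOT NULL,
    unit        text NOT NULL
);

CREATE TABLE measurement_values (
    measurement_metric_id bigint NOT NULL REFERENCES measurement_metrics(id),
    time                  bigint NOT NULL,  -- type is an assumption
    value                 bigint NOT NULL   -- type is an assumption
);

-- Indexes to support the run_id filter and the join above.
CREATE INDEX ON measurement_metrics (run_id);
CREATE INDEX ON measurement_values (measurement_metric_id);

-- Deliberately no UNIQUE (measurement_metric_id, time) index here:
-- per the proposal, that uniqueness guarantee would be checked in the GMT.
```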
-
@ribalba Needing feedback
-
Currently the measurements table looks like this: (schema not reproduced here)

We are seeing the database reaching 100 GB per 3 months atm, and it is mostly due to the `measurements` table.

The idea is to increase normalization on the table and:

- Drop the `created_at` / `updated_at` columns. They provide no inherent value, as `measurements.created_at` is always extremely close to `runs.created_at`. Values are now designed to never be updated, so `updated_at` makes no sense anymore.

Benefits:

- `created_at` / `updated_at` gone, which would hopefully save 20%
- `detail_name`, `metric`, `unit` normalized out, thus saving another 30%

@ribalba RFC
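A rough way to sanity-check those percentages, assuming PostgreSQL and the current column names (note that the aggregates scan the whole table, so this is slow on 100 GB):

```sql
-- Lower-bound estimate: pg_column_size ignores row headers, padding and
-- index entries, so the real savings can only be larger.
SELECT
    pg_size_pretty(pg_total_relation_size('measurements'))                       AS total_incl_indexes,
    pg_size_pretty(sum(pg_column_size(created_at) + pg_column_size(updated_at))) AS created_updated_at_bytes,
    pg_size_pretty(sum(pg_column_size(detail_name) + pg_column_size(metric)
                       + pg_column_size(unit)))                                  AS detail_metric_unit_bytes
FROM measurements;
```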