"Pass-through" or "re-defined" metrics #1465
siljamardla started this conversation in Ideas
Disclaimer: the phrasing and descriptions might be a bit messy, but I hope you'll see the main point :)
Feature
Update metric configurations to allow "re-defining" the same metric on top of multiple tables.
Specifically, all definitions mean exactly the same thing and are expected to produce exactly the same number.
When writing MetricFlow queries, it should be possible to manually specify which version of the metric (i.e. which table) to query, but there should also be a default behaviour that prefers the "better" table. What counts as better depends on the context; it might be:
Use cases
I have two very frequent use cases in mind: using pre-aggregations and building self-service datasets.
Pre-aggregations
or
Instead of trying to cache each different grain or letting people scan the underlying big fact tables, we could explicitly specify that aggregating the column in the export will result in exactly the same metric definition and value, at a smaller compute cost.
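To make the pre-aggregation argument concrete, here is a minimal sketch using made-up in-memory rows instead of real tables; it does not use any MetricFlow API, and all table, column and function names are hypothetical. A count metric pre-aggregated at customer-city grain and then summed up to customer grain gives the same number as counting the fact table directly. This equivalence assumes an additive aggregation such as a count or a sum; a distinct count, for example, cannot simply be summed across finer-grain rows.

```python
from collections import Counter, defaultdict

# Hypothetical fact table: one row per order event, with (customer, city).
fact_rows = [
    {"customer": "c1", "city": "Tallinn"},
    {"customer": "c1", "city": "Tallinn"},
    {"customer": "c1", "city": "Tartu"},
    {"customer": "c2", "city": "Tartu"},
]

def count_by_customer_from_fact(rows):
    """Metric computed directly on the fact table: COUNT(*) grouped by customer."""
    return dict(Counter(r["customer"] for r in rows))

def preaggregate(rows):
    """Export step: COUNT(*) grouped by (customer, city), i.e. the pre-aggregation."""
    return dict(Counter((r["customer"], r["city"]) for r in rows))

def count_by_customer_from_export(export):
    """Same metric read from the export: SUM of the pre-aggregated counts per customer."""
    totals = defaultdict(int)
    for (customer, _city), n in export.items():
        totals[customer] += n
    return dict(totals)

export = preaggregate(fact_rows)
print(count_by_customer_from_fact(fact_rows))  # {'c1': 3, 'c2': 1}
print(count_by_customer_from_export(export))   # {'c1': 3, 'c2': 1}
```

The export has fewer rows than the fact table (3 vs 4 here), which is where the compute saving would come from on real data.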
Building self-service datasets
I would produce the self-service dataset with a saved query like this:
The output of this would be an order_id-level table with the columns metric1, metric2, metric3 and metric4.
By definition, the order-level data mart has fewer rows than the upstream order event and order pricing tables.
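A minimal sketch of that collapse, using made-up in-memory rows in place of the real order event and order pricing tables. The column names and the three metric columns (`event_count`, `rescheduled_count`, `total_amount`) are hypothetical examples of such metrics, not anything defined in this post, and no MetricFlow API is involved.

```python
from collections import defaultdict

# Hypothetical upstream fact tables, each with several rows per order.
order_events = [
    {"order_id": 1, "event": "created"},
    {"order_id": 1, "event": "rescheduled"},
    {"order_id": 1, "event": "rescheduled"},
    {"order_id": 2, "event": "created"},
]
order_pricing = [
    {"order_id": 1, "component": "base", "amount": 10.0},
    {"order_id": 1, "component": "fee", "amount": 2.5},
    {"order_id": 2, "component": "base", "amount": 7.0},
]

def build_order_mart(events, pricing):
    """Collapse both fact tables to one row per order_id with metric columns."""
    mart = defaultdict(lambda: {"event_count": 0, "rescheduled_count": 0, "total_amount": 0.0})
    for e in events:
        row = mart[e["order_id"]]
        row["event_count"] += 1
        if e["event"] == "rescheduled":
            row["rescheduled_count"] += 1
    for p in pricing:
        mart[p["order_id"]]["total_amount"] += p["amount"]
    # Materialize as a list of rows sorted by order_id, like the exported table.
    return [{"order_id": oid, **metrics} for oid, metrics in sorted(mart.items(), key=lambda kv: kv[0])]

mart = build_order_mart(order_events, order_pricing)
for row in mart:
    print(row)
```

Two mart rows replace four event rows and three pricing rows, and each row already carries several metrics side by side.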
I want my data usage to be based on metrics, so I want people to look at a metrics glossary, find a metric and query it.
As of now, they will always be directed to the underlying fact tables. I would like to send them to this pre-calculated order-level table instead, because it already contains many useful metrics for them in one table.
Come to think of it, it's like a special case of pre-aggregation, except that the aggregation result is still rather detailed.
Specifications
Far from being fully figured out. If we had something like this:
And we would write a saved query for an export:
And then define a "pass-through" metric on top of the data mart:
Come to think of it... the imaginary config here only contains one useful piece of information: the export name.
And the same export might contain a lot of metrics. It would be really tedious to define each of them again.
So turning this around, it could be as simple as specifying something extra in the saved query / export phase. Something like this:
Based on this, MetricFlow could detect during SQL compilation that whenever someone queries the `count_order_rescheduled_events` metric by customer, city, or any attribute of these, the data should be read from the `mart_customer_metrics` table instead of the underlying fact table(s).
Concurrent pass-throughs
What if we have one export at customer-city grain and another at customer-product grain, and we ask for metrics at customer grain?
There would have to be some kind of a rule to decide which export to prefer.
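A minimal sketch of how such routing and a preference rule could work, in plain Python rather than anything MetricFlow actually exposes. Every name here (`Export`, `covers`, `route`, the `grain` and `row_count` fields, `mart_customer_product_metrics`, `fct_order_events`) is hypothetical, attributes are assumed to be named `<entity>__<attribute>`, and the metric is assumed to be re-aggregatable from the export's grain. The rule itself is only one possible choice: among exports that contain the metric and whose grain covers every requested dimension, prefer the fewest grain dimensions, break ties by estimated row count, and fall back to the fact table when no export qualifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Export:
    """Hypothetical description of an exported table usable as a pass-through source."""
    table: str
    metrics: frozenset
    grain: frozenset  # dimensions/entities the export is grouped by
    row_count: int    # rough size estimate, used only for tie-breaking

def entity_of(dimension: str) -> str:
    """Assumption: attributes are named '<entity>__<attribute>', e.g. 'customer__segment'."""
    return dimension.split("__", 1)[0]

def covers(export: Export, metric: str, group_by: set) -> bool:
    """True if the export has the metric and every requested dimension is in its
    grain or is an attribute of an entity in its grain."""
    if metric not in export.metrics:
        return False
    return all(d in export.grain or entity_of(d) in export.grain for d in group_by)

def route(metric: str, group_by: set, exports: list, fact_table: str) -> str:
    """Pick the table to read from, falling back to the fact table."""
    candidates = [e for e in exports if covers(e, metric, group_by)]
    if not candidates:
        return fact_table
    # Preference rule (one possible choice): coarsest grain first, then fewest rows.
    return min(candidates, key=lambda e: (len(e.grain), e.row_count)).table

metric = "count_order_rescheduled_events"
exports = [
    Export("mart_customer_metrics", frozenset({metric}),
           frozenset({"customer", "city"}), row_count=50_000),
    Export("mart_customer_product_metrics", frozenset({metric}),
           frozenset({"customer", "product"}), row_count=200_000),
]
print(route(metric, {"customer"}, exports, "fct_order_events"))  # mart_customer_metrics
print(route(metric, {"order_id"}, exports, "fct_order_events"))  # fct_order_events
```

Here both exports can answer a customer-grain query, so the tie on grain size is broken by row count. A real rule might instead use actual table statistics or an explicit priority set in the export config.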