-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Render date literals as strings in time range filters #724
Conversation
MetricFlow's rendering has historically cast date literals to the underlying engine type in date range comparison filters, so if a user passed in a start_time of 2021-01-01 and an end_time of 2021-01-31 we would render something like: `WHERE ds BETWEEN CAST('2021-01-01' AS TIMESTAMP) AND CAST('2021-01-31' AS TIMESTAMP)` This turns out to cause problems with time partition columns, because they typically only apply pruning to literal predicates - any value processed by a CAST function prevents pruning from kicking in, as the engine takes the appropriately conservative approach of doing full row-by-row comparisons between the cast output and the partition value. In researching this issue two things became clear: 1. We need a holistic answer to ensuring engines apply partition pruning, one which is beyond the scope of this change. 2. We can solve part of the problem by removing the CAST call in time range filters based on literal value inputs. It turns out every engine we support does implicit type coercion between string values and date/time types. This means we don't have to care what the appropriate time type column might be in this case, because the standard behavior is to simply do the comparison correctly regardless of underlying partition column type. This change simple removes the CAST calls from this particular rendering block, which should enable users to at least run queries that will do some level of pruning, even if the solution is incomplete with respect to the types of time partition columns it can address. Note that we assume all engines accept and convert from ISO-standard date strings, but this is not a change from the prior behavior, which assumed the same thing but wrapped it in a CAST function call.
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I recall, this was put in due to an issue with a previously supported engine.
That's what I remember too but I can't find any evidence of it. It's possible the offending engine has updated their type handling. It's also possible we didn't understand BigQuery's coercion logic - their documentation on this is hidden well enough that it appears BigQuery simply doesn't support it, but they do (and they have to because of the way their partition pruning works). |
MetricFlow's rendering has historically cast date literals to
the underlying engine type in date range comparison filters, so
if a user passed in a start_time of 2021-01-01 and an end_time
of 2021-01-31 we would render something like:
WHERE ds BETWEEN CAST('2021-01-01' AS TIMESTAMP) AND CAST('2021-01-31' AS TIMESTAMP)
This turns out to cause problems with time partition columns, because
they typically only apply pruning to literal predicates - any value
processed by a CAST function prevents pruning from kicking in, as the
engine takes the appropriately conservative approach of doing full
row-by-row comparisons between the cast output and the partition value.
In researching this issue two things became clear:
pruning, one which is beyond the scope of this change.
in time range filters based on literal value inputs.
It turns out every engine we support does implicit type coercion
between string values and date/time types. This means we don't have
to care what the appropriate time type column might be in this case,
because the standard behavior is to simply do the comparison correctly
regardless of underlying partition column type.
This change simple removes the CAST calls from this particular rendering
block, which should enable users to at least run queries that will do
some level of pruning, even if the solution is incomplete with respect
to the types of time partition columns it can address.
Note that we assume all engines accept and convert from ISO-standard
date strings, but this is not a change from the prior behavior, which
assumed the same thing but wrapped it in a CAST function call.