Split out count distincts and other distinct aggs too - when mixed with other aggs - into a separate step and join #24233

Open
kaikalur opened this issue Dec 10, 2024 · 2 comments

Comments

@kaikalur
Contributor

kaikalur commented Dec 10, 2024

Sometimes COUNT(DISTINCT ...) mixed with other aggregates can make the plan too complex, so it would be good to split the count distincts out into a separate subplan and join the results back. Roughly, in SQL:

SELECT COUNT(DISTINCT x) AS v1, COUNT(DISTINCT y) AS v2, approx_set(x) AS s1, approx_set(y) AS s2 FROM T GROUP BY k

can be done as

SELECT * FROM
(SELECT k, COUNT(DISTINCT x) AS v1, COUNT(DISTINCT y) AS v2 FROM T GROUP BY k)
JOIN
(SELECT k, approx_set(x) AS s1, approx_set(y) AS s2 FROM T GROUP BY k)
USING (k)

(Need to take care of nulls: a group whose key k is NULL will not match itself in the join, so maybe coalesce the key or use a null-safe comparison.)
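To make the null concern concrete, here is a minimal sketch using Python's standard-library sqlite3 module rather than Presto. Everything in it is an assumption for illustration: the sample rows are made up, SUM stands in for approx_set (which SQLite does not have), and the null-safe join uses SQLite's IS operator, where Presto would use something like IS NOT DISTINCT FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (k TEXT, x INTEGER, y INTEGER)")
# Hypothetical sample data: one ordinary group ('a') and one NULL-key group.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 1, 20), ("a", 2, 20), (None, 5, 50), (None, 6, 50)],
)

# Original single-pass query (SUM is a stand-in for approx_set).
single_sql = """
SELECT k, COUNT(DISTINCT x) AS v1, COUNT(DISTINCT y) AS v2,
       SUM(x) AS s1, SUM(y) AS s2
FROM T GROUP BY k
"""

# Shared shape of the rewrite: distinct aggs and other aggs in separate subplans.
split_sql = """
SELECT d.k, d.v1, d.v2, o.s1, o.s2 FROM
(SELECT k, COUNT(DISTINCT x) AS v1, COUNT(DISTINCT y) AS v2 FROM T GROUP BY k) d
JOIN
(SELECT k, SUM(x) AS s1, SUM(y) AS s2 FROM T GROUP BY k) o
ON {condition}
"""

single_rows = set(conn.execute(single_sql))
# Plain equality: NULL = NULL is not true, so the NULL-key group is dropped.
naive_rows = set(conn.execute(split_sql.format(condition="d.k = o.k")))
# Null-safe comparison: NULL IS NULL is true, so the group is kept.
null_safe_rows = set(conn.execute(split_sql.format(condition="d.k IS o.k")))

print("single   :", single_rows)
print("naive    :", naive_rows)
print("null-safe:", null_safe_rows)
```

With this data the plain equi-join returns only the 'a' group, while the null-safe join returns the same rows as the single-pass query.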

@kaikalur
Contributor Author

Or one can do a full join or a union and then apply another arbitrary(v) over all the values, for more robustness and easier handling of nulls:

SELECT k, arbitrary(v1) AS v1, arbitrary(v2) AS v2, arbitrary(s1) AS s1, arbitrary(s2) AS s2 FROM
(
SELECT k, COUNT(DISTINCT x) AS v1, COUNT(DISTINCT y) AS v2, NULL AS s1, NULL AS s2 FROM T GROUP BY k
UNION ALL
SELECT k, NULL AS v1, NULL AS v2, approx_set(x) AS s1, approx_set(y) AS s2 FROM T GROUP BY k
)
GROUP BY k
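A rough way to check that the union form keeps NULL-key groups is the sketch below, again run against SQLite through Python's sqlite3 module rather than Presto. The sample rows are made up, SUM stands in for approx_set, and MAX stands in for arbitrary (SQLite has neither); MAX works here only because each column has exactly one non-NULL value per group after the union.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (k TEXT, x INTEGER, y INTEGER)")
# Hypothetical sample data: one ordinary group ('a') and one NULL-key group.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 1, 20), ("a", 2, 20), (None, 5, 50), (None, 6, 50)],
)

# Reference: everything computed in a single aggregation.
single_sql = """
SELECT k, COUNT(DISTINCT x) AS v1, COUNT(DISTINCT y) AS v2,
       SUM(x) AS s1, SUM(y) AS s2
FROM T GROUP BY k
"""

# Union form: each branch fills its own columns and pads the rest with NULL;
# the outer GROUP BY folds the two partial rows per key into one.
union_sql = """
SELECT k, MAX(v1) AS v1, MAX(v2) AS v2, MAX(s1) AS s1, MAX(s2) AS s2 FROM
(
SELECT k, COUNT(DISTINCT x) AS v1, COUNT(DISTINCT y) AS v2,
       NULL AS s1, NULL AS s2 FROM T GROUP BY k
UNION ALL
SELECT k, NULL AS v1, NULL AS v2,
       SUM(x) AS s1, SUM(y) AS s2 FROM T GROUP BY k
)
GROUP BY k
"""

single_rows = set(conn.execute(single_sql))
union_rows = set(conn.execute(union_sql))

print("single:", single_rows)
print("union :", union_rows)
```

Because GROUP BY places all NULL keys in one group, no coalescing or null-safe comparison is needed in this variant.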

@kaikalur changed the title from "Split out distincts and other distinct aggs too - when mixed with other aggs - into a separate step and join" to "Split out count distincts and other distinct aggs too - when mixed with other aggs - into a separate step and join" on Dec 10, 2024
@kaikalur
Contributor Author

Or even split each distinct aggregate into its own subselect and combine the results, depending on the number of distinct values being aggregated.
