[SPARK] Optimize: SELECT COUNT(*) FROM Table WHERE partition=1 #3345

7mming7 · 2024-07-09T03:02:52Z

Which Delta project/connector is this regarding?

Description

Running the query "SELECT COUNT(*), MIN(X), MAX(X) FROM table WHERE partition_column = 1" takes a long time on large tables. Spark scans all the matching Parquet files just to return the row count and the min/max values, even though this information is already available in the Delta log.
Resolves #1916
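The idea in the description can be illustrated with a minimal Python sketch. It is not the PR's implementation (Delta's Spark connector is written in Scala). It only shows how COUNT(*), MIN and MAX for one partition could be derived from per-file statistics. The stats keys numRecords, minValues, maxValues and nullCount are the per-file statistics defined by the Delta protocol. The AddFile dataclass, the has_deletion_vector flag and the aggregate_from_log function are hypothetical simplifications. The sketch also assumes a numeric column, because Delta may truncate string min/max stats, and it compares partition values as the strings Delta stores them as. Whenever the stats cannot give an exact answer, it returns None so the caller can fall back to a normal scan.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AddFile:
    """Hypothetical, simplified view of an `add` action from the Delta log."""
    path: str
    partition_values: Dict[str, str]  # Delta stores partition values as strings
    stats: Optional[str] = None       # JSON with numRecords/minValues/maxValues/nullCount
    has_deletion_vector: bool = False  # simplification of the real deletionVector field


def aggregate_from_log(files: List[AddFile], partition_col: str,
                       partition_value: str, column: str) -> Optional[Dict[str, Any]]:
    """Answer COUNT(*), MIN(column), MAX(column) for one partition from file stats only.

    Returns None when the stats are insufficient for an exact answer,
    meaning the caller should fall back to scanning the Parquet files.
    """
    count = 0
    min_v = None
    max_v = None
    for f in files:
        # Partition pruning: ignore files outside the requested partition.
        if f.partition_values.get(partition_col) != partition_value:
            continue
        # Missing stats, or deleted rows still counted in numRecords, make the result inexact.
        if f.stats is None or f.has_deletion_vector:
            return None
        stats = json.loads(f.stats)
        n = stats.get("numRecords")
        if n is None:
            return None
        count += n  # COUNT(*) counts every row, including rows where `column` is null
        if n == 0:
            continue
        if stats.get("nullCount", {}).get(column) == n:
            continue  # all-null file: contributes nothing to MIN/MAX
        lo = stats.get("minValues", {}).get(column)
        hi = stats.get("maxValues", {}).get(column)
        if lo is None or hi is None:
            return None  # column has no min/max stats collected for this file
        min_v = lo if min_v is None else min(min_v, lo)
        max_v = hi if max_v is None else max(max_v, hi)
    return {"count": count, "min": min_v, "max": max_v}


# Usage: two files in partition 1 with stats, one file in partition 2 without stats.
files = [
    AddFile("a.parquet", {"partition_column": "1"},
            json.dumps({"numRecords": 3, "minValues": {"X": 5},
                        "maxValues": {"X": 9}, "nullCount": {"X": 0}})),
    AddFile("b.parquet", {"partition_column": "1"},
            json.dumps({"numRecords": 2, "minValues": {"X": 1},
                        "maxValues": {"X": 7}, "nullCount": {"X": 0}})),
    AddFile("c.parquet", {"partition_column": "2"}, None),
]
print(aggregate_from_log(files, "partition_column", "1", "X"))
# {'count': 5, 'min': 1, 'max': 9}
```

Returning None instead of guessing keeps the optimization safe: a metadata-only answer is used only when every relevant file has complete stats and no deletion vector.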

How was this patch tested?

Added unit tests to validate that the optimization works.

Does this PR introduce any user-facing changes?

No, only a performance improvement.

7mming7 · 2024-07-10T03:15:50Z

cc @felipepessoto, can you help with a review or guidance?

felipepessoto · 2024-07-19T19:57:21Z

@7mming7 I'm travelling on vacation, it will take some time to review it. Anyway, you will need an approval from one of the maintainers.

7mming7 · 2024-07-22T03:19:44Z

@felipepessoto I see. Have a nice vacation.

7mming7 added 2 commits July 9, 2024 16:58

[SPARK] Optimize common case: SELECT COUNT(*) FROM Table WHERE partition_column = 1

34a8b8c

fix

545baff

7mming7 force-pushed the dl-1916 branch from 56ec529 to 545baff on July 9, 2024 08:59
