-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Iceberg] Add Histogram Statistic Support #22365
[Iceberg] Add Histogram Statistic Support #22365
Conversation
58cfe30
to
e1a2caa
Compare
Consider this suggested change for the release note entry:
|
bac6b3f
to
0e3a87c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the docs! A few nits suggested for clarity and style. Let me know what you think.
Codenotify: Notifying subscribers in CODENOTIFY files for diff 901da6d...758f891. No notifications. |
3d551e9
to
65b6fe8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local docs build, everything looks good. Thanks!
65b6fe8
to
0d6389a
Compare
0d6389a
to
24398a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a rough glance, and bring some questions for discussion. Will take a detailed look in the next one or two days.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergHiveMetadata.java
Outdated
Show resolved
Hide resolved
24398a5
to
56b79b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updated doc! Two nits about formatting, no issues with the content in any of the three doc files.
56b79b0
to
492ab5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Thanks for the quick fix! Pull updated branch, new local doc build, looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. The changes look good to me, only some nits.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/statistics/KllHistogram.java
Outdated
Show resolved
Hide resolved
...o-spi/src/test/java/com/facebook/presto/spi/statistics/TestDisjointRangeDomainHistogram.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/statistics/KllHistogram.java
Outdated
Show resolved
Hide resolved
56b02fd
to
d041947
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a short first pass
|
||
The set of statistics available for a particular query depends on the connector | ||
being used and can also vary by table or even by table layout. For example, the | ||
Hive connector does not currently provide statistics on data size. | ||
|
||
Table statistics can be displayed via the Presto SQL interface using the | ||
Table statistics can be displayed using a SQL statement with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be fetched using the below SQL query
sounds better to me
presto-spi/src/main/java/com/facebook/presto/spi/statistics/HistogramCalculator.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/statistics/HistogramCalculator.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/Utils.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/Utils.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/predicate/Marker.java
Show resolved
Hide resolved
d041947
to
eae161e
Compare
presto-common/src/main/java/com/facebook/presto/common/Utils.java
Outdated
Show resolved
Hide resolved
eae161e
to
ee5e3fd
Compare
presto-main/src/main/java/com/facebook/presto/sql/planner/StatisticsAggregationPlanner.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/statistics/TestKllHistogram.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/statistics/KllHistogram.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/statistics/TestKllHistogram.java
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/Utils.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/Utils.java
Outdated
Show resolved
Hide resolved
4b9cf64
to
dfe82f6
Compare
presto-common/src/main/java/com/facebook/presto/common/Utils.java
Outdated
Show resolved
Hide resolved
16b4c7f
to
4d3b2e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the whole work, the change looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some whitespace nits
@@ -605,7 +602,7 @@ | |||
<dependency> | |||
<groupId>org.apache.iceberg</groupId> | |||
<artifactId>iceberg-core</artifactId> | |||
<version>1.5.0</version> | |||
<version>${dep.iceberg.version}</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
presto-main/src/main/java/com/facebook/presto/sql/planner/LogicalPlanner.java
Outdated
Show resolved
Hide resolved
4d3b2e9
to
0339ee0
Compare
The set of ColumnStatisticsMetadata defined by the Hive and non-Hive connectors are not equivalent. However, it is possible to collect the superset of the relevant metadata and use it for ANALYZE. The returned statistics just need to be filtered out to contain only the relevant column statistics. This may include duplicate calculations for some statistics. For example, with distinct values Iceberg puffin files can store the result of sketch_theta for distinct values, but the code path for storing the statistic in the HMS requires a direct value from approx_distinct. Thus, ANALYZE may compute a value twice.
Utilizes the sketch_kll function to generate histograms and store them into the Iceberg table's puffin files for table-level statistic storage. Histograms are always collected by ANALYZE, but they are not used by the cost calculator unless enabled via optimizer.use-histograms
0339ee0
to
758f891
Compare
Description
This changes adds support for the histogram statistic type to Iceberg. There are 3 related commits
Motivation and Context
Histograms result in better accuracy for output row counts from filter nodes when calculating statistics. This should result in closer estimates in the CBO, resulting in better query plans.
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.