-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[iceberg] Add UUID type support #23627
Conversation
49f7ab3
to
5343fb6
Compare
Nit suggestion for the release note entry to follow the Order of changes in the Release Notes Guidelines:
|
Fixed, thanks Steve! |
5343fb6
to
70b6f07
Compare
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
@Override | ||
public int hashCode() | ||
{ | ||
return UUID.hashCode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is a singleton? If so the equals method can be simpler. Otherwise this hashCode method is incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeInfo
is a hive-specific class that I am implementing because it does not natively support UUIDs. This implementation follows the same primitive paradigm as the other PrimitiveTypeInfo
which exist in the Apache Hive's implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this right the hashCode is a constant, which only makes sense if the object is a singleton. Am I missing something? (always possible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just matching the same implementation used by the rest of the TypeInfo
class implementations to maintain compatibility with Hive. I don't think we should be deviating from their implementation style at risk of introducing unknown issues.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/ExpressionConverter.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedTestBase.java
Outdated
Show resolved
Hide resolved
long value = random.nextLong(); | ||
writer.writeLong(value); | ||
addedValues.add(value); | ||
value = random.nextLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no random values please; pick an arbitrary one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entire test class revolves around generating random values for different primitive types. I don't think we should go against the grain here. Changing this entire class to not use random values is out of scope for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not asking for the entire class to change. I'm asking that this PR not make the problem worse. Random values in unit tests are a major source of flakiness. Please don't do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not going to change this because it requires re-writing a lots of boilerplate of code to perform the same assertions. I filed an issue for someone to fix this class in the future: #23840
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of random values why not just i, -79Li, 237Li?
70b6f07
to
29fa113
Compare
@Override | ||
public int hashCode() | ||
{ | ||
return UUID.hashCode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this right the hashCode is a constant, which only makes sense if the object is a singleton. Am I missing something? (always possible)
long value = random.nextLong(); | ||
writer.writeLong(value); | ||
addedValues.add(value); | ||
value = random.nextLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not asking for the entire class to change. I'm asking that this PR not make the problem worse. Random values in unit tests are a major source of flakiness. Please don't do this.
29fa113
to
fb53070
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for support UUID
type in Iceberg connector. Should we add the mapping of UUID
between PrestoDB type and Iceberg type to the chapter Type mapping
in iceberg document?
fb53070
to
7be1ecb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change overall looks good to me. Some nits and little problems.
presto-parquet/src/main/java/com/facebook/presto/parquet/cache/MetadataReader.java
Outdated
Show resolved
Hide resolved
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Show resolved
Hide resolved
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/main/java/com/facebook/presto/parquet/ColumnReaderFactory.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/main/java/com/facebook/presto/parquet/ColumnReaderFactory.java
Outdated
Show resolved
Hide resolved
...ebook/presto/parquet/batchreader/decoders/plain/FixedLenByteArrayUuidPlainValuesDecoder.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/parquet/batchreader/decoders/rle/UuidRLEDictionaryValuesDecoder.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/parquet/batchreader/decoders/rle/UuidRLEDictionaryValuesDecoder.java
Outdated
Show resolved
Hide resolved
7be1ecb
to
e1f0034
Compare
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Show resolved
Hide resolved
...o-parquet/src/main/java/com/facebook/presto/parquet/writer/valuewriter/UuidValuesWriter.java
Outdated
Show resolved
Hide resolved
e002737
to
74ac591
Compare
@@ -255,3 +255,4 @@ private void seek() | |||
} | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding these blank lines should be a separate PR, if it's needed at all. (If checkstyle doesn't complain it probably isn't)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are caught by checkstyle. They are added automatically by the templating tool which generates the files, but weren't caught before because they existed in generated-sources
. I tried a large combination of whitespace-control directives from freemarker but was unsuccessful in removing them. I can split these changes into a separate commit on this PR. The commit will be preserved when this PR merges.
Let me know if you think that is acceptable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go even further and make this a separate PR that goes in first since there are no other changes in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of making a separate PR? Commits in a PR are preserved when merging to master
74ac591
to
5c9def7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't finished reviewing, will continue later tonight.
presto-parquet/src/main/java/com/facebook/presto/parquet/reader/BinaryColumnReader.java
Outdated
Show resolved
Hide resolved
891af15
5c9def7
to
891af15
Compare
...ebook/presto/parquet/batchreader/decoders/plain/FixedLenByteArrayUuidPlainValuesDecoder.java
Outdated
Show resolved
Hide resolved
...ebook/presto/parquet/batchreader/decoders/plain/FixedLenByteArrayUuidPlainValuesDecoder.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
presto-common/src/main/java/com/facebook/presto/common/type/Uuids.java
Outdated
Show resolved
Hide resolved
...ava/com/facebook/presto/parquet/batchreader/decoders/rle/UuidRLEDictionaryValuesDecoder.java
Outdated
Show resolved
Hide resolved
presto-parquet/src/test/java/com/facebook/presto/parquet/reader/BenchmarkUuidColumnReader.java
Outdated
Show resolved
Hide resolved
Previously, the code generation step was removed in 7c814ae However, this makes it more difficult to add new readers in the future, such as for UUIDs. This change adds the code generation step as an optional profile which can be enabled when building the parquet module through -PgenerateParquetReaders
891af15
to
069e2b5
Compare
Thanks for your detailed review @yingsu00 . Here are the new performance numbers from the benchmark original
previous update (+50-100% in non-batch reader from original)
newest update (+62% improvement in plain batch reader)
|
069e2b5
to
c8ec2e4
Compare
The iceberg spec lists uuid as a valid schema type. Presto supports UUID types but there was no support for reading or writing them in the connector. This commit makes the necessary changes in the connector to create tables with UUID columns and support for UUIDs in the parquet reader. This includes an implementation for UUIDs in the batchreader.
c8ec2e4
to
365b88e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Description
This PR adds support for reading and writing UUIDs on Iceberg tables with all available catalogs. In order to support this we also needed improvements to the parquet reader and writer for Parquet's UUID logical type.
Motivation and Context
Presto has type support for UUIDs. We should support reading and writing them in some of the connectors.
Impact
Test Plan
I also added a benchmark for reading and writing UUID types and compared it to our current LongDecimal benchmark to see the performance difference for another type which uses
FIXED_LENGTH_BYTE_ARRAY
as the underlying physical type.These were the microbenchmarks from my local machine on ARM using a build of Corretto JDK11 and with reader verification disabled
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.