Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: support refreshing Iceberg tables #5707
feat!: support refreshing Iceberg tables #5707
Changes from 28 commits
470b09c
a8d957a
264fdb1
58d0a73
e090474
57021ad
fb882e8
5bbdeb2
3da205c
91acf9b
08dd329
7af0d1d
2d79c38
3809f21
5273a15
b9e2c6e
9937f79
cd08038
f28325f
2d92b3f
cd31d82
d680c0c
6607fc3
893336f
68e4546
273f5c1
1e92a19
f72c1b7
5c7ff12
92eec61
95194b7
09e2b6e
30910dd
dd12240
944dac6
912b9f2
e2b6fd0
72b03be
01e50fe
81e88d8
b37a04e
dead9c4
e5d10e7
b30e240
fa9d154
d807a94
db2b031
746b343
d69ddcd
06a3bd5
91ba92c
73e8824
be689ef
b15c78a
de9f6ae
d5c13e6
851260c
d1557c5
975a013
ad5a225
e5c8f49
1f44a44
300f836
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider whether we can have add + remove + add. What about remove + add in the same pull?
Should document that this may change the "at most once per location" guarantee, and define semantics.
I think it should be something like:
We allow re-add of a removed TLK. Downstream consumers should process these in an order that respects delivery and transactionality.
Within one transaction, expect at most one of "remove" or "add" for a given TLK.Within one transaction, we can allow remove followed by add, but not add followed by remove. This dictates that we deliver pending removes before pending adds in
processPending
.That is, one transaction allows:
Double add, double remove, or add followed by remove is right out.
Processing an addition to a transaction.
Across multiple transactions delivered as a batch, ensure that the right end-state is achieved.
withinwithout their opposite intervening is an error.null
token should be handled exactly the same as a single-element transaction.Processing a transaction:
Note: removal support means that RegionedColumnSources may no longer be immutable! We need to be sure that we are aware of whether a particular TLP might remove data, and ensure that in those cases the RCS is not marked immutable. REVISED: ONLY REPLACE IS AN ISSUE FOR IMMUTABILITY, AS LONG AS WE DON'T RESUSE SLOTS.
We discussed that TLPs should probably specify whether they are guaranteeing that they will never remove TLKs, and whether their TLs will never remove or modify rows. I think if and when we encounter data sources that require modify support, we should probably just use
SourcePartitionedTable
instead ofPartitionAwareSourceTable
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I need to handle the RCS immutability question in this PR since Iceberg will not modify rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing a region makes the values in the corresponding row key range disappear. That's OK for immutability.
If you allow a new region to use the same slot, or allow the old region to reincarnate in the same slot potentially with different data, you are violating immutability.
Not reusing slots means that a long-lived iceberg table may eventually exhaust its row key space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace (remove + add of a TLK) requires some kind of versioning of the TL, in a way that the TLK is aware of in order to ensure that we provide the table with the right TL for the version.
AbstractTableLocationProvider
's location caching layer is not currently sufficient for atomically replacing TLs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider preserving this method,
@Deprecated
. Mostly a question of how inconvenient just deleting it would be for @abaranec .