-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support refreshing Iceberg tables #5707
Open
lbooker42
wants to merge
40
commits into
deephaven:main
Choose a base branch
from
lbooker42:lab-iceberg-refreshing
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 1 commit
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
470b09c
Initial commit of refreshing Iceberg.
lbooker42 a8d957a
Rebased to main.
lbooker42 264fdb1
Change IcebergInstructions refreshing indicator to enum instead of bo…
lbooker42 58d0a73
WIP, for review
lbooker42 e090474
Manual and auto-refreshing working, better documentation.
lbooker42 57021ad
Addressed more PR comments, some remaining.
lbooker42 fb882e8
WIP, some PR comments addressed.
lbooker42 5bbdeb2
WIP, even more PR comments addressed.
lbooker42 3da205c
Nearly all PR comments addressed.
lbooker42 91acf9b
merged with main
lbooker42 08dd329
Adjustment to IcebergInstructions update mode.
lbooker42 7af0d1d
Added python wrapper for Iceberg refreshing tables.
lbooker42 2d79c38
Changes to mocked tests for ColumnSourceManager and PartitionAwareSou…
lbooker42 3809f21
Added DHError handler and add'l documentation to python `snapshots()`…
lbooker42 5273a15
Fixed typo in JavaDoc
lbooker42 b9e2c6e
WIP
lbooker42 9937f79
Suggestion from review
lbooker42 cd08038
WIP, changes to revert some transaction token code.
lbooker42 f28325f
Correct logic across multiple transactions.
lbooker42 2d92b3f
Merged with main
lbooker42 cd31d82
Moved transaction accumulation to AbstractTableLocationProvider
lbooker42 d680c0c
Moved transaction accumulation to AbstractTableLocationProvider
lbooker42 6607fc3
PR comments addressed.
lbooker42 893336f
Updated to use IcebergTableAdapter and exposed in python. Addressed P…
lbooker42 68e4546
Incorporated external PR to update PartitioningColumnDataIndex for re…
lbooker42 273f5c1
Added additional snapshots with removes to IcebergToolsTest resources.
lbooker42 1e92a19
Merge branch 'main' into lab-iceberg-refreshing
lbooker42 f72c1b7
Manual and auto refreshing tests for Iceberg.
lbooker42 5c7ff12
Manual and auto refreshing tests for Iceberg, not passing.
lbooker42 92eec61
PR comments addressed.
lbooker42 95194b7
Implemented improved location reference counting in AbstractTableLoca…
lbooker42 09e2b6e
Fixing doc problem.
lbooker42 30910dd
For review only, does not compile :(
lbooker42 dd12240
Compiles now, still many problems
lbooker42 944dac6
Working through problems.
lbooker42 912b9f2
Cleanup and minor changes
lbooker42 e2b6fd0
Refactored ATLP
lbooker42 72b03be
Merged with main.
lbooker42 01e50fe
Updated but RCSM still not referenced properly.
lbooker42 81e88d8
Refreshing tests still need work.
lbooker42 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,24 +40,6 @@ interface Listener extends BasicTableDataListener { | |
*/ | ||
void endTransaction(@NotNull Object token); | ||
|
||
/** | ||
* <p> | ||
* Notify the listener of a {@link TableLocationKey} encountered while initiating or maintaining the location | ||
* subscription. This should occur at most once per location, but the order of delivery is <i>not</i> | ||
* guaranteed. | ||
* </p> | ||
* <p> | ||
* If transactionToken is {@code null}, the key will be added to the pending additions immediately. | ||
* </p> | ||
* | ||
* @param tableLocationKey The new table location key. | ||
* @param transactionToken The token identifying the transaction, or {@code null} if this addition is not part | ||
* of a transaction. | ||
*/ | ||
void handleTableLocationKeyAdded( | ||
@NotNull ImmutableTableLocationKey tableLocationKey, | ||
@Nullable Object transactionToken); | ||
|
||
/** | ||
* Notify the listener of a {@link TableLocationKey} encountered while initiating or maintaining the location | ||
* subscription. This should occur at most once per location, but the order of delivery is <i>not</i> | ||
|
@@ -66,50 +48,33 @@ void handleTableLocationKeyAdded( | |
* | ||
* @param tableLocationKey The new table location key. | ||
*/ | ||
default void handleTableLocationKeyAdded(@NotNull ImmutableTableLocationKey tableLocationKey) { | ||
handleTableLocationKeyAdded(tableLocationKey, null); | ||
} | ||
void handleTableLocationKeyAdded(@NotNull ImmutableTableLocationKey tableLocationKey); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good change, may be breaking for DHE, please consult Andy pre-merge. |
||
|
||
/** | ||
* <p> | ||
* Notify the listener of a {@link TableLocationKey} that has been removed. | ||
* </p> | ||
* <p> | ||
* If transactionToken is {@code null}, the key will be added to the pending removals immediately. | ||
* </p> | ||
* | ||
* @param tableLocationKey The table location key that was removed. | ||
* @param transactionToken The token identifying the transaction, or {@code null} if this addition is not part | ||
* of a transaction. | ||
*/ | ||
void handleTableLocationKeyRemoved( | ||
@NotNull ImmutableTableLocationKey tableLocationKey, | ||
@Nullable Object transactionToken); | ||
|
||
/** | ||
* Notify the listener of a {@link TableLocationKey} that has been removed. This addition is not part of any | ||
* transaction, and is equivalent to {@code handleTableLocationKeyAdded(tableLocationKey, null);} by default. | ||
* Notify the listener of a {@link TableLocationKey} that has been removed. This removal is not part of any | ||
* transaction, and is equivalent to {@code handleTableLocationKeyRemoved(tableLocationKey, null);} by default. | ||
* | ||
* @param tableLocationKey The table location key that was removed. | ||
*/ | ||
@SuppressWarnings("unused") | ||
default void handleTableLocationKeyRemoved(@NotNull ImmutableTableLocationKey tableLocationKey) { | ||
handleTableLocationKeyRemoved(tableLocationKey, null); | ||
} | ||
void handleTableLocationKeyRemoved(@NotNull ImmutableTableLocationKey tableLocationKey); | ||
|
||
/** | ||
* <p> | ||
* Notify the listener of collections of {@link TableLocationKey TableLocationKeys} added or removed while | ||
* initiating or maintaining the location subscription. This should occur at most once per location, but the | ||
* order of delivery is <i>not</i> guaranteed. | ||
* initiating or maintaining the location subscription. Addition or removal should occur at most once per | ||
* location, but the order of delivery is <i>not</i> guaranteed. | ||
* </p> | ||
* | ||
* @param addedKeys Collection of table location keys that were added. | ||
* @param removedKeys Collection of table location keys that were removed. | ||
*/ | ||
void handleTableLocationKeysUpdate( | ||
@Nullable Collection<ImmutableTableLocationKey> addedKeys, | ||
@Nullable Collection<ImmutableTableLocationKey> removedKeys); | ||
default void handleTableLocationKeysUpdate( | ||
@NotNull Collection<ImmutableTableLocationKey> addedKeys, | ||
@NotNull Collection<ImmutableTableLocationKey> removedKeys) { | ||
removedKeys.forEach(this::handleTableLocationKeyRemoved); | ||
addedKeys.forEach(this::handleTableLocationKeyAdded); | ||
} | ||
} | ||
|
||
/** | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider whether we can have add + remove + add. What about remove + add in the same pull?
Should document that this may change the "at most once per location" guarantee, and define semantics.
I think it should be something like:
We allow re-add of a removed TLK. Downstream consumers should process these in an order that respects delivery and transactionality.
Within one transaction, expect at most one of "remove" or "add" for a given TLK.Within one transaction, we can allow remove followed by add, but not add followed by remove. This dictates that we deliver pending removes before pending adds in
processPending
.That is, one transaction allows:
Double add, double remove, or add followed by remove is right out.
Processing an addition to a transaction.
Across multiple transactions delivered as a batch, ensure that the right end-state is achieved.
withinwithout their opposite intervening is an error.null
token should be handled exactly the same as a single-element transaction.Processing a transaction:
Note: removal support means that RegionedColumnSources may no longer be immutable! We need to be sure that we are aware of whether a particular TLP might remove data, and ensure that in those cases the RCS is not marked immutable. REVISED: ONLY REPLACE IS AN ISSUE FOR IMMUTABILITY, AS LONG AS WE DON'T RESUSE SLOTS.
We discussed that TLPs should probably specify whether they are guaranteeing that they will never remove TLKs, and whether their TLs will never remove or modify rows. I think if and when we encounter data sources that require modify support, we should probably just use
SourcePartitionedTable
instead ofPartitionAwareSourceTable
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I need to handle the RCS immutability question in this PR since Iceberg will not modify rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing a region makes the values in the corresponding row key range disappear. That's OK for immutability.
If you allow a new region to use the same slot, or allow the old region to reincarnate in the same slot potentially with different data, you are violating immutability.
Not reusing slots means that a long-lived iceberg table may eventually exhaust its row key space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace (remove + add of a TLK) requires some kind of versioning of the TL, in a way that the TLK is aware of in order to ensure that we provide the table with the right TL for the version.
AbstractTableLocationProvider
's location caching layer is not currently sufficient for atomically replacing TLs.