
[Kernel] Update ConflictChecker to perform conflict resolution of ICT #3283

Merged: 7 commits merged on Jul 16, 2024


EstherBear (Contributor)

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Updates ConflictChecker to perform conflict resolution of inCommitTimestamp (ICT), completing inCommitTimestamp support in Kernel.

How was this patch tested?

Added unit tests that verify conflict resolution of the timestamps and of the enablement version.

Does this PR introduce any user-facing changes?

Yes. Users can enable monotonic inCommitTimestamps by enabling the corresponding table property.

EstherBear (Contributor Author)

This PR is based on #3282; please merge that PR first.

Comment on lines 112 to 116
```java
if (i == actionBatchList.size() - 1) {
  CommitInfo commitInfo =
      getCommitInfo(batch.getColumnVector(COMMITINFO_ORDINAL));
  winningCommitInfoOpt = Optional.ofNullable(commitInfo);
}
```
Collaborator

If the goal is to get the last commit info, why not look for the commit info in every batch and store it in a variable? The last value stored would be the one we need.

Also, we can't assume the commit info is present in the last action batch. Reading a single commit file can generate multiple batches, and the commit info may be in the first of them.

Contributor Author

We can't simply store every commit info in a variable and use the last value, because of the scenario mentioned by @dhruvarya-db. But you are right that the commit info is not necessarily in the last action batch.

Contributor Author

Does it make sense to check whether a batch was generated from the last commit file by checking its version from ActionWrapper?

Contributor Author

I also added a test for this scenario.

Contributor Author

Pushed a fix that checks the batch version from ActionWrapper.
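A minimal sketch of the version-based check, assuming simplified stand-in types: ActionBatchSketch, CommitInfoSketch, and WinningCommitInfoFinder are hypothetical and are not Kernel's ActionWrapper, CommitInfo, or ConflictChecker APIs. Each batch carries the commit version it was read from, and only batches from the last winning version are inspected, so it does not matter which batch of that commit file holds the CommitInfo action.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for Kernel's CommitInfo; only the ICT field is modeled.
record CommitInfoSketch(Optional<Long> inCommitTimestamp) {}

// Hypothetical stand-in for one batch read from a commit file: the commit version the
// batch came from (the role ActionWrapper's version plays) and any CommitInfo found in it.
record ActionBatchSketch(long version, Optional<CommitInfoSketch> commitInfo) {}

final class WinningCommitInfoFinder {
  private WinningCommitInfoFinder() {}

  // Returns the CommitInfo of the last winning commit, looking only at batches whose
  // version equals lastWinningVersion. The CommitInfo may sit in any of that commit's batches.
  static Optional<CommitInfoSketch> find(List<ActionBatchSketch> batches, long lastWinningVersion) {
    Optional<CommitInfoSketch> result = Optional.empty();
    for (ActionBatchSketch batch : batches) {
      if (batch.version() == lastWinningVersion && batch.commitInfo().isPresent()) {
        result = batch.commitInfo();
      }
    }
    return result;
  }
}
```

If the last winning commit has no CommitInfo at all, this sketch returns an empty Optional rather than falling back to an earlier commit's CommitInfo.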

```java
long lastWinningVersion = getLastWinningTxnVersion(winningCommits);
return new TransactionRebaseState(
    lastWinningVersion,
    getLastCommitTimestamp(
```
Collaborator

Why do we need to fetch the ICT again? Aren't we already reading it above?

Contributor Author

We only read the CommitInfo above, not the ICT itself, right? We still need to extract the ICT from the CommitInfo.

Collaborator

@vkorukanti From what I understand, this does not perform any additional I/O. It simply extracts the timestamp from the CommitInfo action (or uses the file modification timestamp if ICT is not enabled).
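A minimal sketch of that branching, under stated assumptions: FileStatusSketch, CommitInfoLite, and LastCommitTimestampSketch are hypothetical names, and taking the modification time of the last listed winning commit file is an assumption about the non-ICT path. In the ICT branch the timestamp is read from the CommitInfo object that is already in memory, which is why no extra I/O is needed.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for Kernel's FileStatus and CommitInfo, reduced to what is used here.
record FileStatusSketch(String path, long modificationTime) {}
record CommitInfoLite(Optional<Long> inCommitTimestamp) {}

final class LastCommitTimestampSketch {
  private LastCommitTimestampSketch() {}

  // Assumes winningCommits is non-empty and ordered by version.
  static long lastCommitTimestamp(
      boolean ictEnabled,
      List<FileStatusSketch> winningCommits,
      Optional<CommitInfoLite> winningCommitInfo) {
    if (ictEnabled) {
      // ICT comes from the already-loaded CommitInfo action: no file is read here.
      return winningCommitInfo
          .flatMap(CommitInfoLite::inCommitTimestamp)
          .orElseThrow(() -> new IllegalStateException("missing inCommitTimestamp"));
    }
    // Without ICT, fall back to the modification time of the last winning commit file,
    // which the file listing already provides in this sketch.
    return winningCommits.get(winningCommits.size() - 1).modificationTime();
  }
}
```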

Collaborator

Got it. Thanks for clarifying.

```java
return new TransactionRebaseState(
    lastWinningVersion,
    getLastCommitTimestamp(
        engine, lastWinningVersion, winningCommits, winningCommitInfoOpt.get()));
```
Collaborator

Do you want to check that winningCommitInfoOpt is not empty?

Contributor Author

This get() call is on the AtomicReference, not on the Optional.

Collaborator

Still, how do we verify that it is actually set? Do you want to start with null and then verify that it is not null?

EstherBear (Contributor Author), Jul 16, 2024

It starts as Optional.empty(). getLastCommitTimestamp then checks whether ICT is enabled; if ICT is enabled and winningCommitInfoOpt is still empty, CommitInfo.getRequiredInCommitTimestamp raises an error. This matches the logic in Delta Spark.
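A sketch of the two failure cases described here and in the later thread about error messages, assuming hypothetical names (CommitInfoStub, RequiredIctSketch) and illustrative exception types and wording; the real CommitInfo.getRequiredInCommitTimestamp may differ. An empty Optional and a CommitInfo without an ICT produce different errors, and the version and data path are used only to build the message.

```java
import java.util.Optional;

// Hypothetical stand-in for Kernel's CommitInfo.
record CommitInfoStub(Optional<Long> inCommitTimestamp) {}

final class RequiredIctSketch {
  private RequiredIctSketch() {}

  static long getRequiredInCommitTimestamp(
      Optional<CommitInfoStub> commitInfoOpt, String version, String dataPath) {
    // Case 1: no CommitInfo action was found in the commit at all.
    CommitInfoStub commitInfo = commitInfoOpt.orElseThrow(() -> new IllegalStateException(
        String.format("No CommitInfo found in version %s of table %s", version, dataPath)));
    // Case 2: a CommitInfo exists but carries no inCommitTimestamp.
    return commitInfo.inCommitTimestamp().orElseThrow(() -> new IllegalStateException(
        String.format(
            "CommitInfo in version %s of table %s has no inCommitTimestamp", version, dataPath)));
  }
}
```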

@@ -88,10 +92,13 @@ public static TransactionRebaseState resolveConflicts(

```java
public TransactionRebaseState resolveConflicts(Engine engine) throws ConcurrentWriteException {
  List<FileStatus> winningCommits = getWinningCommitFiles(engine);
  AtomicReference<Optional<CommitInfo>> winningCommitInfoOpt =
```
Collaborator

Why is this an AtomicReference? Why can't we just use the Optional?

Contributor Author

Because of this error:
#3283 (comment)
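The linked comment is not reproduced here, so the following is an assumption about the error: Java lambdas may only capture local variables that are final or effectively final, so a plain Optional local cannot be reassigned from inside a lambda such as a forEach callback. Wrapping it in an AtomicReference keeps the captured variable final while still allowing updates. LambdaCaptureSketch and lastNonEmpty are hypothetical and only illustrate the pattern.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

final class LambdaCaptureSketch {
  private LambdaCaptureSketch() {}

  // Returns the last non-empty string seen while iterating with a lambda.
  static Optional<String> lastNonEmpty(List<String> items) {
    // Optional<String> last = Optional.empty();
    // items.forEach(s -> last = Optional.of(s));
    // ^ does not compile: "local variables referenced from a lambda expression
    //   must be final or effectively final"
    AtomicReference<Optional<String>> last = new AtomicReference<>(Optional.empty());
    items.forEach(s -> {
      if (!s.isEmpty()) {
        // Mutates the holder's contents; the captured reference itself never changes.
        last.set(Optional.of(s));
      }
    });
    return last.get();
  }
}
```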

```java
winningCommitTimestamp = CommitInfo.getRequiredInCommitTimestamp(
    winningCommitInfoOpt,
    String.valueOf(lastWinningVersion),
    snapshot.getDataPath());
```
Collaborator

Why do you need the data path here?

EstherBear (Contributor Author), Jul 15, 2024

Only for the error message, to keep it consistent with Delta Spark.

```java
} else {
  winningCommitTimestamp = CommitInfo.getRequiredInCommitTimestamp(
      winningCommitInfoOpt,
      String.valueOf(lastWinningVersion),
```
Collaborator

Why convert to a string?

EstherBear (Contributor Author), Jul 15, 2024

Only for the error message, to keep it consistent with Delta Spark.

Collaborator

Rather than passing these as strings, maybe just prepare a single string containing lastWinningVersion and tablePath and pass it as context to getRequiredInCommitTimestamp?

```java
CommitInfo.getRequiredInCommitTimestamp(winningCommitInfoOpt, String.format("error...", ...))
```

Contributor Author

getRequiredInCommitTimestamp checks whether winningCommitInfoOpt is empty and whether it contains an ICT, and raises a different error in each case. So I think it's better to let that function handle the error messages?

vkorukanti (Collaborator) left a comment

LGTM. Thanks for working on this.

Last few comments. Once addressed, this PR is good to go.

```scala
IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION.fromMetadata(ver2Snapshot.getMetadata)
assert(observedEnablementTimestamp.get == ver1Snapshot.getTimestamp(engine) + 1)
assert(
  observedEnablementTimestamp.get == getInCommitTimestamp(engine, table, version = 2).get)
```
Collaborator

nit: you can remove .get from both sides.

Contributor Author

They are actually different types: the left side is an Optional[Long] and the right side is an Option[Long].

vkorukanti merged commit 4430dc1 into delta-io:master on Jul 16, 2024
10 checks passed
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants