[Spark] Use DeltaLog.getChanges API in DeltaFileProviderUtils's getDeltaFilesInVersionRange API (delta-io#2986)


#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

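Use DeltaLog.getChanges API in DeltaFileProviderUtils's
getDeltaFilesInVersionRange API. Concretely, the method now calls
`deltaLog.getChangeLogFiles(startVersion, endVersion, failOnDataLoss = false)`
instead of filtering the output of `deltaLog.listFrom(startVersion)`.
`failOnDataLoss` is set to `false` because the method already verifies that
the returned versions cover the whole requested range.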
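
The range validation that `getDeltaFilesInVersionRange` performs on its result can be sketched without any Delta dependency. This is a minimal illustration, not the Delta implementation: `ListedFile`, `ContiguityCheckSketch`, and `filesInRange` are hypothetical names, and the sketch assumes the listing contains at most one file per version.

```scala
// Hypothetical stand-in for a listed commit file: just a name and its version.
final case class ListedFile(name: String, version: Long)

object ContiguityCheckSketch {
  /** Keeps files in [startVersion, endVersion] and fails if any version is missing. */
  def filesInRange(
      listed: Seq[ListedFile],
      startVersion: Long,
      endVersion: Long): Seq[ListedFile] = {
    val result = listed.filter(f => f.version >= startVersion && f.version <= endVersion)
    // Size-based check as in the diff: a complete range has end - start + 1 entries.
    // Assumption: no duplicate versions in `listed`.
    if (result.size.toLong != endVersion - startVersion + 1) {
      throw new IllegalStateException(
        s"Versions are not contiguous: ${result.map(_.version).mkString(", ")}")
    }
    result
  }
}
```

Because the caller runs this kind of check itself, the underlying listing does not also need to fail on gaps, which matches the `failOnDataLoss = false` argument in the diff.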

## How was this patch tested?

Unit test: added `DeltaFileProviderUtils.getDeltaFilesInVersionRange` to `DeltaLogSuite`.

## Does this PR introduce _any_ user-facing changes?

No
prakharjain09 authored Apr 30, 2024
1 parent 99efb21 commit e84a874
Showing 2 changed files with 32 additions and 4 deletions.
@@ -58,10 +58,14 @@ object DeltaFileProviderUtils {
       deltaLog: DeltaLog,
       startVersion: Long,
       endVersion: Long): Seq[FileStatus] = {
-    val result = deltaLog
-      .listFrom(startVersion)
-      .collect { case DeltaFile(fs, v) if v <= endVersion => (fs, v) }
-      .toSeq
+    // Pass `failOnDataLoss = false` as we are doing an explicit validation on the result ourselves
+    // to identify that there are no gaps.
+    val result =
+      deltaLog
+        .getChangeLogFiles(startVersion, endVersion, failOnDataLoss = false)
+        .map(_._2)
+        .collect { case DeltaFile(fs, v) => (fs, v) }
+        .toSeq
     // Verify that we got the entire range requested
     if (result.size.toLong != endVersion - startVersion + 1) {
       throw DeltaErrors.deltaVersionsNotContiguousException(spark, result.map(_._2))
@@ -713,4 +713,28 @@ class DeltaLogSuite extends QueryTest
       }
     }
   }
+
+  // This test needs to be extended to Managed Commits once DeltaLogSuite gets extended.
+  test("DeltaFileProviderUtils.getDeltaFilesInVersionRange") {
+    withTempDir { dir =>
+      val path = dir.getCanonicalPath
+      spark.range(0, 1).write.format("delta").mode("overwrite").save(path)
+      spark.range(0, 1).write.format("delta").mode("overwrite").save(path)
+      spark.range(0, 1).write.format("delta").mode("overwrite").save(path)
+      spark.range(0, 1).write.format("delta").mode("overwrite").save(path)
+      val log = DeltaLog.forTable(spark, new Path(path))
+      val result = DeltaFileProviderUtils.getDeltaFilesInVersionRange(
+        spark, log, startVersion = 1, endVersion = 3)
+      assert(result.map(FileNames.getFileVersion) === Seq(1, 2, 3))
+      val filesAreUnbackfilledArray = result.map(FileNames.isUnbackfilledDeltaFile)
+
+      val (fileV1, fileV2, fileV3) = (result(0), result(1), result(2))
+      assert(FileNames.getFileVersion(fileV1) === 1)
+      assert(FileNames.getFileVersion(fileV2) === 2)
+      assert(FileNames.getFileVersion(fileV3) === 3)
+
+      assert(filesAreUnbackfilledArray === Seq(false, false, false))
+    }
+  }
+
 }
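
The new test asserts on the version that `FileNames.getFileVersion` reports for each returned file. As a rough illustration of how a version can be read from a backfilled commit file name, the sketch below assumes the usual `_delta_log` naming of a 20-digit zero-padded version followed by `.json`. `DeltaFileNameSketch` and `versionOf` are hypothetical names, and this is not the `FileNames` implementation; it deliberately ignores unbackfilled commit files.

```scala
object DeltaFileNameSketch {
  // Assumed naming: 20-digit zero-padded version followed by ".json".
  private val DeltaFilePattern = """(\d{20})\.json""".r

  /** Returns the version encoded in a commit file name, or None if the name does not match. */
  def versionOf(fileName: String): Option[Long] = fileName match {
    // A Regex pattern match only succeeds when the whole string matches.
    case DeltaFilePattern(v) => Some(v.toLong)
    case _ => None
  }
}
```

Returning an `Option` keeps non-commit files such as checkpoints out of the result without throwing, similar in spirit to the `collect { case DeltaFile(fs, v) => ... }` step in the diff.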
