[Feature Request] Support failOnDataLoss property for Delta source schema tracking #3254

Open
1 of 8 tasks
jackierwzhang opened this issue Jun 11, 2024 · 0 comments
Labels
enhancement New feature or request good medium issue Good for those with Delta Lake experience

Comments

@jackierwzhang
Contributor

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

We currently support the schemaTrackingLocation option (doc), which allows a Delta streaming source to track additive and non-additive schema changes while streaming from a Delta table.

However, if the failOnDataLoss reader option is used and there is a gap in the data log (e.g. because log entries have fallen out of the retention period), use of schemaTrackingLocation is blocked.
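
For illustration, a minimal PySpark sketch of the option combination in question. The helper names, the table path, and the tracking location are hypothetical and not part of Delta Lake; setting failOnDataLoss to false is an assumption about the case of interest.

```python
def delta_stream_options(schema_tracking_location, fail_on_data_loss):
    """Build the reader options for a Delta streaming source.

    Both option names come from the issue text; the values are supplied
    by the caller. Spark reader options are passed as strings.
    """
    return {
        "schemaTrackingLocation": schema_tracking_location,
        "failOnDataLoss": str(fail_on_data_loss).lower(),
    }


def read_delta_stream(spark, table_path, options):
    """Start a streaming read of a Delta table with the given options.

    `spark` is an existing SparkSession with Delta Lake configured.
    """
    return spark.readStream.format("delta").options(**options).load(table_path)


# Hypothetical usage (paths are placeholders):
# opts = delta_stream_options("/tmp/checkpoints/_schema_log", False)
# df = read_delta_stream(spark, "/tmp/delta/events", opts)
```

With a gap in the log, this is the combination that the issue reports as blocked today.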

There may be better mechanisms to tackle this scenario, such as introducing an option to reinitialize the schema tracking log with the next available schema at that time.

Motivation

This allows the schemaTrackingLocation option to be used together with failOnDataLoss.

Further details

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@jackierwzhang jackierwzhang added the enhancement New feature or request label Jun 11, 2024
@jackierwzhang jackierwzhang changed the title [Feature Request] Support failOnDataLoss property for Delta streaming schema evolution [Feature Request] Support failOnDataLoss property for Delta source schema tracking Jun 11, 2024
@scottsand-db scottsand-db added the good medium issue Good for those with Delta Lake experience label Jun 11, 2024