Document Delta Table optimization in a single entrypoint #395

Open
edmondop opened this issue Apr 19, 2024 · 1 comment

Comments

@edmondop

Between the recent improvements to Delta tables and the optimizations that already existed, it is getting harder to keep track of everything. It would help to document them in a single entrypoint:

  • Data skipping via statistics
  • Improved data skipping via Z-ordering (Z Index)
  • Bloom Filters
  • Liquid Clustering
  • Merge on Read
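
As a rough illustration of the first bullet, here is a minimal sketch of how per-file min/max statistics let a reader skip files. It is plain Python rather than a real Delta Lake API: the `FileStats` class, the `files_to_read` helper, and the file names are all hypothetical, and the statistics are simplified to a single integer column.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str       # hypothetical data file name
    min_value: int  # smallest value of the filter column in this file
    max_value: int  # largest value of the filter column in this file


def files_to_read(files, lower, upper):
    """Keep only files whose [min, max] range overlaps [lower, upper]."""
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


# Assumed layout: three files with non-overlapping id ranges.
files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# A filter such as `id BETWEEN 150 AND 180` can only match rows in part-1,
# so the other two files never need to be opened.
selected = files_to_read(files, 150, 180)
print(selected)
```

The tighter and less overlapping the per-file ranges are, the more files get skipped, which is the reason layout techniques like Z-ordering improve data skipping.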

@MrPowers, feel free to add any other ideas here.

@MrPowers
Collaborator

Thanks for raising this @edmondop.

Here are a few other performance enhancements:

  • Relying on metadata only for certain queries
  • Deletion vectors (you more or less covered this one with merge on read)
  • Avoiding expensive file listing operations
  • Eliminating small files via compaction
  • Calling out that file skipping happens first, followed by predicate pushdown filtering
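
To make the first bullet concrete, here is a minimal sketch of answering a row count from metadata alone. It assumes per-file statistics shaped like the `stats` JSON string stored with each file in the Delta transaction log (which includes a `numRecords` field); the `add_actions` list and the `count_from_metadata` helper are hypothetical and only for illustration.

```python
import json

# Hypothetical "add" entries, loosely shaped like those in a Delta log.
# Each one carries its statistics as a JSON-encoded string.
add_actions = [
    {"path": "part-0.parquet", "stats": json.dumps({"numRecords": 1000})},
    {"path": "part-1.parquet", "stats": json.dumps({"numRecords": 2500})},
    {"path": "part-2.parquet", "stats": json.dumps({"numRecords": 500})},
]


def count_from_metadata(actions):
    """Sum the per-file record counts without reading any data file."""
    return sum(json.loads(action["stats"])["numRecords"] for action in actions)


total_rows = count_from_metadata(add_actions)
print(total_rows)
```

This sketch only works when every file has statistics and no rows are soft-deleted; a real engine would have to fall back to reading data otherwise.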

We could possibly add all of these to the Delta Lake Performance blog.
