handle external paths #40

Merged
merged 4 commits into from
Oct 19, 2023

@johnnyaug johnnyaug commented Oct 18, 2023

Resolves #39.

Allow the lakeFS Iceberg catalog to work with external (non-lakeFS) paths.
This way, an imported table that includes such paths can still work.

Assumptions

  1. lakectl import was used to import an Iceberg table.
  2. The original Iceberg table is in Hadoop Iceberg Catalog format.
  3. When querying the table, the client has access to the original storage.
  4. (If the original table is in S3) Per-bucket configuration is used to point S3A to the lakeFS S3 gateway, but only for the bucket named after the repository. For all other bucket names, S3A uses AWS S3.
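
As a rough illustration of assumption 4, the sketch below builds Hadoop S3A per-bucket properties for a single bucket. The `fs.s3a.bucket.<bucket>.` prefix is the standard S3A per-bucket mechanism; the repository name `example-repo`, the endpoint `https://lakefs.example.com`, and the class and method names are placeholders, not values from this PR. Enabling path-style access is an assumption about a typical gateway setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerBucketS3AConfig {
    // Builds S3A properties scoped to the bucket named after the lakeFS
    // repository. Buckets without a per-bucket override keep the global
    // S3A settings, which by default point at AWS S3.
    public static Map<String, String> forRepository(String repository, String gatewayEndpoint) {
        String prefix = "fs.s3a.bucket." + repository + ".";
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(prefix + "endpoint", gatewayEndpoint);
        // Assumption: the gateway is addressed with path-style requests.
        conf.put(prefix + "path.style.access", "true");
        // Credentials would be scoped the same way, using
        // prefix + "access.key" and prefix + "secret.key".
        return conf;
    }

    public static void main(String[] args) {
        // Placeholder repository name and endpoint.
        forRepository("example-repo", "https://lakefs.example.com")
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

When set through Spark, each of these keys would typically carry the `spark.hadoop.` prefix.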

@nopcoder nopcoder left a comment

Like the idea ;)

I think we can check path.startsWith("s3a://") just once to branch between the lakeFS and non-lakeFS flows.

@johnnyaug
Contributor Author

@nopcoder this function may be called with any of the following:

  1. "Relative" lakeFS paths.
  2. s3a URIs under the repository.
  3. External s3a URIs.

So we need to handle all of these cases. I added a test to reflect that.
Also, thanks to @lynnro314, I changed the initial check to allow non-s3a external URIs.
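
For readers following along, here is a minimal sketch of how the three cases could be told apart. It is not the code merged in this PR: the class, enum, and method names are hypothetical, and it assumes the input parses as a URI and that a path without a scheme is a "relative" lakeFS path.

```java
import java.net.URI;

public class PathKindSketch {
    public enum Kind { RELATIVE_LAKEFS, REPOSITORY_S3A, EXTERNAL }

    // Classifies a path relative to a lakeFS repository name.
    public static Kind classify(String path, String repository) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            // No scheme: treated here as a "relative" lakeFS path.
            return Kind.RELATIVE_LAKEFS;
        }
        // getAuthority() is used rather than getHost() because getHost()
        // returns null for names that are not valid hostnames.
        if ("s3a".equals(uri.getScheme()) && repository.equals(uri.getAuthority())) {
            return Kind.REPOSITORY_S3A;
        }
        // Any other bucket or scheme is external, including non-s3a URIs.
        return Kind.EXTERNAL;
    }

    public static void main(String[] args) {
        // Placeholder repository name and paths.
        System.out.println(classify("main/db/tbl/data/f.parquet", "example-repo"));
        System.out.println(classify("s3a://example-repo/main/db/tbl/data/f.parquet", "example-repo"));
        System.out.println(classify("gs://other-bucket/db/tbl/data/f.parquet", "example-repo"));
    }
}
```

Checking the scheme first, and only then the bucket, keeps non-s3a external URIs out of the lakeFS flow.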

@lynnro314 lynnro314 left a comment

LGTM, thanks!

@johnnyaug johnnyaug merged commit 6418b3c into main Oct 19, 2023
1 check passed
@johnnyaug johnnyaug deleted the handle_external_paths branch October 19, 2023 11:59
Successfully merging this pull request may close these issues.

Support imported Iceberg tables