
Fix bug when writing Parquet files #30

Merged 6 commits into main from bugfix/use_hadoop_file_inputs on Jun 29, 2023
Conversation

johnnyaug
Contributor

@johnnyaug johnnyaug commented Jun 29, 2023

Resolves #23 (hopefully).

When handling Parquet, Iceberg gives special treatment to reads/writes that go through Hadoop. To do so, it checks whether the InputFile/OutputFile is an instance of HadoopInputFile/HadoopOutputFile. For this to keep working with lakeFS, we must now use these classes instead of our dedicated ones.
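The instanceof-based dispatch described above can be sketched as follows. This is a minimal, self-contained mock: the interface and class names (`InputFile`, `HadoopInputFile`, `LakeFSInputFile`, `readPath`) are simplified stand-ins modeled on Iceberg's types, not the actual Iceberg or lakeFS code.

```java
// Hypothetical minimal stand-in for Iceberg's InputFile abstraction.
interface InputFile {
    String location();
}

// Stand-in for Iceberg's HadoopInputFile: the type Iceberg special-cases.
class HadoopInputFile implements InputFile {
    private final String path;
    HadoopInputFile(String path) { this.path = path; }
    public String location() { return path; }
}

// Stand-in for a dedicated lakeFS file class that is NOT a HadoopInputFile.
class LakeFSInputFile implements InputFile {
    private final String path;
    LakeFSInputFile(String path) { this.path = path; }
    public String location() { return path; }
}

public class ParquetDispatch {
    // Mirrors the check Iceberg performs when reading Parquet: only a
    // HadoopInputFile gets the Hadoop-optimized path; anything else falls
    // back to a generic path. A dedicated lakeFS class would therefore
    // silently miss the optimized branch, which is why the PR switches to
    // the Hadoop classes.
    static String readPath(InputFile file) {
        if (file instanceof HadoopInputFile) {
            return "hadoop-optimized";
        }
        return "generic";
    }

    public static void main(String[] args) {
        System.out.println(readPath(new HadoopInputFile("s3://bucket/f.parquet")));   // hadoop-optimized
        System.out.println(readPath(new LakeFSInputFile("lakefs://repo/main/f.parquet"))); // generic
    }
}
```

This illustrates why subclassing or wrapping the Hadoop classes (rather than implementing the interface independently) keeps the optimized code path reachable.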


@itaiad200 itaiad200 left a comment


Thanks, looks like a simple solution although I don't understand any of it 🤔
How was this tested?


import org.apache.hadoop.fs.Path;

public class LakeFSPath extends Path {


Why is this class needed? Please add documentation.

@johnnyaug johnnyaug requested a review from lynnro314 June 29, 2023 09:55
@johnnyaug johnnyaug temporarily deployed to Treeverse signing June 29, 2023 10:21 — with GitHub Actions Inactive
@johnnyaug johnnyaug temporarily deployed to Treeverse signing June 29, 2023 11:54 — with GitHub Actions Inactive
@johnnyaug johnnyaug merged commit 582f039 into main Jun 29, 2023
@johnnyaug johnnyaug deleted the bugfix/use_hadoop_file_inputs branch June 29, 2023 12:15