refactor(parquet): Use velox parquet reader in StatisticsTest and Metadatatest #11472

Open
wants to merge 1 commit into
base: main
from

jkhaliqi
Contributor

@jkhaliqi jkhaliqi commented Nov 7, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 7, 2024

netlify bot commented Nov 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 38f48d1
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6793d71d8861d300080e2284

@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 3 times, most recently from 4af254d to c97061c Compare November 7, 2024 22:34
Collaborator

@majetideepak majetideepak left a comment

@jkhaliqi can you move the file from the Arrow tests to here, so that we see only the Velox-specific changes?

return readerBase_->thriftFileMetaDatas();
}

void WriteBufferToFile(const std::shared_ptr<arrow::Buffer>& buffer, const std::string& file_path) {
Collaborator

We don't need this here, as it is only used for testing.

Contributor Author

@jkhaliqi jkhaliqi Nov 8, 2024

OK, I will move this method, since it is only used for testing, before taking the PR out of draft. For now I will keep using it here.

Contributor Author

Deleted the whole function.

@@ -74,6 +75,10 @@ class ReaderBase {
return *fileMetaData_;
}

std::shared_ptr<thrift::FileMetaData> thriftFileMetaDatas() const {
Collaborator

We can use thriftFileMetaData() above. We don't need this.

Contributor Author

Updated to use the function above, thank you!

@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 4 times, most recently from d62718d to 495598e Compare November 8, 2024 23:51
}

void WriteBufferToFile(const std::shared_ptr<arrow::Buffer>& buffer, const std::string& file_path) {
auto source = std::make_shared<::arrow::io::BufferReader>(buffer);
Collaborator

@majetideepak majetideepak Nov 9, 2024

You can simply write buffer() (which returns a std::string_view) to a file using LocalWriteFile. See S3FileSystemTest.cpp for an example.
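For illustration only, here is a minimal sketch of the idea in this comment: take a contiguous view of the bytes and write it to a file in one call, with no intermediate reader object. It uses only the C++ standard library as a stand-in and is not Velox's LocalWriteFile API. The helper names `writeViewToFile`, `readFile`, and `roundTrips` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper (standard-library stand-in, not a Velox API):
// writes the bytes referenced by `bytes` to `path`, truncating any old file.
void writeViewToFile(std::string_view bytes, const std::string& path) {
  assert(!path.empty());
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) {
    throw std::runtime_error("cannot open " + path);
  }
  out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
  if (!out) {
    throw std::runtime_error("write failed for " + path);
  }
}

// Hypothetical helper: reads the whole file back as raw bytes.
std::string readFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(
      std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Writes, reads back, and removes the file; true if the bytes match.
bool roundTrips(std::string_view bytes, const std::string& path) {
  writeViewToFile(bytes, path);
  const std::string contents = readFile(path);
  std::remove(path.c_str());
  return std::string_view(contents) == bytes;
}
```

Opening the stream in binary mode matters here, because Parquet data can contain zero bytes and newline bytes that a text-mode stream may alter on some platforms.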

ASSERT_NE(nullptr, file_reader->metadata()->key_value_metadata());
auto read_kv_meta = file_reader->metadata()->key_value_metadata();
// Write the buffer to a temp file
std::string file_path = "output_file.parquet";
Collaborator

Use TempFilePath.
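The motivation for a temp-path utility over a hard-coded name like "output_file.parquet" is that the test gets a unique location and the file is cleaned up automatically. A rough sketch of that pattern follows, assuming only the C++17 standard library. The class `ScopedTempPath` is hypothetical and is not Velox's TempFilePath. Unlike a helper built on `mkstemp`, it only picks a name and does not create the file atomically.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <random>
#include <string>
#include <system_error>

// Hypothetical RAII helper (not a Velox API): picks a randomized path in
// the system temp directory and removes the file when it goes out of scope.
class ScopedTempPath {
 public:
  explicit ScopedTempPath(const std::string& prefix) {
    assert(!prefix.empty());
    std::random_device rd;
    path_ = std::filesystem::temp_directory_path() /
        (prefix + "_" + std::to_string(rd()) + ".tmp");
  }

  ~ScopedTempPath() {
    // Use the non-throwing overload; a missing file is not an error here.
    std::error_code ec;
    std::filesystem::remove(path_, ec);
  }

  ScopedTempPath(const ScopedTempPath&) = delete;
  ScopedTempPath& operator=(const ScopedTempPath&) = delete;

  const std::filesystem::path& path() const {
    return path_;
  }

 private:
  std::filesystem::path path_;
};
```

A test would construct one `ScopedTempPath` per file it writes and pass `path()` to the writer and the reader, so nothing is left behind in the working directory when the test exits.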

@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 6 times, most recently from 8aa9724 to 85013aa Compare November 15, 2024 19:51
@jkhaliqi jkhaliqi changed the title refactor arrow reader to parquet reader in MetadataTest.cpp refactor(parquet): Use Velox Parquet Reader in the Parquet Writer tests Nov 15, 2024
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 10 times, most recently from 287d618 to d5f7423 Compare November 20, 2024 20:07
@jkhaliqi jkhaliqi changed the title refactor(parquet): Use Velox Parquet Reader in the Parquet Writer tests refactor(parquet): Use velox parquet reader in StatisticsTest and Metadatatest Nov 20, 2024
@jkhaliqi jkhaliqi marked this pull request as ready for review November 20, 2024 20:08
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch from d5f7423 to ece201e Compare November 26, 2024 16:37
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 2 times, most recently from abc8f1d to c862477 Compare December 23, 2024 23:32
@@ -106,6 +106,8 @@ class ParquetReader : public dwio::common::Reader {

FileMetaDataPtr fileMetaData() const;

const thrift::FileMetaData& thriftFileMetaData() const;
Collaborator

@majetideepak majetideepak Jan 21, 2025

We don't want to expose Thrift in the Parquet API.
We should add the translation API inside Metadata.h/.cc where needed.
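As a rough illustration of the design point in this comment, the sketch below keeps a wire-format struct out of the public interface by wrapping it in a small accessor class that returns only plain C++ types. Everything here is hypothetical: `wire::FileMetaData` stands in for a Thrift-generated struct, and `FileMetaDataView` with its `numRows()` and `keyValue()` accessors is not the actual Metadata.h API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for generated wire-format types. In a real layout
// these would be visible only to the .cc file, not to API users.
namespace wire {
struct KeyValue {
  std::string key;
  std::string value;
};
struct FileMetaData {
  int64_t num_rows = 0;
  std::vector<KeyValue> key_value_metadata;
};
} // namespace wire

// Hypothetical public wrapper: callers see only standard types, so the
// wire format can change without touching code that uses this class.
class FileMetaDataView {
 public:
  explicit FileMetaDataView(const wire::FileMetaData* raw) : raw_(raw) {
    assert(raw_ != nullptr);
  }

  int64_t numRows() const {
    return raw_->num_rows;
  }

  // Translates a wire-level key/value lookup into an optional string.
  std::optional<std::string> keyValue(const std::string& key) const {
    for (const auto& kv : raw_->key_value_metadata) {
      if (kv.key == key) {
        return kv.value;
      }
    }
    return std::nullopt;
  }

 private:
  const wire::FileMetaData* raw_; // Not owned.
};
```

With this shape, a test can check key-value metadata through the wrapper, and the reader class never needs to return the wire struct itself.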

Contributor Author

Thank you, I added the translation to the Metadata.h/.cc files instead!

@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch 6 times, most recently from 9722ce2 to 4642cc1 Compare January 22, 2025 06:02
@jkhaliqi jkhaliqi force-pushed the rmv_arrow_parquet_reader branch from 4642cc1 to 38f48d1 Compare January 24, 2025 18:08
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
3 participants