Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Make TPCH dbgen text buffer size consistent with Presto Java #12169

Conversation

minhancao
Copy link
Contributor

@minhancao minhancao commented Jan 24, 2025

Changed text buffer size to be 300 MB for Velox's dbgen to match with Java Presto TPCH dbgen's text buffer size. The text buffer size is used in randomly generating offset and length to grab a chunk from the overall text for each row. This fixed the difference in the comment column for the tables in TPCH.

Java:
https://github.com/trinodb/tpch/blob/master/src/main/java/io/trino/tpch/TextPool.java#L35

private static final int DEFAULT_TEXT_POOL_SIZE = 300 * 1024 * 1024;

C++:
https://github.com/facebookincubator/velox/blob/main/velox/tpch/gen/DBGenIterator.cpp#L40

load_dists(
        10 * 1024 * 1024, &dbgenCtx); // 10 MB buffer size for text generation.

Resolves: prestodb/presto#24011

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 24, 2025
Copy link

netlify bot commented Jan 24, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit aee5149
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6794274a005b200008017db2

@aditi-pandit
Copy link
Collaborator

@minhancao : Please fix the PR title fix: Make TPCH dbgen text buffer size consistent with Presto Java

@minhancao minhancao changed the title Fix for TPCH dbgen comment column difference Java Presto vs C++ Velox fix: Make TPCH dbgen text buffer size consistent with Presto Java Jan 24, 2025
Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @minhancao

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minhancao : Please can you fix the test

[ RUN      ] TpchConnectorTest.simple
/__w/velox/velox/velox/vector/tests/utils/VectorTestBase.cpp:151: Failure
Value of: expected->equalValueAt(actual.get(), i, i)
  Actual: false
Expected: true
at 0: expected {0, ALGERIA, 0, furiously regular requests. platelets affix furious}, but got {0, ALGERIA, 0,  haggle. carefully final deposits detect slyly agai}
[  FAILED  ] TpchConnectorTest.simple (1422 ms)

@amitkdutta
Copy link
Contributor

amitkdutta commented Jan 24, 2025

Just tested with java and c++, checksum on o_comment matches with this change.

Java:
presto:de> select checksum(o_comment) from tpch.sf1.orders;
          _col0          
-------------------------
 04 60 df 1a 45 10 06 84 
(1 row)


C++:
presto:de> select checksum(o_comment) from tpch.sf1.orders;
          _col0          
-------------------------
 04 60 df 1a 45 10 06 84

@minhancao minhancao force-pushed the fix_comment_column_tpch_dbgen branch from e42165f to a6a95e2 Compare January 24, 2025 22:36
@minhancao
Copy link
Contributor Author

@aditi-pandit Fixed the test by updating the expected vector for n_comment column.

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @minhancao

Copy link
Contributor

@amitkdutta amitkdutta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@kgpai
Copy link
Contributor

kgpai commented Jan 24, 2025

@minhancao Can you rebase against latest ? Not sure why the fuzzer build jobs are failing.

@amitkdutta amitkdutta added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Jan 24, 2025
@minhancao minhancao force-pushed the fix_comment_column_tpch_dbgen branch from a6a95e2 to aee5149 Compare January 24, 2025 23:50
@minhancao
Copy link
Contributor Author

@kgpai Rebased to latest main branch

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in dea4758.

Copy link

Conbench analyzed the 0 benchmark runs that triggered this notification.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TPCH connector return different results on prestissimo for o_comment
5 participants