Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #12129 : enhance bigquery sample data query #12130

Merged

Conversation

duongphannamhung
Copy link
Contributor

Describe your changes:

read more with my enhancement issue #12129

Another concern: in my opinion, profiler fetching data with row count type will also be executed with default percent to get the efficiency query and then limit with row count. It would be nice for a big table, but not really good for the really small one (< 500 rows). Hope to hear your advice for better solution.

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

@github-actions
Copy link
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions
Copy link
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions
Copy link
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@cypress
Copy link

cypress bot commented Jun 24, 2023

Passing run #25318 ↗︎

0 249 59 0 Flakiness 0

Details:

Update profiler_source.py
Project: openmetadata Commit: 88efc1e671
Status: Passed Duration: 28:34 💡
Started: Jul 3, 2023 2:32 PM Ended: Jul 3, 2023 3:01 PM

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

@harshach
Copy link
Collaborator

Thanks @duongphannamhung for the contribution. We will review the PR cc @TeddyCr

profile_sample_query=self.profile_query,
)

if self.service_connection_config.type.value == "BigQuery":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having if else here, it would be better to have overriding capabilities with-in each connector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the SamplerProtocol for future inheritance and mapping, hope it's working out

@duongphannamhung
Copy link
Contributor Author

I've just modified name of method get_sample_query -> get_bq_sample_query. Cause of method random_sample which gets Table Profile still have to use get_sample_query to convert to sqlalchemy structure.

In sqlalchemy 2.0 it has tablesample but 1.4 current don't. Hope a bump for little more straightforward current random sample query.
https://docs.sqlalchemy.org/en/20/core/selectable.html#sqlalchemy.sql.expression.tablesample

@TeddyCr TeddyCr temporarily deployed to test July 3, 2023 13:03 — with GitHub Actions Inactive
@TeddyCr TeddyCr temporarily deployed to test July 3, 2023 13:03 — with GitHub Actions Inactive
@TeddyCr TeddyCr temporarily deployed to test July 3, 2023 13:03 — with GitHub Actions Inactive
@TeddyCr TeddyCr temporarily deployed to test July 3, 2023 13:03 — with GitHub Actions Inactive
@TeddyCr TeddyCr temporarily deployed to test July 3, 2023 13:03 — with GitHub Actions Inactive
@TeddyCr TeddyCr temporarily deployed to test July 3, 2023 13:03 — with GitHub Actions Inactive
@github-actions
Copy link
Contributor

github-actions bot commented Jul 3, 2023

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@duongphannamhung duongphannamhung temporarily deployed to test July 3, 2023 14:09 — with GitHub Actions Inactive
@duongphannamhung duongphannamhung temporarily deployed to test July 3, 2023 14:09 — with GitHub Actions Inactive
@duongphannamhung duongphannamhung temporarily deployed to test July 3, 2023 14:10 — with GitHub Actions Inactive
@duongphannamhung duongphannamhung temporarily deployed to test July 3, 2023 14:10 — with GitHub Actions Inactive
@duongphannamhung duongphannamhung temporarily deployed to test July 3, 2023 14:10 — with GitHub Actions Inactive
@duongphannamhung duongphannamhung temporarily deployed to test July 3, 2023 14:10 — with GitHub Actions Inactive
@sonarcloud
Copy link

sonarcloud bot commented Jul 3, 2023

[open-metadata-ingestion] Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

58.9% 58.9% Coverage
0.0% 0.0% Duplication

@TeddyCr TeddyCr merged commit 64f147c into open-metadata:main Jul 4, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ingestion safe to test Add this label to run secure Github workflows on PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants