Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEN-970: Refactor redshift system metrics to support freshness test #17981

Merged
merged 10 commits into from
Oct 10, 2024

Conversation

sushi30
Copy link
Contributor

@sushi30 sushi30 commented Sep 24, 2024

Describe your changes:

  • moved redshift system metrics to the redshift source module
  • use Timestamp in data quality
  • added plugin feature to test utils
  • run e2e tests

E2E runs

https://github.com/open-metadata/OpenMetadata/actions/workflows/py-cli-e2e-tests.yml?query=branch%3Aredshift-system-metrics-refactor

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

- moved redshift system metrics to the redshift source module
- use Timestamp in data quality
- added plugin feature to test utils
Copy link

Copy link
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link

sonarcloud bot commented Oct 4, 2024

Copy link
Contributor

@TeddyCr TeddyCr Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sushi30 It makes sense to move the computation logic in the connector source itself, though, let's move all of them and keep an entry point to compute the system metric in ingestion/src/metadata/profiler/metrics/system/system.py. so that we have the profiler calling a system.py and inside system.py get the correct System class (e.g. redshift, Snowflake, etc.) and have it call something like SystemMetricsInterface redshift_system_metrics = RedshiftSystemMetric(....); redshift_system_metrics.compute

This way we keep profiler metric computation logic close to the profiler and the implementation close to the source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am moving them one by one as there are side effects and hooks involved. this change is pretty big as it is.

Comment on lines +109 to +111
import_side_effects(
"metadata.ingestion.source.database.redshift.profiler.system",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we solve with this?

Copy link
Contributor Author

@sushi30 sushi30 Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqlalchemy hooks for system do not get picked up if this class is not imported explicitly. the system profile for redshift will not run with current setup. we currently rely on this side effect when importing get_system_metrics_for_dialect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should take a different approach as suggested here #17981 (comment) to not have to rely on the dispatch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a larger change in how the system profiler works. I can add it as the next iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do it in stage no? Keep get_system_metrics_for_dialect (which we should probably keep anyway as the entry point), move all the @get_system_metrics_for_dialect.register(Dialects.BigQuery) to their own classes (you can keep it in the same file for now) and simply call a factory method in get_system_metrics_for_dialect to instantiate the right system metric builder class (that would inherit from the interface) and call interface.process.

That seems limited in scope and avoid introducing the side effect import to support the dispatch in the redshift package, wdyt?

Copy link
Contributor Author

@sushi30 sushi30 Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you suggest to pick up the @get_system_metrics_for_dialect.register without importing the file as side effect? The file containing the function needs to imported in order for the dispatcher to be registered. There is nothing explicitly importing it bc the profiler relies only on the "parent" get_system_metrics_for_dialect.

@sushi30 sushi30 enabled auto-merge (squash) October 7, 2024 16:05
@sushi30 sushi30 requested a review from TeddyCr October 7, 2024 16:05
@sushi30 sushi30 merged commit 68e71cb into main Oct 10, 2024
35 of 36 checks passed
@sushi30 sushi30 deleted the redshift-system-metrics-refactor branch October 10, 2024 06:32
harshach pushed a commit that referenced this pull request Oct 19, 2024
…17981)

* ref(profiler): redshift system metrics

- moved redshift system metrics to the redshift source module
- use Timestamp in data quality
- added plugin feature to test utils

* use timezone.utc

* format

* reverted unintended snowflake changes

* fixed import test_system_metrics.py

* revert

* fixed import in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ingestion safe to test Add this label to run secure Github workflows on PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants