GEN-970: Refactor redshift system metrics to support freshness test #17981
- moved redshift system metrics to the redshift source module
- use Timestamp in data quality
- added plugin feature to test utils
The Python checkstyle failed. Please run … You can install the pre-commit hooks with …
Quality Gate passed for 'open-metadata-ingestion'
@sushi30 It makes sense to move the computation logic into the connector source itself. Let's move all of them, though, and keep an entry point for computing the system metrics in ingestion/src/metadata/profiler/metrics/system/system.py, so that the profiler calls system.py, and system.py picks the correct System class (e.g. Redshift, Snowflake, etc.) and calls something like SystemMetricsInterface redshift_system_metrics = RedshiftSystemMetric(....); redshift_system_metrics.compute
This way we keep the profiler metric computation logic close to the profiler and the implementation close to the source.
I am moving them one by one, as there are side effects and hooks involved. This change is pretty big as it is.
import_side_effects(
    "metadata.ingestion.source.database.redshift.profiler.system",
)
What do we solve with this?
The SQLAlchemy hooks for the system metrics do not get picked up if this module is not imported explicitly, so the system profile for Redshift will not run with the current setup. We currently rely on this side effect when importing get_system_metrics_for_dialect.
I think we should take a different approach, as suggested here #17981 (comment), so that we do not have to rely on the dispatch.
This requires a larger change in how the system profiler works. I can add it as the next iteration.
You can do it in stages, no? Keep get_system_metrics_for_dialect (which we should probably keep anyway as the entry point), move all the @get_system_metrics_for_dialect.register(Dialects.BigQuery) registered functions to their own classes (you can keep them in the same file for now), and simply call a factory method in get_system_metrics_for_dialect to instantiate the right system metric builder class (which would inherit from the interface) and call interface.process.
That seems limited in scope and avoids introducing the side-effect import to support the dispatch in the redshift package, wdyt?
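A rough sketch of the staged approach described above, under stated assumptions: the class names, the `_BUILDERS` mapping, the `system_metrics_factory` helper, and the `table` argument are all hypothetical, and the returned rows are placeholders rather than real system metrics. Only `get_system_metrics_for_dialect`, `SystemMetricsInterface`, and `process` are names taken from the discussion.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class SystemMetricsInterface(ABC):
    """Hypothetical interface each per-dialect builder would inherit from."""

    def __init__(self, table: str) -> None:
        self.table = table

    @abstractmethod
    def process(self) -> List[dict]:
        """Compute the system metrics for self.table."""


class RedshiftSystemMetrics(SystemMetricsInterface):
    def process(self) -> List[dict]:
        # Placeholder; a real builder would query the Redshift system tables.
        return [{"dialect": "redshift", "table": self.table}]


class BigQuerySystemMetrics(SystemMetricsInterface):
    def process(self) -> List[dict]:
        # Placeholder; a real builder would query the BigQuery job history.
        return [{"dialect": "bigquery", "table": self.table}]


# Explicit mapping: the classes are referenced directly, so nothing depends
# on a decorator having run as an import side effect.
_BUILDERS: Dict[str, Type[SystemMetricsInterface]] = {
    "redshift": RedshiftSystemMetrics,
    "bigquery": BigQuerySystemMetrics,
}


def system_metrics_factory(
    dialect: str, table: str
) -> Optional[SystemMetricsInterface]:
    """Instantiate the builder for the dialect, or None if unsupported."""
    builder_cls = _BUILDERS.get(dialect)
    return builder_cls(table) if builder_cls else None


def get_system_metrics_for_dialect(dialect: str, table: str) -> List[dict]:
    """Entry point kept for the profiler; delegates to the factory."""
    builder = system_metrics_factory(dialect, table)
    return builder.process() if builder else []


rows = get_system_metrics_for_dialect("redshift", "orders")
unsupported = get_system_metrics_for_dialect("mysql", "orders")
```

In this sketch the entry point still needs to import each builder class if they live in other modules, but that import is explicit and visible at the call site.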
How do you suggest picking up the @get_system_metrics_for_dialect.register without importing the file as a side effect? The file containing the function needs to be imported in order for the dispatcher to be registered. Nothing imports it explicitly because the profiler relies only on the "parent" get_system_metrics_for_dialect.
…17981)
* ref(profiler): redshift system metrics
  - moved redshift system metrics to the redshift source module
  - use Timestamp in data quality
  - added plugin feature to test utils
* use timezone.utc
* format
* reverted unintended snowflake changes
* fixed import test_system_metrics.py
* revert
* fixed import in tests
Describe your changes:
E2E runs
https://github.com/open-metadata/OpenMetadata/actions/workflows/py-cli-e2e-tests.yml?query=branch%3Aredshift-system-metrics-refactor