Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: OpenLineage integration #15317

Merged
merged 20 commits into from
Mar 12, 2024
Merged

Conversation

mgorsk1
Copy link
Contributor

@mgorsk1 mgorsk1 commented Feb 22, 2024

Describe your changes:

Adds connector for collecting lineage using OpenLineage standard. Co-authored with @dechoma

I worked on adding OpenLineage connector because it's a widely used standard for harvesting lineage information and OpenMetadata was missing such integration.

This connector is targeted to users:

  • who use (or plan to use) OpenLineage to emit lineage information to kafka topic (this is one of two ways OL recommends emmiting lineage information)
  • use OpenMetadata to store metadata within their projects and needs to enrich it with lineage information collected using OL

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Copy link
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@mgorsk1
Copy link
Contributor Author

mgorsk1 commented Mar 11, 2024

all checks are green, can you re-review @TeddyCr @ulixius9 ? I wouldn't want this to hang much longer considering how many rebases this already needed

Copy link

sonarcloud bot commented Mar 11, 2024

Copy link

sonarcloud bot commented Mar 11, 2024

Copy link
Collaborator

@pmbrull pmbrull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mgorsk1! I think there were a couple of comments pending from the last review. Left two quick ones and we'll be good to go.

Comment on lines +224 to +206
namespace = run_facet["facets"]["parent"]["job"]["namespace"]
name = run_facet["facets"]["parent"]["job"]["name"]
Copy link
Collaborator

@pmbrull pmbrull Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this, will make maintaining and evolving things easier

From previous comment:

In the OpenLineageEvent we have defined run_facet of generic Dict type do you think we can be more 
specific and define the exact model also to avoid keyerror in case when we did not receive the response in 
expected format.

pipeline_name = self.get_pipeline_name(pipeline_details)
try:
description = f"""```json
{json.dumps(pipeline_details.run_facet, indent=4).strip()}```"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review!

@@ -192,17 +192,34 @@ def _(

@fqn_build_registry.add(DatabaseSchema)
def _(
_: Optional[OpenMetadata], # ES Search not enabled for Schemas
metadata: Optional[OpenMetadata], # ES Search not enabled for Schemas
Copy link
Member

@ulixius9 ulixius9 Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review if this impacts other parts of code and if this change can be avoided!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's in line with fqn for Table and in general tests did not fail after this change

@@ -574,6 +591,43 @@ def build_es_fqn_search_string(
return fqn_search_string


def search_database_schema_from_es(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for this

@pmbrull pmbrull merged commit 98850ab into open-metadata:main Mar 12, 2024
32 checks passed
@pmbrull
Copy link
Collaborator

pmbrull commented Mar 12, 2024

let's go over the pending topics on a separate PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend documentation Improvements or additions to documentation e2e:Integration Ingestion safe to test Add this label to run secure Github workflows on PRs UI UI specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants