Docs: Service Spec updation (open-metadata#18796)
RounakDhillon authored Nov 27, 2024
1 parent 3ff11d8 commit 82f8554
Showing 1 changed file with 70 additions and 0 deletions.
@@ -11,6 +11,76 @@ The main takes for developing a new connector are:
- To understand that each of our Source Types (Databases, Dashboards, etc.) has a Topology attached.
- To understand that the process flow is implemented as a generator chain, going through each step.

## Service Spec

A new database ingestion connector in OpenMetadata declares its ingestion capabilities in a `service_spec.py` file. This guide outlines the steps required to define that file so that every component the connector supports is correctly configured.

---

### Why Use `service_spec.py`?

The `service_spec.py` file centralizes the definitions of sources, profilers, lineage, and other ingestion-related components for a connector. This approach helps standardize implementations across connectors, making it easier to manage ingestion workflows.

---

### Steps to Develop a New Connector

#### 1. Create the `service_spec.py` File
Add a `service_spec.py` file within the connector's directory. This file will define the components needed for ingestion, such as metadata sources, lineage sources, profilers, and samplers.

#### 2. Use the `DefaultDatabaseSpec` Class
The `DefaultDatabaseSpec` class simplifies the definition of connectors by bundling the required components. Import `DefaultDatabaseSpec` and reference the appropriate classes for your connector.

#### 3. Define the `ServiceSpec`
Customize the `ServiceSpec` object based on the features of your connector. Below is an example configuration:

```python
from metadata.ingestion.source.database.bigquery.lineage import BigqueryLineageSource
from metadata.ingestion.source.database.bigquery.metadata import BigquerySource
from metadata.ingestion.source.database.bigquery.profiler.profiler import (
    BigQueryProfiler,
)
from metadata.ingestion.source.database.bigquery.usage import BigqueryUsageSource
from metadata.sampler.sqlalchemy.bigquery.sampler import BigQuerySampler
from metadata.utils.service_spec.default import DefaultDatabaseSpec

ServiceSpec = DefaultDatabaseSpec(
    metadata_source_class=BigquerySource,
    lineage_source_class=BigqueryLineageSource,
    usage_source_class=BigqueryUsageSource,
    profiler_class=BigQueryProfiler,
    sampler_class=BigQuerySampler,
)
```

#### 4. Adjust Classes for Your Connector

Replace the example classes (e.g., `BigquerySource`, `BigqueryLineageSource`, etc.) with those specific to your connector. Depending on the connector's features, you may include or exclude certain components like usage or profiling.
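
One way to picture the include/exclude choice is as a spec object whose only required field is the metadata source. The sketch below uses a hypothetical stand-in dataclass, `SketchDatabaseSpec`, and placeholder classes (`MyDbSource`, `MyDbLineageSource`). None of these names come from OpenMetadata, and the real `DefaultDatabaseSpec` may treat omitted components differently (for example, by supplying defaults), so check the class itself before relying on this behavior.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Type


class MyDbSource:
    """Placeholder for a connector's metadata source class (hypothetical)."""


class MyDbLineageSource:
    """Placeholder for a connector's lineage source class (hypothetical)."""


@dataclass(frozen=True)
class SketchDatabaseSpec:
    """Stand-in that mirrors the keyword arguments of the BigQuery example.

    This is NOT the OpenMetadata class; it only illustrates the shape.
    """

    metadata_source_class: Type
    lineage_source_class: Optional[Type] = None
    usage_source_class: Optional[Type] = None
    profiler_class: Optional[Type] = None
    sampler_class: Optional[Type] = None

    def configured_components(self) -> List[str]:
        """Return the names of the components this spec actually sets."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


# A connector that supports metadata and lineage, but not usage or profiling.
ServiceSpec = SketchDatabaseSpec(
    metadata_source_class=MyDbSource,
    lineage_source_class=MyDbLineageSource,
)

print(ServiceSpec.configured_components())
```

Listing the configured components this way makes it easy to see at a glance which workflows a connector is expected to support.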

---

### Components of `service_spec.py`

- **`metadata_source_class`**: Defines the class for metadata ingestion.
- **`lineage_source_class`**: Defines the class for lineage extraction.
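- **`usage_source_class`**: Defines the class for usage extraction (data usage patterns).
- **`profiler_class`**: Defines the class that profiles data for quality and insights.
- **`sampler_class`**: Defines the class that samples data, so that downstream steps can work on a subset of rows.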

---

### Example Workflow

#### Step 1: Add `service_spec.py`
Place the file in the connector’s directory.

#### Step 2: Configure Components
Define the `ServiceSpec` using the required classes, adjusting for your connector’s capabilities.

#### Step 3: Verify Integration
Run the ingestion workflow to test the connector and ensure all components are functioning correctly.
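
Before running a full ingestion, a quick smoke check can confirm that the module exposes a `ServiceSpec` object with a metadata source set. The helper below is only a sketch: `load_service_spec` is a hypothetical name, the dotted module path is whatever your connector uses, and the demonstration registers a fake in-memory module (`demo_service_spec`) so the snippet runs without OpenMetadata installed. It does not replace running the actual ingestion workflow.

```python
import importlib
import sys
import types


def load_service_spec(module_path: str):
    """Import `module_path` and return its `ServiceSpec` attribute.

    Raises AttributeError if the module does not define `ServiceSpec`, and
    ValueError if the spec has no metadata source class set.
    """
    module = importlib.import_module(module_path)
    spec = getattr(module, "ServiceSpec")  # AttributeError if missing
    if getattr(spec, "metadata_source_class", None) is None:
        raise ValueError(f"{module_path}.ServiceSpec has no metadata_source_class")
    return spec


# --- Demonstration with a fake module (hypothetical names throughout) ---
class DemoSource:
    """Placeholder metadata source class."""


fake_module = types.ModuleType("demo_service_spec")
fake_module.ServiceSpec = types.SimpleNamespace(metadata_source_class=DemoSource)
sys.modules["demo_service_spec"] = fake_module

spec = load_service_spec("demo_service_spec")
print(spec.metadata_source_class.__name__)
```

In a real checkout, the argument would be the dotted path of your connector's `service_spec` module instead of the fake one registered here.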


## Service Topology

The Topology defines a series of Nodes and Stages that get executed in a hierarchical way and describe how we extract the needed data from the sources.