Docs: Service Spec updation (open-metadata#18796)

Prajwal214 · Nov 27, 2024 · 82f8554
1 parent 3ff11d8
commit 82f8554
Showing 1 changed file with 70 additions and 0 deletions.
diff --git a/...SHOT/developers/contribute/developing-a-new-connector/develop-ingestion-code.md b/...SHOT/developers/contribute/developing-a-new-connector/develop-ingestion-code.md
@@ -11,6 +11,76 @@ The main takes for developing a new connector are:
 - To understand that each of our Source Types (Databases, Dashboards, etc.) has a Topology attached.
 - To understand that the process flow is implemented as a generator chain, going through each step.
 
+## Service Spec
+
+A new database ingestion connector in OpenMetadata declares its ingestion components (metadata source, lineage, usage, profiler, and sampler) in a `service_spec.py` file. This section outlines the steps required to define that file for your connector.
+
+---
+
+### Why Use `service_spec.py`?
+
+The `service_spec.py` file centralizes the definitions of sources, profilers, lineage, and other ingestion-related components for a connector. This approach helps standardize implementations across connectors, making it easier to manage ingestion workflows.
+
+---
+
+### Steps to Develop a New Connector
+
+#### 1. Create the `service_spec.py` File
+Add a `service_spec.py` file within the connector's directory. This file will define the components needed for ingestion, such as metadata sources, lineage sources, profilers, and samplers.
+
+#### 2. Use the `DefaultDatabaseSpec` Class
+The `DefaultDatabaseSpec` class simplifies connector definitions by bundling the required components into a single object. Import `DefaultDatabaseSpec` and pass it the classes that implement each component for your connector.
+
+#### 3. Define the `ServiceSpec`
+Customize the `ServiceSpec` object to match the features your connector supports. The example below shows the configuration for the BigQuery connector:
+
+```python
+from metadata.ingestion.source.database.bigquery.lineage import BigqueryLineageSource
+from metadata.ingestion.source.database.bigquery.metadata import BigquerySource
+from metadata.ingestion.source.database.bigquery.profiler.profiler import (
+    BigQueryProfiler,
+)
+from metadata.ingestion.source.database.bigquery.usage import BigqueryUsageSource
+from metadata.sampler.sqlalchemy.bigquery.sampler import BigQuerySampler
+from metadata.utils.service_spec.default import DefaultDatabaseSpec
+
+ServiceSpec = DefaultDatabaseSpec(
+    metadata_source_class=BigquerySource,
+    lineage_source_class=BigqueryLineageSource,
+    usage_source_class=BigqueryUsageSource,
+    profiler_class=BigQueryProfiler,
+    sampler_class=BigQuerySampler,
+)
+```
+
+#### 4. Adjust Classes for Your Connector
+
+Replace the example classes (such as `BigquerySource` and `BigqueryLineageSource`) with those specific to your connector. Depending on the connector's features, you may include or exclude optional components such as usage or profiling.
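+
+As a rough illustration of including or excluding components, the sketch below uses a hypothetical `ConnectorSpec` dataclass. It is a stand-in written for this example, not `DefaultDatabaseSpec` itself, and `MyDbSource` and `MyDbLineageSource` are made-up class names. It assumes only the metadata source is mandatory; check the actual fields and defaults in `metadata.utils.service_spec` before relying on that.
+
+```python
+from dataclasses import dataclass, fields
+from typing import List, Optional, Type
+
+
+class MyDbSource:
+    """Hypothetical metadata source for an imaginary connector."""
+
+
+class MyDbLineageSource:
+    """Hypothetical lineage source for the same connector."""
+
+
+@dataclass(frozen=True)
+class ConnectorSpec:
+    """Illustrative stand-in for a service spec (assumption: only metadata is required)."""
+
+    metadata_source_class: Type
+    lineage_source_class: Optional[Type] = None
+    usage_source_class: Optional[Type] = None
+    profiler_class: Optional[Type] = None
+    sampler_class: Optional[Type] = None
+
+    def enabled_components(self) -> List[str]:
+        """Return the names of the components this connector declares."""
+        return [f.name for f in fields(self) if getattr(self, f.name) is not None]
+
+
+# A connector without usage, profiling, or sampling support leaves those fields unset.
+spec = ConnectorSpec(
+    metadata_source_class=MyDbSource,
+    lineage_source_class=MyDbLineageSource,
+)
+print(spec.enabled_components())
+```
+
+Listing the declared components this way makes it easy to see at a glance which workflows a connector is expected to support.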
+
+---
+
+### Components of `service_spec.py`
+
+- **`metadata_source_class`**: The class that ingests metadata.
+- **`lineage_source_class`**: The class that extracts lineage.
+- **`usage_source_class`**: The class that collects data usage patterns.
+- **`profiler_class`**: The class that profiles data for quality checks and insights.
+- **`sampler_class`**: The class that samples data, so that work such as profiling can run on a subset of rows.
+
+---
+
+### Example Workflow
+
+#### Step 1: Add `service_spec.py`
+Place the file in the connector’s directory.
+
+#### Step 2: Configure Components
+Define the `ServiceSpec` using the required classes, adjusting for your connector’s capabilities.
+
+#### Step 3: Verify Integration
+Run the ingestion workflow for the connector and confirm that each component declared in `ServiceSpec` runs as expected.
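+
+Before running a full workflow, a cheap first check is that the spec module imports cleanly and exposes a module-level `ServiceSpec`, as the BigQuery example does. The helper `load_service_spec` and the module name `my_connector_service_spec` below are hypothetical and exist only for this sketch; it does not reproduce how OpenMetadata itself resolves specs. A stand-in module is registered so the snippet runs without OpenMetadata installed.
+
+```python
+import importlib
+import sys
+import types
+
+
+def load_service_spec(module_path: str):
+    """Import module_path and return its module-level ServiceSpec object."""
+    module = importlib.import_module(module_path)
+    try:
+        return module.ServiceSpec
+    except AttributeError as exc:
+        raise ImportError(f"{module_path} does not define ServiceSpec") from exc
+
+
+# Stand-in module (assumption for illustration); a real check would pass
+# the dotted path of your connector's service_spec module instead.
+fake = types.ModuleType("my_connector_service_spec")
+fake.ServiceSpec = {"metadata_source_class": "MyDbSource"}
+sys.modules[fake.__name__] = fake
+
+print(load_service_spec("my_connector_service_spec"))
+```
+
+A missing or misspelled `ServiceSpec` name then fails fast with a clear `ImportError` instead of surfacing later inside the workflow.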
+
+
 ## Service Topology
 
 The Topology defines a series of Nodes and Stages that get executed in a hierarchical way and describe how we extract the needed data from the sources.