Docs: Updating datalake & dbt Cloud docs (#17983)
Co-authored-by: Prajwal Pandit <prajwalpandit@Prajwals-MacBook-Air.local>
Prajwal214 and Prajwal Pandit authored Sep 25, 2024
1 parent 8dd6a84 commit 30a091b
Showing 49 changed files with 1,307 additions and 815 deletions.
@@ -1,19 +1,20 @@
{% connectorsListContainer %}

{% connectorInfoCard name="ADLS Datalake" stage="PROD" href="/connectors/database/adls-datalake" platform="OpenMetadata" / %}
{% connectorInfoCard name="Athena" stage="PROD" href="/connectors/database/athena" platform="OpenMetadata" / %}
{% connectorInfoCard name="AzureSQL" stage="PROD" href="/connectors/database/azuresql" platform="OpenMetadata" / %}
{% connectorInfoCard name="BigQuery" stage="PROD" href="/connectors/database/bigquery" platform="OpenMetadata" / %}
{% connectorInfoCard name="BigTable" stage="BETA" href="/connectors/database/bigtable" platform="OpenMetadata" / %}
{% connectorInfoCard name="Clickhouse" stage="PROD" href="/connectors/database/clickhouse" platform="OpenMetadata" / %}
{% connectorInfoCard name="Couchbase" stage="BETA" href="/connectors/database/couchbase" platform="OpenMetadata" / %}
{% connectorInfoCard name="Datalake" stage="PROD" href="/connectors/database/datalake" platform="OpenMetadata" / %}
{% connectorInfoCard name="Databricks" stage="PROD" href="/connectors/database/databricks" platform="OpenMetadata" / %}
{% connectorInfoCard name="DB2" stage="PROD" href="/connectors/database/db2" platform="OpenMetadata" / %}
{% connectorInfoCard name="Delta Lake" stage="PROD" href="/connectors/database/deltalake" platform="OpenMetadata" / %}
{% connectorInfoCard name="Domo" stage="PROD" href="/connectors/database/domo-database" platform="OpenMetadata" / %}
{% connectorInfoCard name="Doris" stage="PROD" href="/connectors/database/doris" platform="OpenMetadata" / %}
{% connectorInfoCard name="Druid" stage="PROD" href="/connectors/database/druid" platform="OpenMetadata" / %}
{% connectorInfoCard name="DynamoDB" stage="PROD" href="/connectors/database/dynamodb" platform="OpenMetadata" / %}
{% connectorInfoCard name="GCS Datalake" stage="PROD" href="/connectors/database/gcs-datalake" platform="OpenMetadata" / %}
{% connectorInfoCard name="Glue" stage="PROD" href="/connectors/database/glue" platform="OpenMetadata" / %}
{% connectorInfoCard name="Greenplum" stage="BETA" href="/connectors/database/greenplum" platform="OpenMetadata" / %}
{% connectorInfoCard name="Hive" stage="PROD" href="/connectors/database/hive" platform="OpenMetadata" / %}
@@ -34,6 +35,7 @@
{% connectorInfoCard name="SingleStore" stage="PROD" href="/connectors/database/singlestore" platform="OpenMetadata" / %}
{% connectorInfoCard name="Snowflake" stage="PROD" href="/connectors/database/snowflake" platform="OpenMetadata" / %}
{% connectorInfoCard name="SQLite" stage="PROD" href="/connectors/database/sqlite" platform="OpenMetadata" / %}
{% connectorInfoCard name="S3 Datalake" stage="PROD" href="/connectors/database/s3-datalake" platform="OpenMetadata" / %}
{% connectorInfoCard name="Teradata" stage="PROD" href="/connectors/database/teradata" platform="OpenMetadata" / %}
{% connectorInfoCard name="Trino" stage="PROD" href="/connectors/database/trino" platform="OpenMetadata" / %}
{% connectorInfoCard name="Unity Catalog" stage="PROD" href="/connectors/database/unity-catalog" platform="OpenMetadata" / %}
@@ -1,6 +1,6 @@
{% connectorsListContainer %}

{% connectorInfoCard name="S3" stage="PROD" href="/connectors/storage/s3" platform="OpenMetadata" / %}
{% connectorInfoCard name="S3 Storage" stage="PROD" href="/connectors/storage/s3" platform="OpenMetadata" / %}
{% connectorInfoCard name="ADLS" stage="PROD" href="/connectors/storage/adls" platform="Collate" / %}
{% connectorInfoCard name="GCS" stage="PROD" href="/connectors/storage/gcs" platform="Collate" / %}

30 changes: 20 additions & 10 deletions openmetadata-docs/content/v1.5.x/collate-menu.md
@@ -32,6 +32,12 @@ site_menu:

- category: Connectors / Database
url: /connectors/database
+- category: Connectors / Database / ADLS Datalake
+url: /connectors/database/adls-datalake
+- category: Connectors / Database / ADLS Datalake / Run Externally
+url: /connectors/database/adls-datalake/yaml
+- category: Connectors / Database / ADLS Datalake / Troubleshooting
+url: /connectors/database/adls-datalake/troubleshooting
- category: Connectors / Database / Athena
url: /connectors/database/athena
- category: Connectors / Database / Athena / Run Externally
@@ -68,12 +74,6 @@ site_menu:
url: /connectors/database/databricks/yaml
- category: Connectors / Database / Databricks / Troubleshooting
url: /connectors/database/databricks/troubleshooting
-- category: Connectors / Database / Datalake
-url: /connectors/database/datalake
-- category: Connectors / Database / Datalake / Run Externally
-url: /connectors/database/datalake/yaml
-- category: Connectors / Database / Datalake / Troubleshooting
-url: /connectors/database/datalake/troubleshooting
- category: Connectors / Database / DB2
url: /connectors/database/db2
- category: Connectors / Database / DB2 / Run Externally
@@ -100,6 +100,10 @@
url: /connectors/database/dynamodb
- category: Connectors / Database / DynamoDB / Run Externally
url: /connectors/database/dynamodb/yaml
+- category: Connectors / Database / GCS Datalake
+url: /connectors/database/gcs-datalake
+- category: Connectors / Database / GCS Datalake / Run Externally
+url: /connectors/database/gcs-datalake/yaml
- category: Connectors / Database / Glue
url: /connectors/database/glue
- category: Connectors / Database / Glue / Run Externally
@@ -194,6 +198,12 @@ site_menu:
url: /connectors/database/synapse/yaml
- category: Connectors / Database / Synapse / Troubleshooting
url: /connectors/database/synapse/troubleshooting
+- category: Connectors / Database / S3 Datalake
+url: /connectors/database/s3-datalake
+- category: Connectors / Database / S3 Datalake / Run Externally
+url: /connectors/database/s3-datalake/yaml
+- category: Connectors / Database / S3 Datalake / Troubleshooting
+url: /connectors/database/s3-datalake/troubleshooting
- category: Connectors / Database / Trino
url: /connectors/database/trino
- category: Connectors / Database / Trino / Run Externally
@@ -307,9 +317,9 @@ site_menu:
url: /connectors/pipeline/dagster
- category: Connectors / Pipeline / Dagster / Run Externally
url: /connectors/pipeline/dagster/yaml
-- category: Connectors / Pipeline / DBTCloud
+- category: Connectors / Pipeline / dbt Cloud
url: /connectors/pipeline/dbtcloud
-- category: Connectors / Pipeline / DBTCloud / Run Externally
+- category: Connectors / Pipeline / dbt Cloud / Run Externally
url: /connectors/pipeline/dbtcloud/yaml
- category: Connectors / Pipeline / KafkaConnect
url: /connectors/pipeline/kafkaconnect
@@ -361,9 +371,9 @@ site_menu:

- category: Connectors / Storage
url: /connectors/storage
-- category: Connectors / Storage / S3
+- category: Connectors / Storage / S3 Storage
url: /connectors/storage/s3
-- category: Connectors / Storage / S3 / Run Externally
+- category: Connectors / Storage / S3 Storage / Run Externally
url: /connectors/storage/s3/yaml
- category: Connectors / Storage / GCS
url: /connectors/storage/gcs
@@ -0,0 +1,83 @@
---
title: ADLS Datalake
slug: /connectors/database/adls-datalake
---

{% connectorDetailsHeader
name="ADLS Datalake"
stage="PROD"
platform="OpenMetadata"
availableFeatures=["Metadata", "Data Profiler", "Data Quality"]
unavailableFeatures=["Query Usage", "Lineage", "Column-level Lineage", "Owners", "dbt", "Tags", "Stored Procedures"]
/ %}

In this section, we provide guides and references to use the ADLS Datalake connector.

Configure and schedule ADLS Datalake metadata and profiler workflows from the OpenMetadata UI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
- [Data Profiler](/how-to-guides/data-quality-observability/profiler/workflow)
- [Data Quality](/how-to-guides/data-quality-observability/quality)

{% partial file="/v1.5/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/database/adls-datalake/yaml"} /%}

## Requirements

{% note %}
The ADLS Datalake connector supports extracting metadata from the `JSON`, `CSV`, `TSV`, and `Parquet` file types.
{% /note %}

### ADLS Permissions

To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following
permissions on the Storage Account:
- Storage Blob Data Contributor
- Storage Queue Data Contributor

## Metadata Ingestion

{% partial
file="/v1.5/connectors/metadata-ingestion-ui.md"
variables={
connector: "Datalake",
selectServicePath: "/images/v1.5/connectors/datalake/select-service.png",
addNewServicePath: "/images/v1.5/connectors/datalake/add-new-service.png",
serviceConnectionPath: "/images/v1.5/connectors/datalake/service-connection.png",
}
/%}

{% stepsContainer %}
{% extraContent parentTagName="stepsContainer" %}

#### Connection Details for Azure

- **Azure Credentials**

  - **Client ID**: Client ID of the App Registration used to access the storage account
  - **Client Secret**: Client secret (value) of that App Registration
  - **Tenant ID**: Tenant ID under which the App Registration and the storage account live
  - **Account Name**: Name of the Azure Storage Account

- **Required Roles**

Please make sure the following roles are associated with the data storage account:
- `Storage Blob Data Contributor`
- `Storage Queue Data Contributor`

The current approach to authentication is based on App Registration. Reach out to us on [Slack](https://slack.open-metadata.org/) if you need support for another auth mechanism.
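
For reference, these UI fields map onto the connection's `securityConfig` keys. Below is a minimal sketch of that fragment with placeholder values; replace each with the details of your own App Registration and Storage Account:

```yaml
configSource:
  securityConfig:
    clientId: client-id          # Application (client) ID of the App Registration
    clientSecret: client-secret  # Client secret value (not the secret ID)
    tenantId: tenant-id          # Directory (tenant) ID
    accountName: account-name    # Azure Storage Account name
```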

{% partial file="/v1.5/connectors/database/advanced-configuration.md" /%}

{% /extraContent %}

{% partial file="/v1.5/connectors/test-connection.md" /%}

{% partial file="/v1.5/connectors/database/configure-ingestion.md" /%}

{% partial file="/v1.5/connectors/ingestion-schedule-and-deploy.md" /%}

{% /stepsContainer %}

{% partial file="/v1.5/connectors/troubleshooting.md" /%}

{% partial file="/v1.5/connectors/database/related.md" /%}
@@ -1,16 +1,11 @@
---
-title: Datalake Connector Troubleshooting
-slug: /connectors/database/datalake/troubleshooting
+title: ADLS Datalake Connector Troubleshooting
+slug: /connectors/database/adls-datalake/troubleshooting
---

# Troubleshooting

-Learn how to resolve the most common problems people encounter in the Datalake connector.

-* **'Access Denied' error when reading from S3 bucket**
-
-Please, ensure you have a Bucket Policy with the permissions explained in the requirement section [here](/connectors/database/datalake).

+Learn how to resolve the most common problems people encounter in the ADLS Datalake connector.

#### **'Azure Datalake'** credentials details

@@ -20,13 +15,8 @@ Please, ensure you have a Bucket Policy with the permissions explained in the re
- Find and click on your application
- Select `Certificates & Secret` under `Manage` Section


{% image
src="/images/v1.5/connectors/datalake/troubleshoot-clientId.png"
alt="Configure service connection"
caption="Find Client ID" /%}
@@ -0,0 +1,114 @@
---
title: Run the ADLS Datalake Connector Externally
slug: /connectors/database/adls-datalake/yaml
---

{% connectorDetailsHeader
name="ADLS Datalake"
stage="PROD"
platform="OpenMetadata"
availableFeatures=["Metadata", "Data Profiler", "Data Quality"]
unavailableFeatures=["Query Usage", "Lineage", "Column-level Lineage", "Owners", "dbt", "Tags", "Stored Procedures"]
/ %}

In this section, we provide guides and references to use the ADLS Datalake connector.

Configure and schedule ADLS Datalake metadata and profiler workflows externally:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
- [dbt Integration](#dbt-integration)

{% partial file="/v1.5/connectors/external-ingestion-deployment.md" /%}

## Requirements

**Note:** The ADLS Datalake connector supports extracting metadata from the `JSON`, `CSV`, `TSV`, and `Parquet` file types.

### ADLS Permissions

To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following
permissions on the Storage Account:
- Storage Blob Data Contributor
- Storage Queue Data Contributor

### Python Requirements

{% partial file="/v1.5/connectors/python-requirements.md" /%}

#### Azure installation

```bash
pip3 install "openmetadata-ingestion[datalake-azure]"
```

## Metadata Ingestion
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to an ADLS Datalake.

To create and run a Metadata Ingestion workflow, we will follow these steps to craft a YAML configuration that connects to the source, processes the entities if needed, and reaches the OpenMetadata server.

The workflow is modeled around the following JSON Schema.

## 1. Define the YAML Config

### This is a sample config for ADLS Datalake using Azure:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Service Connection

{% codeInfo srNumber=9 %}

- **Client ID**: Client ID of the App Registration used to access the storage account
- **Client Secret**: Client secret (value) of that App Registration
- **Tenant ID**: Tenant ID under which the App Registration and the storage account live
- **Account Name**: Name of the Azure Storage Account

{% /codeInfo %}


{% partial file="/v1.5/connectors/yaml/database/source-config-def.md" /%}

{% partial file="/v1.5/connectors/yaml/ingestion-sink-def.md" /%}

{% partial file="/v1.5/connectors/yaml/workflow-config-def.md" /%}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

```yaml {% isCodeBlock=true %}
# Datalake with Azure
source:
  type: datalake
  serviceName: local_datalake
  serviceConnection:
    config:
      type: Datalake
      configSource:
```
```yaml {% srNumber=9 %}
        securityConfig:
          clientId: client-id
          clientSecret: client-secret
          tenantId: tenant-id
          accountName: account-name
      prefix: prefix
```
{% partial file="/v1.5/connectors/yaml/database/source-config.md" /%}
{% partial file="/v1.5/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.5/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
{% partial file="/v1.5/connectors/yaml/ingestion-cli.md" /%}
## dbt Integration
You can learn more about how to ingest dbt models' definitions and their lineage [here](/connectors/ingestion/workflows/dbt).