Docs: Updating datalake & dbt Cloud docs (#17983)
Co-authored-by: Prajwal Pandit <prajwalpandit@Prajwals-MacBook-Air.local>
1 parent 8dd6a84, commit 30a091b
Showing 49 changed files with 1,307 additions and 815 deletions.
openmetadata-docs/content/partials/v1.5/connectors/storage/connectors-list.md (2 changes: 1 addition & 1 deletion)
openmetadata-docs/content/partials/v1.6/connectors/storage/connectors-list.md (2 changes: 1 addition & 1 deletion)
openmetadata-docs/content/v1.5.x/connectors/database/adls-datalake/index.md (83 changes: 83 additions & 0 deletions)
@@ -0,0 +1,83 @@
---
title: ADLS Datalake
slug: /connectors/database/adls-datalake
---

{% connectorDetailsHeader
name="ADLS Datalake"
stage="PROD"
platform="OpenMetadata"
availableFeatures=["Metadata", "Data Profiler", "Data Quality"]
unavailableFeatures=["Query Usage", "Lineage", "Column-level Lineage", "Owners", "dbt", "Tags", "Stored Procedures"]
/ %}

In this section, we provide guides and references to use the ADLS Datalake connector.

Configure and schedule Datalake metadata and profiler workflows from the OpenMetadata UI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
- [Data Profiler](/how-to-guides/data-quality-observability/profiler/workflow)
- [Data Quality](/how-to-guides/data-quality-observability/quality)

{% partial file="/v1.5/connectors/ingestion-modes-tiles.md" variables={yamlPath: "/connectors/database/adls-datalake/yaml"} /%}

## Requirements

{% note %}
The ADLS Datalake connector supports extracting metadata from the file types `JSON`, `CSV`, `TSV` & `Parquet`.
{% /note %}

### ADLS Permissions

To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account (one way to grant them is sketched after this list):

- Storage Blob Data Contributor
- Storage Queue Data Contributor
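If you manage these grants yourself, a minimal Azure CLI sketch is shown below. The app ID, subscription, resource group, and account names are placeholders, and the exact scope may differ in your environment:

```bash
# Grant the two required roles to the App Registration's service principal.
# All identifiers below are placeholders for illustration.
SCOPE="/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<account-name>"

az role assignment create --assignee "<app-client-id>" \
  --role "Storage Blob Data Contributor" --scope "$SCOPE"
az role assignment create --assignee "<app-client-id>" \
  --role "Storage Queue Data Contributor" --scope "$SCOPE"
```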
## Metadata Ingestion

{% partial
file="/v1.5/connectors/metadata-ingestion-ui.md"
variables={
connector: "Datalake",
selectServicePath: "/images/v1.5/connectors/datalake/select-service.png",
addNewServicePath: "/images/v1.5/connectors/datalake/add-new-service.png",
serviceConnectionPath: "/images/v1.5/connectors/datalake/service-connection.png",
}
/%}

{% stepsContainer %}
{% extraContent parentTagName="stepsContainer" %}

#### Connection Details for Azure

- **Azure Credentials** (these values reappear in the YAML sketch after this list)

  - **Client ID**: the Client ID of the data storage account
  - **Client Secret**: the Client Secret of the account
  - **Tenant ID**: the Tenant ID under which the data storage account falls
  - **Account Name**: the Account Name of the data storage
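For reference, these four values land in the `securityConfig` block when the same connector is deployed via YAML (covered in full in the YAML guide); a minimal sketch with placeholder values:

```yaml
# Placeholder values; see the YAML deployment guide for the full config.
securityConfig:
  clientId: client-id
  clientSecret: client-secret
  tenantId: tenant-id
  accountName: account-name
```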
- **Required Roles**

  Please make sure the following roles are associated with the data storage account (a quick verification sketch follows the list):

  - `Storage Blob Data Contributor`
  - `Storage Queue Data Contributor`
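To double-check the assignments, one option is the Azure CLI; the app ID below is a placeholder:

```bash
# List the roles currently assigned to the App Registration (placeholder ID).
az role assignment list --assignee "<app-client-id>" --output table
```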
The current approach to authentication is based on `app registration`. Reach out to us on [Slack](https://slack.open-metadata.org/) if you need support for another auth system.

{% partial file="/v1.5/connectors/database/advanced-configuration.md" /%}

{% /extraContent %}

{% partial file="/v1.5/connectors/test-connection.md" /%}

{% partial file="/v1.5/connectors/database/configure-ingestion.md" /%}

{% partial file="/v1.5/connectors/ingestion-schedule-and-deploy.md" /%}

{% /stepsContainer %}

{% partial file="/v1.5/connectors/troubleshooting.md" /%}

{% partial file="/v1.5/connectors/database/related.md" /%}
openmetadata-docs/content/v1.5.x/connectors/database/adls-datalake/yaml.md (114 changes: 114 additions & 0 deletions)
@@ -0,0 +1,114 @@
---
title: Run the ADLS Datalake Connector Externally
slug: /connectors/database/adls-datalake/yaml
---

{% connectorDetailsHeader
name="ADLS Datalake"
stage="PROD"
platform="OpenMetadata"
availableFeatures=["Metadata", "Data Profiler", "Data Quality"]
unavailableFeatures=["Query Usage", "Lineage", "Column-level Lineage", "Owners", "dbt", "Tags", "Stored Procedures"]
/ %}

In this section, we provide guides and references to use the ADLS Datalake connector.

Configure and schedule ADLS Datalake metadata and profiler workflows from the OpenMetadata UI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
- [dbt Integration](#dbt-integration)

{% partial file="/v1.5/connectors/external-ingestion-deployment.md" /%}

## Requirements

**Note:** The ADLS Datalake connector supports extracting metadata from the file types `JSON`, `CSV`, `TSV` & `Parquet`.

### ADLS Permissions

To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an **App Registration** with the following permissions on the Storage Account (a CLI sketch for creating one follows the list):

- Storage Blob Data Contributor
- Storage Queue Data Contributor
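If you do not already have an App Registration, one way to create it is with the Azure CLI; the display name and app ID below are placeholders, and your organization may provision this differently:

```bash
# Create the App Registration (placeholder display name).
az ad app create --display-name "openmetadata-adls-ingestion"
# Then create a service principal for the appId returned above (placeholder ID).
az ad sp create --id "<app-client-id>"
```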
### Python Requirements

{% partial file="/v1.5/connectors/python-requirements.md" /%}

#### Azure installation

```bash
pip3 install "openmetadata-ingestion[datalake-azure]"
```
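As a quick sanity check after installing, you can confirm the package and its `metadata` CLI entry point are available (a minimal sketch; output varies by version):

```bash
# Confirm the package is installed and the CLI entry point resolves.
pip3 show openmetadata-ingestion
metadata --help
```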
## Metadata Ingestion

All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Datalake.

To create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration that can connect to the source, process the entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following JSON Schema.

## 1. Define the YAML Config

### This is a sample config for Datalake using Azure:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Service Connection

{% codeInfo srNumber=9 %}

- **Client ID**: the Client ID of the data storage account
- **Client Secret**: the Client Secret of the account
- **Tenant ID**: the Tenant ID under which the data storage account falls
- **Account Name**: the Account Name of the data storage

{% /codeInfo %}

{% partial file="/v1.5/connectors/yaml/database/source-config-def.md" /%}

{% partial file="/v1.5/connectors/yaml/ingestion-sink-def.md" /%}

{% partial file="/v1.5/connectors/yaml/workflow-config-def.md" /%}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

```yaml {% isCodeBlock=true %}
# Datalake with Azure
source:
  type: datalake
  serviceName: local_datalake
  serviceConnection:
    config:
      type: Datalake
      configSource:
```
```yaml {% srNumber=9 %}
        securityConfig:
          clientId: client-id
          clientSecret: client-secret
          tenantId: tenant-id
          accountName: account-name
      prefix: prefix
```

{% partial file="/v1.5/connectors/yaml/database/source-config.md" /%}

{% partial file="/v1.5/connectors/yaml/ingestion-sink.md" /%}

{% partial file="/v1.5/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}

{% partial file="/v1.5/connectors/yaml/ingestion-cli.md" /%}
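The CLI partial above covers deployment in detail; for quick reference, the workflow is typically run by pointing the `metadata` CLI at the config (assuming it was saved as `filename.yaml` as shown above):

```bash
# Run the ingestion workflow defined in the YAML config above.
metadata ingest -c filename.yaml
```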
## dbt Integration

You can learn more about how to ingest dbt models' definitions and their lineage [here](/connectors/ingestion/workflows/dbt).