Merge pull request #953 from mandy-chessell/code2024
Add description of Unity Catalog support
mandy-chessell authored Jul 25, 2024
2 parents 8ca075a + 96b4de3 commit 9f881da
Showing 31 changed files with 5,356 additions and 404 deletions.
2 changes: 1 addition & 1 deletion site/docs/concepts/catalog-template.md
Expand Up @@ -8,7 +8,7 @@ hide:

# Catalog Template

A *catalog template* identifies a template designed to catalog a particular type of open metadata element. For example, in the [Automated Curation OMVS](/services/omvs/automated-curation/overview) API, it links a particular *technology type* to relevant [templates](/concepts/template) for that type of technology.
A *catalog template* relationship identifies a [template](/concepts/template) designed to catalog a particular type of open metadata element. For example, in the [Automated Curation OMVS](/services/omvs/automated-curation/overview) API, it links a particular *technology type* to relevant [templates](/concepts/template) for that type of technology.

The catalog template is implemented using the [CatalogTemplate](/types/0/0011-Managing-Referenceables) relationship.

4 changes: 2 additions & 2 deletions site/docs/concepts/placeholder.md
Expand Up @@ -72,7 +72,7 @@ Content-Type: application/json
```
Below is an outline of the default template for cataloguing a PostgreSQL Database Server. This is represented as a [SoftwareServer](/types/0/0040-Software-Servers) asset; it includes a linked [SoftwareCapability](/types/0/0042-Software-Capabilities) for the [database manager (DBMS)](/types/0/0050-Applications-and-Processes). There is also a [connection](/concepts/connection) linked from the asset to define how to create a connector to the PostgreSQL Database Server. Notice that throughout the template, there are placeholder properties:

![PostgreSQL Database Server Template](/catalog-templates/postgres-server-catalog-template.svg)
![PostgreSQL Database Server Template](/templates/postgres-server-catalog-template.svg)

Below is an example of a call to create an asset using the template:

Expand All @@ -99,7 +99,7 @@ Content-Type: application/json
The picture below shows the resulting asset elements, linked back to the elements from the template using the [SourcedFrom](/types/0/0011-Managing-Referenceables) relationship.
Notice how the `{{serverName}}` placeholder property is used in each element to create a unique qualifiedName.

![PostgreSQL Database Server Template in use](/catalog-templates/postgres-server-template-in-use.svg)
![PostgreSQL Database Server Template in use](/templates/postgres-server-template-in-use.svg)

The placeholder properties can be used to make the templates easy to use, removing much of the repetitive creation of property values. The result is a consistent set of elements for the asset.
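
The substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Egeria's implementation: the `fill_placeholders` helper, the property values in `template_properties` and the server name `LocalPostgreSQL1` are all assumptions made for the example.

```python
import re

# Matches placeholders written as {{name}}, the form shown in the text above.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(template_value, values):
    """Hypothetical helper: replace each {{name}} in template_value using values."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder '{name}'")
        return values[name]

    return PLACEHOLDER.sub(substitute, template_value)


# Illustrative template properties; the real template's values may differ.
template_properties = {
    "qualifiedName": "PostgreSQL Server:{{serverName}}",
    "name": "{{serverName}}",
}

new_properties = {
    key: fill_placeholders(value, {"serverName": "LocalPostgreSQL1"})
    for key, value in template_properties.items()
}

print(new_properties["qualifiedName"])
```

Raising an error for a missing value, rather than leaving the placeholder in place, is a design choice for this sketch; it keeps half-filled qualified names from being created.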

15 changes: 14 additions & 1 deletion site/docs/concepts/template.md
Expand Up @@ -8,4 +8,17 @@ hide:

# Template

A *template* is a collection of linked metadata elements that can be used to provide values and structure to a newly created element.
A *template* is a definition that can be used to provide values and structure to a newly created element. It is implemented as a collection of [anchored](/concepts/anchor) elements with the [Template classification](/types/0/0011-Managing-Referenceables) attached to the anchor element. The *Template* classification provides descriptive information about the purpose and use of the template.

When a new element is created from a template, it is linked back to its template using the [SourcedFrom](/types/0/0011-Managing-Referenceables) relationship so that it is possible to trace the elements derived from a template if an update is required. Templates also have a version identifier associated with them (in the *Template* classification). When a new version of the template is created, it is also linked to the previous version using the *SourcedFrom* relationship.

The metadata elements that make up a template have [placeholder properties](/concepts/placeholder) in their attributes, values for which are supplied when the template is used. The [specification](/concepts/specification) for these placeholder properties is defined using [ValidValueDefinition](/types/5/0545-Reference-Data/) entities linked to the template's anchor element using the [SpecificationPropertyAssignment](/types/5/0545-Reference-Data/) relationship.

Templates can be attached to other elements using the [CatalogTemplate](/types/0/0011-Managing-Referenceables) relationship. This mechanism is used by [Automated Curation OMVS](/services/omvs/automated-curation/overview) when it is retrieving templates for particular technology types.


??? education "Further Information"
    * [Templated Cataloguing](/features/templated-cataloguing/overview) provides examples of how the templating mechanism works.
    * [Automated Curation OMVS](/services/omvs/automated-curation/overview) provides a REST API for querying the templates attached to particular technology types.
    * [Template Manager OMVS](/services/omvs/template-manager/overview) provides a REST API for creating new templates.
    * The [Generic Handlers](/services/generic-handlers) service supplies the implementation of the templating mechanism. This runs in the [metadata-access-server](/concepts/metadata-access-server).
13 changes: 13 additions & 0 deletions site/docs/connectors/index.md
Expand Up @@ -66,6 +66,19 @@ Files provide storage for many types of data. They are organizes into folders (
* The [JDBC Resource Connector](/connectors/resource/jdbc-resource-connector) is for accessing a database via the JDBC DataSource interface.
* The [JDBC Integration Connector](/connectors/integration/jdbc-integration-connector) automatically maintains the open metadata instances on a database server via JDBC. This includes the database schemas, tables, columns, primary keys and foreign keys.

### Unity Catalog

---8<-- "snippets/systems/unity-catalog-intro.md"

The Unity Catalog connectors provide a suite of functions that integrate a Unity Catalog server into the open metadata ecosystem.

* The [Unity Catalog Resource Connector](/connectors/unity-catalog/resource-connector) is a [digital resource connector](/concepts/digital-resource-connector) that acts as a Java client to the Unity Catalog Server REST API. It is used by the other Unity Catalog connectors.
* The [Unity Catalog Server Synchronizer](/connectors/unity-catalog/sync-server-connector) is an [integration connector](/concepts/integration-connector) that exchanges details about the catalogs defined for a Unity Catalog Server.
* The [Unity Catalog Inside Catalog Synchronizer](/connectors/unity-catalog/sync-catalog-connector) is an [integration connector](/concepts/integration-connector) that exchanges details about the resources (schemas, tables, volumes and functions) defined within a Catalog found in a Unity Catalog Server.
* The [Unity Catalog Server Survey Service](/connectors/unity-catalog/server-survey-service) is a [Survey Action Service](/concepts/survey-action-service) that surveys the resources defined in a Unity Catalog Server.
* The [Unity Catalog Inside Catalog Survey Service](/connectors/unity-catalog/catalog-survey-service) is a [Survey Action Service](/concepts/survey-action-service) that surveys the resources defined in a Catalog inside a Unity Catalog Server.
* The [Unity Catalog Inside Schema Survey Service](/connectors/unity-catalog/schema-survey-service) is a [Survey Action Service](/concepts/survey-action-service) that surveys the resources defined in a Schema inside a Unity Catalog Server.

### Apache Kafka

* The [Kafka Open Metadata Topic Connector](/connectors/resource/kafka-open-metadata-topic-connector) implements a [resource connector](/concepts/digital-resource-connector) for a topic that exchanges Java Objects as JSON payloads across an [Apache Kafka](https://kafka.apache.org/) event bus. It is configured in the Egeria [OMAG Servers](/concepts/omag-server) through the [Event Bus Configuration](/concepts/event-bus). This is the connector that is used by default in the Egeria runtimes to exchange events (notifications between the [OMAG Servers](/concepts/omag-server)).
135 changes: 135 additions & 0 deletions site/docs/connectors/unity-catalog/index.md
@@ -0,0 +1,135 @@
<!-- SPDX-License-Identifier: CC-BY-4.0 -->
<!-- Copyright Contributors to the Egeria project. -->


# Unity Catalog

---8<-- "snippets/systems/unity-catalog-intro.md"

The picture below shows Unity Catalog managing access to data in [Delta Lake](https://delta.io/).

![Unity Catalog with Delta Lake](unity-catalog-purpose.svg)

Internally, Unity Catalog's metadata is organized into catalogs. (So one way to think of Unity Catalog is as a 'catalog of catalogs'.) Each catalog has multiple schemas and these contain the resources:

* Tables - these are virtual tables, typically backed by an Apache Parquet file.
* Functions - these are callable functions, typically implemented in SQL, but may be a callable external component.
* Volumes - these are collections of files.

As a result of this structure, the resources in Unity Catalog have a three-level name: *catalogName.schemaName.resourceName*.
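
As a rough illustration of the naming scheme, the sketch below splits a full name into its three parts and joins them again. The `ResourceName` class, the `parse_full_name` function and the example name `sales.europe.orders` are hypothetical, and the sketch assumes that none of the individual names contains a dot.

```python
from typing import NamedTuple


class ResourceName(NamedTuple):
    """The three parts of a Unity Catalog resource name (illustrative type)."""
    catalog_name: str
    schema_name: str
    resource_name: str

    def full_name(self) -> str:
        # Join the parts back into catalogName.schemaName.resourceName.
        return ".".join(self)


def parse_full_name(full_name: str) -> ResourceName:
    """Split a three-level name; assumes no part contains a '.' character."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected catalogName.schemaName.resourceName, got '{full_name}'"
        )
    return ResourceName(*parts)


name = parse_full_name("sales.europe.orders")
print(name.catalog_name, name.schema_name, name.resource_name)
print(name.full_name())
```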

## Unity Catalog Technology Type Names

The technology type names (aka [deployed implementation types](/concepts/deployed-implementation-type)) added to Egeria's reference data for Unity Catalog are:

* *Unity Catalog Server* - The OSS Unity Catalog (UC) Server is an operational data platform 'catalog of catalogs' that supports controlled access to data managed through related data platforms.
* *Unity Catalog Catalog* - An operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.
* *Unity Catalog Schema* - A schema that organizes data assets for an operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.
* *Unity Catalog Table* - A relational table within the Unity Catalog (UC) 'catalog of catalogs'.
* *Unity Catalog Function* - A function found in Unity Catalog (UC) that is working with data.
* *Unity Catalog Volume* - A collection of related data files within the Unity Catalog (UC) 'catalog of catalogs'.

??? example "JSON output from tech type search for 'Unity Catalog'"
    ```json
    {
      "class": "TechnologyTypeSummaryListResponse",
      "relatedHTTPCode": 200,
      "elements": [
        {
          "technologyTypeGUID": "2d89345f-2650-4c04-bd5c-8cdbab7a0b79",
          "qualifiedName": "Egeria:ValidMetadataValue:SoftwareServer:deployedImplementationType-(Unity Catalog Server)",
          "name": "Unity Catalog Server",
          "description": "The OSS Unity Catalog (UC) Server is an operational data platform 'catalog of catalogs' that supports controlled access to data managed through a related data platforms.",
          "category": "SoftwareServer:deployedImplementationType"
        },
        {
          "technologyTypeGUID": "2b28dd27-3d4e-4c75-a3e8-cbbcbe8cb62f",
          "qualifiedName": "Egeria:ValidMetadataValue:Catalog:deployedImplementationType-(Unity Catalog Catalog)",
          "name": "Unity Catalog Catalog",
          "description": "An operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.",
          "category": "Catalog:deployedImplementationType"
        },
        {
          "technologyTypeGUID": "c56ca4d1-ed5a-4b05-b75b-e4b6bd3500ff",
          "qualifiedName": "Egeria:ValidMetadataValue:DeployedDatabaseSchema:deployedImplementationType-(Unity Catalog Schema)",
          "name": "Unity Catalog Schema",
          "description": "A schema that organizes data assets for an operational data platform catalog within the Unity Catalog (UC) 'catalog of catalogs'.",
          "category": "DeployedDatabaseSchema:deployedImplementationType"
        },
        {
          "technologyTypeGUID": "3a1ad610-f5c5-4aba-a766-63965ac528be",
          "qualifiedName": "Egeria:ValidMetadataValue:VirtualRelationalTable:deployedImplementationType-(Unity Catalog Table)",
          "name": "Unity Catalog Table",
          "description": "A relational table within the Unity Catalog (UC) 'catalog of catalogs'.",
          "category": "VirtualRelationalTable:deployedImplementationType"
        },
        {
          "technologyTypeGUID": "7f15dd5f-7569-4697-a3f1-491e399f4351",
          "qualifiedName": "Egeria:ValidMetadataValue:DeployedAPI:deployedImplementationType-(Unity Catalog Function)",
          "name": "Unity Catalog Function",
          "description": "A function found in Unity Catalog (UC) that is working with data.",
          "category": "DeployedAPI:deployedImplementationType"
        },
        {
          "technologyTypeGUID": "dbabe8cb-345e-4665-a665-1bef56a26ecd",
          "qualifiedName": "Egeria:ValidMetadataValue:DataFolder:deployedImplementationType-(Unity Catalog Volume)",
          "name": "Unity Catalog Volume",
          "description": "A collection of related data files within the Unity Catalog (UC) 'catalog of catalogs'.",
          "category": "DataFolder:deployedImplementationType"
        }
      ]
    }
    ```
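
In the search output above, each `category` value starts with an open metadata type name followed by `:deployedImplementationType`. The sketch below reads that prefix to build a lookup from technology type name to open metadata type. It is a minimal example that assumes every element follows the pattern seen in this one response; the `open_metadata_types` function is hypothetical and the embedded JSON is an abbreviated copy of two of the six elements.

```python
import json

# Abbreviated copy of the search response (two of the six elements).
response_text = """
{
  "class": "TechnologyTypeSummaryListResponse",
  "relatedHTTPCode": 200,
  "elements": [
    {
      "name": "Unity Catalog Server",
      "category": "SoftwareServer:deployedImplementationType"
    },
    {
      "name": "Unity Catalog Table",
      "category": "VirtualRelationalTable:deployedImplementationType"
    }
  ]
}
"""


def open_metadata_types(response):
    """Hypothetical helper: map technology type name to its category prefix."""
    if response.get("relatedHTTPCode") != 200:
        raise RuntimeError(f"Unexpected response code: {response.get('relatedHTTPCode')}")
    return {
        element["name"]: element["category"].split(":", 1)[0]
        for element in response.get("elements", [])
    }


type_by_technology = open_metadata_types(json.loads(response_text))
for technology_type, type_name in type_by_technology.items():
    print(f"{technology_type} -> {type_name}")
```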

## Open Metadata Type Mapping for Unity Catalog

The mapping from Unity Catalog metadata elements to the Open Metadata Types used in the Open Metadata Ecosystem is as follows:

| Technology Type | Open Metadata Type |
|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| Unity Catalog Server | [SoftwareServer](/types/0/0040-Software-Servers) |
| Unity Catalog Catalog | [Catalog](/types/0/0050-Applications-and-Processes) |
| Unity Catalog Schema | [DeployedDatabaseSchema](/types/2/0224-Databases) |
| Unity Catalog Function | [DeployedAPI](/types/2/0212-Deployed-APIs) with an associated [DeployedSoftwareComponent](/types/2/0215-Software-Components) for its implementation. |
| Unity Catalog Table | [VirtualRelationalTable](/types/2/0235-Information-View) with an associated [DataFolder](/types/2/0220-Files-and-Folders) for its files. |
| Unity Catalog Volume | [DataFolder](/types/2/0220-Files-and-Folders) |

In addition, each of these elements has a [PropertyFacet](/types/0/0020-Property-Facets) and an [External Identifier](/types/0/0017-External-Identifiers) attached. The property facet contains implementation-specific details. The external identifier includes the GUID from Unity Catalog plus other mapping values, such as the catalog name, schema name and short name, so that the Unity Catalog connectors can check that the name of an element has not changed since it was last retrieved from Unity Catalog.

The diagram below illustrates the mapping of the Unity Catalog metadata resource to the Open Metadata Types.

![Type Mapping](unity-catalog-type-mapping.svg)

The templates that implement this mapping are described in [Unity Catalog Templates](/templates/unity-catalog-templates).

### Anchor design for Unity Catalog

In order to have correct delete semantics, each of the Unity Catalog resources is its own [anchored structure](/concepts/anchor). In addition, each resource is anchored to its parent: each table, function and volume is anchored to its schema, and each schema is anchored to its catalog. The catalogs are anchored to their appropriate server.

The result is that if, for example, a catalog is deleted, all the schemas, tables, functions and volumes nested underneath it are deleted too, ensuring there are no orphaned fragments of metadata left in the repository.
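
The cascading delete can be pictured with a toy in-memory model. This is a sketch of the behaviour described above under simplifying assumptions, not Egeria's repository code: the `AnchoredStore` class and the element names are invented for the example, and each element has at most one parent.

```python
from collections import defaultdict


class AnchoredStore:
    """Toy store where each element may be anchored to one parent (hypothetical)."""

    def __init__(self):
        self._parent = {}                  # element name -> parent name (or None)
        self._children = defaultdict(set)  # parent name -> names anchored to it

    def add(self, name, parent=None):
        if parent is not None and parent not in self._parent:
            raise KeyError(f"Unknown parent '{parent}'")
        self._parent[name] = parent
        if parent is not None:
            self._children[parent].add(name)

    def delete(self, name):
        """Delete an element and everything anchored beneath it.

        Returns the deleted names, children before their parents.
        """
        deleted = []
        for child in sorted(self._children.pop(name, set())):
            deleted.extend(self.delete(child))
        parent = self._parent.pop(name)
        if parent is not None and parent in self._children:
            self._children[parent].discard(name)
        deleted.append(name)
        return deleted

    def __contains__(self, name):
        return name in self._parent


store = AnchoredStore()
store.add("uc-server")
store.add("sales", parent="uc-server")              # catalog
store.add("sales.eu", parent="sales")               # schema
store.add("sales.eu.orders", parent="sales.eu")     # table
store.add("sales.eu.raw_files", parent="sales.eu")  # volume
store.add("hr", parent="uc-server")                 # another catalog

removed = store.delete("sales")
print(removed)
```

Deleting the `sales` catalog removes its schema, table and volume, while the server and the unrelated `hr` catalog remain.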

### Metadata Collections

Each catalog in a Unity Catalog server is assigned its own metadata collection. The schemas, tables, functions and volumes within the catalog are all part of the catalog's metadata collection making it easy to identify the origin of these metadata elements.

![Metadata Collections for Unity Catalog Resources](unity-catalog-metadata-collections.svg)

The Unity Catalog connectors also use the metadata collections to scope the metadata they are processing.

## Unity Catalog Connectors

The connectors shipped with Egeria are as follows:

![Unity Catalog Connectors](unity-catalog-connectors.svg)

| Connector Name                                                                                | Connector Type                                                     | Purpose                                                                                                                                                               |
|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Unity Catalog Resource Connector](/connectors/unity-catalog/resource-connector)              | [Digital Resource Connector](/concepts/digital-resource-connector) | Provides a wrapper around Unity Catalog's REST API.                                                                                                                   |
| [Unity Catalog Server Survey](/connectors/unity-catalog/server-survey-service)                | [Survey Action Service](/concepts/survey-action-service)           | Surveys the contents of a Unity Catalog Server.                                                                                                                       |
| [Unity Catalog Catalog Survey](/connectors/unity-catalog/catalog-survey-service)              | [Survey Action Service](/concepts/survey-action-service)           | Surveys the contents of a Unity Catalog Catalog.                                                                                                                      |
| [Unity Catalog Schema Survey](/connectors/unity-catalog/schema-survey-service)                | [Survey Action Service](/concepts/survey-action-service)           | Surveys the contents of a Unity Catalog Schema.                                                                                                                       |
| [Unity Catalog Server Synchronizer](/connectors/unity-catalog/sync-server-connector)          | [Integration Connector](/concepts/integration-connector)           | Bootstraps the cataloguing of a Unity Catalog Server by retrieving the catalogs and configuring the Inside Catalog Connector (below).                                 |
| [Unity Catalog Inside Catalog Synchronizer](/connectors/unity-catalog/sync-catalog-connector) | [Integration Connector](/concepts/integration-connector)           | Synchronizes the metadata describing a Unity Catalog Server's catalogs, schemas, tables, functions and volumes between Unity Catalog and the Open Metadata Ecosystem. |


--8<-- "snippets/abbr.md"
6 changes: 6 additions & 0 deletions site/docs/connectors/unity-catalog/schema-survey-service.md
@@ -0,0 +1,6 @@
<!-- SPDX-License-Identifier: CC-BY-4.0 -->
<!-- Copyright Contributors to the Egeria project. -->

# OSS Unity Catalog Schema Survey Service

Survey of a schema within an OSS Unity Catalog Server.
4 changes: 4 additions & 0 deletions site/docs/connectors/unity-catalog/unity-catalog-beans.svg
4 changes: 4 additions & 0 deletions site/docs/connectors/unity-catalog/unity-catalog-purpose.svg