Skip to content

Multi active Satellite v0

tta-scalefree edited this page Sep 29, 2023 · 1 revision

This node creates the multi-active satellite version 0. A multi-active satellite contains multiple records that are active/valid for the same business key at a point in time.


User Configuration

In the 'Config' section on the right hand-side, expand section 'Data Vault' to modify the following configurations for the node:

Config Name Config Type Explanation
Hashkey Column columnDropdownSelector Select the parent hashkey of the satellite from the drop-down menu.
Multi-active Key Column columnSelector Select one (or multiple) multi-active key columns from the list and bring it to the right side.
Hashdiff Column columnDropdownSelector Select the hashdiff of this satellite from the drop-down menu.

System Columns

No Sys Columns are added inside a Multi-active Satellite v0 Node.


Example 1

Description

In this example we're modelling a multi-active satellite v0. The source for this satellite contains customer addresses. The table contains the business key Customer Key (CA_CustKey) and multiple addresses per customer - each address has an unique address identifier (CA_ADDRESS_ID).

Prerequisite to model the multi-active satellite: A DV stage table 'DVSTG_CUSTOMER_ADDRESS' with proper config for multi-active data. FIXME add reference to Stage.

Create the node:

MA Sat v0

  • Right-click on the stage node DVSTG_CUSTOMER_ADDRESS, select "Add node" → select "Datavault by Scalefree: Satellite v0".
  • Rename the satellite using your naming conventions. For this example, we're renaming it to "SATMV0_CUSTOMER_ADDRESS".

Sat v0 Columns

  • Open the newly created Multi-active Satellite node and remove all irrelevant columns, so that you only keep the Hash Key, Hash Diff, Load Date Timestamp and Record Source as well as relevant descriptive attributes for the satellite.

Sat v0 Column Selector

  • In the 'Config' section on the right hand-side, expand section 'Data Vault' to modify the following configurations for the node:
    • Hashkey Column: Select the hash key column of this satellite from the drop-down menu.
    • Multi-active key columns: Select one (or multiple) multi-active key columns from the list and bring it to the right side. In this example, there's only one Multi-active key column CA_ADDRESS_ID.
    • Hashdiff Column: Select the hash diff column of this satellite from the drop-down menu.

Sat v0 Graph

  • Create and run the node


Generated SQL Code of Example

Create script

  CREATE OR REPLACE TABLE
    "COALESCE_WORKSHOP"."CORE"."SATMV0_CUSTOMER_ADDRESS" (
      "HK_CUSTOMER_H" STRING,
      "LDTS" TIMESTAMP
      COMMENT 'The Load Date Timestamp (LDTS) describes when this data first arrived in the Data Warehouse.',
      "RSRC" STRING
      COMMENT 'The Record Source (RSRC) describes the source of this data.',
      "HD_CUSTOMER_ADDRESS_P_S" STRING,
      "CA_ADDRESS_ID" NUMBER (5, 0),
      "CA_NAME" VARCHAR,
      "CA_ADDRESS_STREET" VARCHAR,
      "CA_ADDRESS_POSTAL_CODE" NUMBER (5, 0),
      "CA_NATIONKEY" NUMBER (2, 0),
      "CA_MKTSEGMENT" VARIANT,
      "CA_COMMENT" NUMBER
    )

Run script

  INSERT INTO
      "COALESCE_WORKSHOP"."CORE"."SATMV0_CUSTOMER_ADDRESS"
  WITH
    latest_entries_in_sat AS (
      /* get current rows from satellite */
      SELECT
        "HK_CUSTOMER_H",
        "HD_CUSTOMER_ADDRESS_P_S"
      FROM
        "COALESCE_WORKSHOP"."CORE"."SATMV0_CUSTOMER_ADDRESS" QUALIFY ROW_NUMBER() OVER (
          PARTITION BY
            "HK_CUSTOMER_H"
          ORDER BY
            "LDTS" DESC
        ) = 1
    ),
    deduplicated_numbered_source AS (
      SELECT
        "DVSTG_CUSTOMER_ADDRESS"."HK_CUSTOMER_H" AS HK_CUSTOMER_H,
        "DVSTG_CUSTOMER_ADDRESS"."LDTS" AS LDTS,
        "DVSTG_CUSTOMER_ADDRESS"."RSRC" AS RSRC,
        "DVSTG_CUSTOMER_ADDRESS"."HD_CUSTOMER_ADDRESS_P_S" AS HD_CUSTOMER_ADDRESS_P_S,
        "DVSTG_CUSTOMER_ADDRESS"."CA_ADDRESS_ID" AS CA_ADDRESS_ID,
        "DVSTG_CUSTOMER_ADDRESS"."CA_NAME" AS CA_NAME,
        "DVSTG_CUSTOMER_ADDRESS"."CA_ADDRESS_STREET" AS CA_ADDRESS_STREET,
        "DVSTG_CUSTOMER_ADDRESS"."CA_ADDRESS_POSTAL_CODE" AS CA_ADDRESS_POSTAL_CODE,
        "DVSTG_CUSTOMER_ADDRESS"."CA_NATIONKEY" AS CA_NATIONKEY,
        "DVSTG_CUSTOMER_ADDRESS"."CA_MKTSEGMENT" AS CA_MKTSEGMENT,
        "DVSTG_CUSTOMER_ADDRESS"."CA_COMMENT" AS CA_COMMENT,
        ROW_NUMBER() OVER (
          PARTITION BY
            "HK_CUSTOMER_H"
          ORDER BY
            "LDTS"
        ) as rn
      FROM
        "COALESCE_WORKSHOP"."CORE"."DVSTG_CUSTOMER_ADDRESS" "DVSTG_CUSTOMER_ADDRESS"
      WHERE
        "LDTS" > (
          SELECT
            COALESCE(
              MAX("LDTS"),
              TO_TIMESTAMP('0001-01-01T00:00:01', 'YYYY-MM-DDTHH24:MI:SS')
            )
          FROM
            "COALESCE_WORKSHOP"."CORE"."SATMV0_CUSTOMER_ADDRESS"
          WHERE
            "LDTS" != TO_TIMESTAMP('8888-12-31T23:59:59', 'YYYY-MM-DDTHH24:MI:SS')
        ) QUALIFY CASE
          WHEN "HD_CUSTOMER_ADDRESS_P_S" = LAG("HD_CUSTOMER_ADDRESS_P_S") OVER (
            PARTITION BY
              "HK_CUSTOMER_H"
            ORDER BY
              "LDTS"
          ) THEN FALSE
          ELSE TRUE
        END
    )
  SELECT DISTINCT
    HK_CUSTOMER_H,
    LDTS,
    RSRC,
    HD_CUSTOMER_ADDRESS_P_S,
    CA_ADDRESS_ID,
    CA_NAME,
    CA_ADDRESS_STREET,
    CA_ADDRESS_POSTAL_CODE,
    CA_NATIONKEY,
    CA_MKTSEGMENT,
    CA_COMMENT
  FROM
    deduplicated_numbered_source
  WHERE
    NOT EXISTS (
      SELECT
        1
      FROM
        latest_entries_in_sat
      WHERE
        deduplicated_numbered_source.HD_CUSTOMER_ADDRESS_P_S = latest_entries_in_sat."HD_CUSTOMER_ADDRESS_P_S"
        AND deduplicated_numbered_source.HK_CUSTOMER_H = latest_entries_in_sat."HK_CUSTOMER_H"
        AND deduplicated_numbered_source.rn = 1
    )