GitHub - segmentio/profiles-sync-dbt

dbt model definitions for use with Twilio-Segment's Profiles Sync feature.

Profiles Sync lands _update tables, the complete historical record of your identity graph. The models in this repo are designed to help you produce your identity graph's most recent state.

This repo is tested for

Snowflake
BigQuery
Redshift

About Profiles Sync

Profiles Sync lands data non-destructively in your warehouse, giving you access to the complete history of merges, traits, and external ID associations. This allows you to monitor identity health, and reconstruct a prior state of a profile (e.g. prior to a merge).

For many practical use cases, you will want to materialize and use your identity graph's current state.

Specifically, we recommend you create:

Materialized Table	Source Table	Description
`id_graph`	`id_graph_updated`	a lookup table for identities - i.e. where you can join using `segment_id` to combine events that then resolve to a single `canonical_segment_id`
`external_id_mapping`	`external_id_mapping_updates`	a lookup table for external identifiers such as `user_id`, `email`, etc.
`profile_traits`	`profile_traits_updates`	a table of people and traits - one row per `canonical_segment_id`.
~~`profile_traits`~~	~~`identifies`~~	As of August 1st, 2023, `profile_traits` is superceded by a materlization that sources from `profile_traits_updates`

profile_traits - Columns in this table are automatically populated from the columns of its source table profile_traits_updates. As we do not delete any rows, this table also includes merged-away profiles with their trait values set to NULL. To identify an older profile, its merged_to column will be populated with its latest segment_id.

For convenience, this package will produce (or "materialize") all three of those models.

Configuration

dbt_project.yml contains 4 main configuration variables:

profiles: the name dbt parses to fetch warehouse credentials from profiles.yml. Refer dbt docs on how to configure your warehouse connection profiles profiles.yml.
schema_name: name of the Personas space/schema where the base tables land. Defaults to identified_events.
etl_overlap: frequency (in hours) of materialization runs. We recommend this number to span 1-2 previous cycles to account for different tables landing at different times. (e.g. if you build every 24 hrs, we might suggest setting this to 24*2+1=49 hrs).
materialization: means of materializing above views. Refer dbt docs: materializations for more details.
- incremental (default): most efficient use of computation, incrementally-materialized views
- table: less efficient, but also less-complicated materialized tables - each table will be rebuilt from scratch each time.
- view: (non-materialized) view definitions which can also be leveraged if you want to avoid dbt orchestration entirely - simply dbt run (locally) a single time to establish the definitions (note that you will need to re-run if new traits are instrumented, to add those traits as new columns)

Installation

You can download this package, standalone, update your configuration as per the section above, and dbt run.

You can also import this dbt package as a module in an existing project. dbt offers excellent (albeit slightly outdated) documentation on how that works:

The steps for module import are:

Specify the package URL in your packages.yml file (should be in the main directory of your dbt project; you may need to create the file if it doesn't exist) OR Clone the directory and specify the local directory where you have cloned it (since the repo isn't currently public, this may save some headache)
Copy the requisite bits of configuration from this package's dbt_profiles.yml to your project's dbt_profiles.yml. Specifically, you'll want to add the profiles_sync sections below to the vars and models sections of your own dbt_profiles.yml.

vars:
  profiles_sync:
    schema_name: profiles_v1 # replace this with the name of the schema where Profiles Sync is landing its source tables
    etl_overlap: 28 # should be set to an interval (in hours) that's a bit larger than your materialization cadence

models:
  profiles_sync:
    +on_schema_change: "append_new_columns"
    profile_materializations:
      +materialized: incremental

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
analyses		analyses
macros		macros
models/profile_materializations		models/profile_materializations
seeds		seeds
snapshots		snapshots
tests		tests
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
dbt_project.yml		dbt_project.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About Profiles Sync

Configuration

Installation

About

Releases

Packages

Contributors 3

License

segmentio/profiles-sync-dbt

Folders and files

Latest commit

History

Repository files navigation

About Profiles Sync

Configuration

Installation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Packages