This repository is intended for comparing BigQuery models in dbt that have changed during an open PR.
The repository has been published as a GitHub Action and a PyPI package, which means it can be leveraged in a variety of ways:
- Directly in Python via `run_dbt_table_diff`.
- Directly in a terminal via `python3 -m dbt_table_diff`.
- In a GitHub workflow file via GitHub Actions, to automatically add comments on open PRs.
pip3 install dbt_table_diff
from dbt_table_diff import run_dbt_table_diff
run_dbt_table_diff(
    project_id="ultimate-bit-359101",
    keyfile_path="secrets/bq_keyfile.json",
    manifest_file="target/manifest.json",
    dev_prefix="dev_",
    prod_prefix="prod_",
    fallback_prefix="fb_",
    custom_checks_path="",
    ignored_schemas=[],
    irregular_schemas=[],
    org_name="org-not-included",
    repo_name="dbt_example",
    pr_id="2",
    auth_token="my_github_pat",
)
python3 -m dbt_table_diff -t $GH_TOKEN -o org-not-included -r dbt_example -l 2 \
--manifest_file 'target/manifest.json' --project_id 'ultimate-bit-359101' \
--keyfile_path 'secrets/bq_keyfile.json' --dev_prefix 'dev_' --prod_prefix 'prod_' --fallback_prefix 'fb_'
Input Parameter | Description |
---|---|
GCP_TOKEN | for connecting to BQ (runs dbt compile and dbt_table_diff/sql_checks to compare tables) |
GH_TOKEN | for connecting to GitHub (e.g. fetches the modified models/*.sql files in your PR, adds a comment on your PR) |
PR_NUMBER | for fetching the open PR from GitHub (pull request ID, an integer) |
GH_REPO | for fetching the open PR from GitHub (repository name) |
GH_ORG | for fetching the open PR from GitHub (repository owner/organization name) |
DBT_PROFILE_FILE | the local path in your repo to your profiles.yml for dbt (this is necessary for compiling manifest.json during the setup process) |
dev_prefix | the prefix used when running dbt locally (Your source schema/environment for comparison) |
prod_prefix | the prefix used when running dbt remotely (Your target schema/environment for comparison) |
fallback_prefix | useful if your dbt project overrides the generate_schema_name macro so that some schemas use a different prefix in prod |
irregular_schemas | comma-separated string of schemas which use fallback_prefix |
project_id | for connecting to BQ (BigQuery Project ID) |
ignored_schemas | comma-separated string of schemas to ignore (skipped during the GitHub Action checks) |
custom_checks_path | A local folder containing any custom SQL checks to run. |
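
The interaction between `dev_prefix`, `prod_prefix`, `fallback_prefix`, `irregular_schemas` and `ignored_schemas` can be easier to follow as code. The sketch below is only an illustration of the behaviour described in the table: the helper names `split_schemas` and `datasets_to_compare` are hypothetical, and it assumes a dataset name is simply the prefix followed by the schema name, which may not match how the package composes names internally.

```python
def split_schemas(value):
    """Turn a comma-separated string such as "a, b" into ["a", "b"]."""
    return [item.strip() for item in value.split(",") if item.strip()]


def datasets_to_compare(schema, dev_prefix, prod_prefix, fallback_prefix,
                        irregular_schemas="", ignored_schemas=""):
    """Return (dev_dataset, prod_dataset), or None when the schema is ignored.

    Assumption: dataset name == prefix + schema (hypothetical naming rule).
    """
    if schema in split_schemas(ignored_schemas):
        return None  # skipped entirely, no comparison is run
    # Schemas listed as irregular use the fallback prefix on the prod side.
    if schema in split_schemas(irregular_schemas):
        target_prefix = fallback_prefix
    else:
        target_prefix = prod_prefix
    return f"{dev_prefix}{schema}", f"{target_prefix}{schema}"
```

For example, with `irregular_schemas="finance"`, a model in the `finance` schema would be compared as `dev_finance` against `fb_finance`, while every other schema would be compared against its `prod_` counterpart.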
- Fetches the list of files modified in the Pull Request
  - by cURLing `api.github.com/repos/{organization}/{repository}/pulls/{pull_request_id}/files`
- Filters on `relevant_files`
  - which are files matching `models/*.sql`
- Builds `manifest.json`
  - by running `dbt deps; dbt compile`
- Parses `manifest.json` for `relevant_models`
  - using the manifest attribute `original_file_path` matching `relevant_files`
- Runs all SQL files in `dbt_table_diff/sql_checks`
  - for each of the `relevant_models`, compares the two dbt targets (`dev_prefix` vs `prod_prefix`)
- Saves output to file
  - in a format supported by GitHub comments
- Posts a comment on the open PR
  - leveraging the `dbt_table_diff` PyPI package
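
The first few steps (listing the PR's files, filtering them, and matching them against the manifest) can be sketched in plain Python. This is a minimal illustration rather than the package's actual implementation: all function names are hypothetical, only the first page of GitHub results is read, and it relies on the standard dbt manifest layout where each entry under `nodes` carries `resource_type` and `original_file_path`.

```python
import json
import urllib.request
from fnmatch import fnmatchcase


def pr_files_url(org_name, repo_name, pr_id):
    """Build the GitHub REST endpoint that lists the files changed in a PR."""
    return f"https://api.github.com/repos/{org_name}/{repo_name}/pulls/{pr_id}/files"


def fetch_modified_files(org_name, repo_name, pr_id, auth_token):
    """Return the paths of files touched by the PR (first page of results only)."""
    request = urllib.request.Request(
        pr_files_url(org_name, repo_name, pr_id),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return [entry["filename"] for entry in json.load(response)]


def filter_relevant_files(modified_files, pattern="models/*.sql"):
    """Keep only paths matching the pattern; '*' also matches '/' here."""
    return [path for path in modified_files if fnmatchcase(path, pattern)]


def find_relevant_models(manifest, relevant_files):
    """Return unique_ids of manifest models whose source file was modified."""
    wanted = set(relevant_files)
    return sorted(
        unique_id
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"
        and node.get("original_file_path") in wanted
    )
```

In use, the manifest would be loaded with `json.load(open("target/manifest.json"))` after `dbt deps; dbt compile` has run, and the resulting model IDs would drive the SQL checks.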
python3 -m dbt_table_diff --help
usage: dbt_table_diff [-h] [-o ORG_NAME] [-r REPO_NAME] [-t AUTH_TOKEN] [-l PR_ID] [--manifest_file MANIFEST_FILE] [--project_id PROJECT_ID] [--keyfile_path KEYFILE_PATH] [--ignored_schemas IGNORED_SCHEMAS]
                      [--irregular_schemas IRREGULAR_SCHEMAS] [--dev_prefix DEV_PREFIX] [--prod_prefix PROD_PREFIX] [--fallback_prefix FALLBACK_PREFIX] [--custom_checks_path CUSTOM_CHECKS_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -o ORG_NAME, --org_name ORG_NAME
                        Owner of GitHub repository.
  -r REPO_NAME, --repo_name REPO_NAME
                        Name of the GitHub repository.
  -t AUTH_TOKEN, --auth_token AUTH_TOKEN
                        User's GitHub Personal Access Token.
  -l PR_ID, --pr_id PR_ID
                        The issue # of the Pull Request.
  --manifest_file MANIFEST_FILE
                        The path to dbt's manifest file.
  --project_id PROJECT_ID
                        The BigQuery Project ID to leverage.
  --keyfile_path KEYFILE_PATH
                        The path to the keyfile to use during BQ calls.
  --ignored_schemas IGNORED_SCHEMAS
                        Folders in models/ to always ignore during row/col checks.
  --irregular_schemas IRREGULAR_SCHEMAS
                        Folders in models/ which use 'fallback_prefix' in prod.
  --dev_prefix DEV_PREFIX
                        Prefix used by development datasets in dbt.
  --prod_prefix PROD_PREFIX
                        Prefix used by production datasets in dbt.
  --fallback_prefix FALLBACK_PREFIX
                        Uncommon prefix used by only some production datasets in dbt.
  --custom_checks_path CUSTOM_CHECKS_PATH
                        A local folder containing any custom SQL to run.
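
When the CLI is driven from a script (for instance a CI step), it can help to assemble the argument list from a dictionary instead of a long shell string. The helper below is a hypothetical convenience, not part of the package: `build_argv` is an illustrative name, it uses only the long flags shown in the help text above, and it assumes that empty values can simply be left out.

```python
import sys


def build_argv(params):
    """Translate {"flag_name": value} pairs into a dbt_table_diff command line.

    Keys are expected to be long option names from the help text
    (e.g. "org_name", "pr_id"); None and empty strings are skipped.
    """
    argv = [sys.executable, "-m", "dbt_table_diff"]
    for name, value in params.items():
        if value in (None, ""):
            continue  # omit unset options so the CLI defaults apply
        argv += [f"--{name}", str(value)]
    return argv


# Usage sketch (runs the real CLI, so credentials must be valid):
#   import subprocess
#   subprocess.run(build_argv({"org_name": "org-not-included",
#                              "repo_name": "dbt_example",
#                              "pr_id": 2}), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues with paths and tokens, which is the main reason to prefer it over string concatenation.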