This repository has been archived by the owner on May 4, 2023. It is now read-only.

Dom's Data Build Tool

monzo/ddbt

This repo is now archived

It's left here for historical purposes, to preserve the code within it; however, no active development is continuing on ddbt. If you'd like support or want to discuss re-opening the repo, please come visit us in #data-engineering-ask.

The original state of the README can be seen below.

Dom's Data Build Tool

This repo represents my attempt to build a fast version of DBT, which gets very slow on large projects (3000+ data models). This project attempts to be a direct drop-in replacement for DBT at the command line.

Warning: This is experimental and may not work exactly as you expect

Installation

  1. Clone this repo
$ git clone git@github.com:monzo/ddbt.git
  2. Change directory into the cloned repo
$ cd ddbt
  3. Install (requires Go)
$ go install
  4. Confirm installation
$ ddbt --version
ddbt version 0.6.7

Command Quickstart

  • ddbt run will compile and execute all your models, or those filtered for, against your data warehouse
  • ddbt test will run all tests referencing all your models, or those filtered for, in your project against your data warehouse
  • ddbt show my_model will output the compiled SQL to the terminal
  • ddbt copy my_model will copy the compiled SQL into your clipboard
  • ddbt show-dag will output the order in which the models will execute
  • ddbt watch will act like run, followed by test. DDBT will then watch your file system for changes and automatically rerun the affected parts of the DAG, along with any affected downstream or failing tests.
  • ddbt watch --skip-run is the same as watch, but will skip the initial run (preventing you having to wait for all the models to run) before running the tests and starting to watch your file system.
  • ddbt completion zsh will generate a shell completion script for zsh (or for bash if you pass that as the argument). Detailed steps to set up the completion script can be found in ddbt completion --help
  • ddbt isolate-dag will create a temporary directory and symlink in all files needed for the given model_filter such that Fishtown's DBT could be run against it without having to be run against every model in your data warehouse
  • ddbt schema-gen -m my_model will output a new or updated schema yml file for the model provided in the same directory as the dbt model file.
  • ddbt lookml-gen my_model will generate a LookML view and copy it to your clipboard

Global Arguments

  • --models model_filter or -m model_filter: Instead of running for every model in your project, DDBT will only execute against the requested models. See Model Filters below for what is accepted for model_filter
  • --threads=n: force DDBT to run with n threads instead of what is defined in your dbt_project.yml
  • --target=x or -t x: force DDBT to run against the x output defined in your profiles.yml instead of the default defined in that file.
  • --upstream=y or -u y: For any references to models outside the explicit models specified by run or test, the upstream target used to read that data will be swapped to y instead of the output target of x
  • --fail-on-not-found=false or -f=false: By default, ddbt will fail if the specified models don't exist; passing this argument as false will warn instead of failing
  • --enable-schema-based-tests=true or -s=true: Schema-based tests are disabled by default for now; pass this argument as true to enable them
  • --custom-config-path=my/custom/path or -c=my/custom/path: Allows a custom path to be used for the dbt_project.yml. This is useful if you want to use a different location than the default one, for example if you're mid-way through migrating commands from an old dbt version to a new version and are using two different versions of dbt_project.yml at the same time.
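
The interaction between --target and --upstream can be sketched as below. This is only one reading of the descriptions above, not DDBT's code: the function targetFor, the model names and the output names dev and prod are all hypothetical.

```go
package main

import "fmt"

// targetFor decides which profile output a reference to `model` is read from.
// Assumption (from the flag descriptions, not from DDBT's source): models that
// were explicitly selected are read from the --target output, and every other
// referenced model is read from the --upstream output when one is given.
func targetFor(model string, selected map[string]bool, target, upstream string) string {
	if selected[model] || upstream == "" {
		return target
	}
	return upstream
}

func main() {
	// Equivalent in spirit to: ddbt run -m mart -t dev -u prod
	selected := map[string]bool{"mart": true}

	for _, model := range []string{"mart", "staging_orders"} {
		fmt.Printf("%s is read from %s\n", model, targetFor(model, selected, "dev", "prod"))
	}
}
```

Under this reading, only mart is built in dev, while its reference to staging_orders is resolved against prod, so the upstream models do not need to exist in dev first.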

Model Filters

When running or testing the project, you may only want to run for a subset of your models.

Currently DDBT supports the following syntax options:

  • -m my_model: DDBT will only execute against the model with that name
  • -m +my_model: DDBT will run against my_model and all upstreams referenced by it
  • -m my_model+: DDBT will run against my_model and all downstreams that reference it
  • -m +my_model+: DDBT will run against my_model and both all upstreams and downstreams.
  • -m tag:tagValue: DDBT will only execute models which have a tag which is equal to tagValue