How to limit data in development #1334
Replies: 9 comments 1 reply
-
@dave-connors-3 @lbenezra-FA -- would one of you, when you have a sec, mind popping a standard issue dbt Labs |
Beta Was this translation helpful? Give feedback.
-
@jeremyyeo would you be interested in writing this one? you had great code snippets in the thread that inspired this, and i bet you've got more than a few relevant Gists! this would be a really great tightly scoped Guide that we could get live quickly if you're interested -- a perfect one to take out of Discussion land and into the site. |
Beta Was this translation helpful? Give feedback.
-
The simplest implementation of this that I've seen looks something like: # in macros/limit_data_in_dev.sql
{% macro limit_data_in_dev(column_name='created_at') %}
{% if target.name == 'dev' %}
and {{ column_name }} >= current_date() - {{ var('development_data_window') }}
{% endif %}
{% endmacro %} # in dbt_project.yml
vars:
development_data_window: 3 -- in models/marts/int/int_nice_work_winnie.sql
select
*
from {{ ref('stg_neat_data') }}
where
1=1
{{ limit_data_in_dev() }} |
Beta Was this translation helpful? Give feedback.
-
I have definitely also seen a pretty common pattern that is specific to incremental models (which probably are the models that make the most sense to limit data in) to make sure that data is limited in dev even on --full-refresh runs: (lots of different ways to write this) {% macro incremental_filter(column_name='created_at') %}
{% if is_incremental() %}
{{ column_name }} >= (select max({{ column_name }}) from {{ this }})
{% elif target.name == 'dev' %}
{{ column_name }} >= current_date() - {{ var('development_data_window') }}
{% endif %}
{% endmacro %} |
Beta Was this translation helpful? Give feedback.
-
💯 ty dave this is exactly what i was looking for -- i like presenting both options and people can choose their own level of complexity. personally i like having the incremental clause in there because it makes it simple to just throw it on everything without thinking about it as much. that said -- would also love to get some discussion going about whether it's best to limit in end models or just apply all your dev limits in the staging layer so that downstream you don't have to think about it. |
Beta Was this translation helpful? Give feedback.
-
Just a thought: some advanced discussion here might be around how you could actually weave this logic into materialization overrides to ensure that it happens automatically, without a user having to remember to use the macro |
Beta Was this translation helpful? Give feedback.
-
I wrote this 5 months back about environment variables and slim ci limiting data: https://www.notion.so/dbtlabs/Slim-CI-env_vars-5d5d809918ba4892b70fb73742a424c5 The concept is the same, just different target (CI instead of Dev) |
Beta Was this translation helpful? Give feedback.
-
alrightttt - Guides are live, it is time to whip this bb into a Guiderino! |
Beta Was this translation helpful? Give feedback.
-
Is there a way to do this without having to go into every one of your staging models and adding a macro? Somewhere in dbt Cloud we have functionality built out where 'limit 500' is thrown onto every preview. Is there something we can do like that where the target.name affects every model without having to be directly referenced? |
Beta Was this translation helpful? Give feedback.
-
Let's create a guide of best practices for limiting data in dev to speed up development and minimize costs.
The primary component should be:
limit_in_dev
macrosA reach concern would be:
Beta Was this translation helpful? Give feedback.
All reactions