
Threshold argument for Test Severity #2219

Closed
sanjanasen96 opened this issue Mar 19, 2020 · 7 comments
Labels: dbt tests (Issues related to built-in dbt testing functionality)

Comments

@sanjanasen96

Describe the feature

  • A threshold argument for test severity.
  • Useful for more flexibility around test severity. E.g., if the test fails for fewer than 10 IDs, severity can be set to warn; if it fails for more than 10 IDs, severity can be set to error.

Describe alternatives you've considered

  • Currently we keep an eye on daily test failures and manually switch severity for individual tests as needed. This is neither scalable nor efficient.

Additional context

  • This is not database specific.

Who will this benefit?

  • We've found multiple use cases for this in relationship tests within multi-layer modeling (e.g., event modeling).
@jtcohen6 (Contributor) commented Mar 19, 2020

Worth saying off the bat:

  • Everything discussed in this issue and the comment below can be done today with custom schema tests that override the built-in unique, not_null, etc.
  • It would also be really cool if the built-in tests included some sane functionality that points in this direction

I think that becomes all the more relevant as we start to work with bigger datasets. We already run into issues with tiny failures for relationships tests because tables are loaded asynchronously, but it's also important for handling cases where, e.g., a billion-row table has 10 duplicate records.

One complication I'd throw out is absolute vs. relative thresholds. I may want a test to error if the number of failures is >1% of the overall row count, instead of being greater than an arbitrary number.

While not directly related, I also see this working in combination with a where (subset/partition) argument. E.g. I only want to test data from yesterday + today, and I want it to warn or error if there are <=1% or >1% nulls, respectively. Maybe that looks like:

models:
  - name: my_big_model
    columns:
      - name: important_column
        tests:
          - not_null:
              warn_after: {count: 0, percent: true}
              error_after: {count: 1, percent: true}
              where: "date_day >= (current_date() - interval 1 day)"
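
For illustration, here is a hand-written sketch of the SQL such a percent-aware not_null test might compile to. This is not dbt's actual compiled output; the schema qualifier my_schema is invented, and the where clause is reused from the YAML above:

-- Sketch only, not dbt's real compilation. Fails (returns a row) when more
-- than 1% of the filtered rows have a null important_column.
with failures as (
    select count(*) as n
    from my_schema.my_big_model
    where important_column is null
      and date_day >= (current_date() - interval 1 day)
),

total as (
    select count(*) as n
    from my_schema.my_big_model
    where date_day >= (current_date() - interval 1 day)
)

select failures.n as num_failures
from failures, total
where failures.n > total.n * 0.01  -- error_after: {count: 1, percent: true}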

@drewbanin (Contributor)

Check out #1173 - I think we could make a change like this in the scope of that issue (or shortly thereafter).

  • I like the idea of the where predicate here -- TBD whether that's built into dbt, or whether we require schema test authors to support it in their test definitions. I do imagine we could filter the referenced model in a subquery when interpolating the table reference, for example :) (a sketch follows this list)
  • I imagine we could calculate percentages pretty easily if we also run a count(*) on the specified model
  • I like warn_after and error_after more than I like severity: true -- it seems a lot more flexible!
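
A minimal sketch of that subquery idea from the first bullet, assuming a hypothetical interpolation step (none of these names are real dbt internals):

{# Hypothetical: instead of interpolating the bare table name, wrap the
   referenced model in a filtered subquery so the test only sees the subset. #}
{%- set model_relation -%}
    (select * from {{ ref('my_big_model') }}
     where date_day >= (current_date() - interval 1 day)) dbt_subquery
{%- endset -%}
{# Test templates would then interpolate model_relation wherever they
   would have used the plain table reference. #}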

@sean-rose

I'm also very much in favor of supporting relative thresholds, but for the syntax, instead of {count: <number>, percent: true} I'd suggest just having a number followed by a %; if there is no %, treat it as a row count.

- not_null:
    error_after: 10  # Row count

- not_null:
    error_after: 10%  # Relative percentage
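
A minimal sketch of how that parsing could work as a dbt macro (threshold_count_expr is a hypothetical name, not part of dbt):

{# Hypothetical helper: render a SQL expression for a threshold that is
   either an absolute count (10) or a relative percentage ("10%"). #}
{% macro threshold_count_expr(threshold, total_count_expr) %}
    {%- if threshold is string and threshold.endswith('%') -%}
        {#- "10%" renders as <total_count_expr> * 0.1 -#}
        {{ total_count_expr }} * {{ (threshold[:-1] | float) / 100 }}
    {%- else -%}
        {{ threshold }}
    {%- endif -%}
{% endmacro %}

With this, error_after: 10 renders as the literal 10, while error_after: "10%" renders as the supplied total-count expression multiplied by 0.1.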

@jtcohen6 added this to the Oh-Twenty [TBD] milestone Nov 13, 2020
@jtcohen6 added the "dbt tests" label Dec 31, 2020
@jtcohen6 (Contributor)

Let's do it! I think these will be new test configs, warn_if + error_if, with logic for detecting % as @sean-rose recommended.

Closing in favor of the concrete proposal in #3258
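
For reference, the warn_if / error_if configs took the form of count-based condition strings. A rough sketch with invented model and column names (see dbt's docs for the authoritative syntax):

# Illustrative sketch; my_model and id are invented names.
models:
  - name: my_model
    columns:
      - name: id
        tests:
          - unique:
              config:
                warn_if: ">10"    # warn when more than 10 failing rows
                error_if: ">100"  # error when more than 100 failing rows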

@jtcohen6 removed this from the Margaret Mead milestone Apr 13, 2021
@RoyalTS commented May 20, 2022

Sounds like the proposal in #3258 never ended up implementing relative thresholds?

@jtcohen6 (Contributor)

@RoyalTS We're working on this right now! Check out the discussion in #4723, and the WIP in #5172

@gracie2339

Hi all, I found a workaround for this by writing a generic test in dbt and then referencing it in the yml files like any other test:

This one implements a relationships test with a 1% error threshold and works on BigQuery. Feel free to change it as you see fit.

{% test relationships_with_1_pct_error_threshold(model, column_name, field, to) %}

with parent as (
    select {{ field }} as id
    from {{ to }}
),

child as (
    select {{ column_name }} as id
    from {{ model }}
),

-- Allow up to 1% of the parent's distinct IDs to be missing before failing.
error_threshold as (
    select cast(count(distinct id) * 0.01 as int) as num
    from parent
),

-- Child IDs with no matching parent ID.
failures as (
    select distinct id
    from child
    where id is not null
        and id not in (select id from parent)
)

-- Return failing IDs only when the failure count exceeds the threshold, so
-- the test passes (returns zero rows) while failures stay under the limit.
select id
from failures
where (select count(*) from failures) > (select num from error_threshold)

{% endtest %}
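
Once this test lives under macros/ (or tests/generic/), a schema.yml entry can reference it like any built-in test. A usage sketch, where orders, customers, and customer_id are invented names:

models:
  - name: orders          # invented example model
    columns:
      - name: customer_id  # invented example column
        tests:
          - relationships_with_1_pct_error_threshold:
              to: ref('customers')
              field: customer_id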
