[CT-228] [Feature] Allow tests to warn/fail based on percentage #4723
Comments
If this is not already earmarked for development, I'd be interested (with some guidance) in contributing this. |
@jtcohen6 is this something I could take on, do you think? |
Hey @jaypeedevlin sorry for getting back to you late! We haven't scheduled when to develop this yet. Thanks for the interest!! Your contribution is greatly welcomed! I can work with you if you have any questions. |
@jaypeedevlin I'd be very excited to have your help here! The tricky piece is finding a way to get the total number of records (with
select
{{ fail_calc }} as failures,
({{ fail_calc }} / (select count(*) from {{ model }}) * 100) > 10 as should_warn,
{{ fail_calc }} != 0 as should_error
from (
{{ main_sql }}
{{ "limit " ~ limit if limit != none }}
) dbt_internal_test
The challenges I ran into when trying to do this in the past:
It's A Known Thing that we need a better way to store, for each test node, the model the test is defined on, so that it can be accessible in lots of other places—including in the test materialization. Right now it's just based on its dependency. I'd be open to Language team's input on the best way to store that. |
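The percentage check sketched above can be tried outside dbt. What follows is a minimal sketch, not dbt's implementation: it uses Python's built-in `sqlite3` as a stand-in warehouse, the `orders` table and the `warn_pct` threshold are hypothetical, and plain strings stand in for `{{ main_sql }}` and `{{ fail_calc }}`. It multiplies by `100.0` before dividing because some databases (SQLite among them) truncate integer division, which would turn every percentage below 100 into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, customer_id integer)")
# 20 rows, 3 of which have a null customer_id (15% failing).
rows = [(i, None if i <= 3 else 100 + i) for i in range(1, 21)]
conn.executemany("insert into orders values (?, ?)", rows)

main_sql = "select * from orders where customer_id is null"  # stands in for {{ main_sql }}
fail_calc = "count(*)"                                        # stands in for {{ fail_calc }}
warn_pct = 10                                                 # hypothetical percentage threshold

query = f"""
select
  {fail_calc} as failures,
  {fail_calc} * 100.0 / (select count(*) from orders) as failure_pct,
  {fail_calc} * 100.0 / (select count(*) from orders) > {warn_pct} as should_warn,
  {fail_calc} != 0 as should_error
from (
  {main_sql}
) dbt_internal_test
"""
failures, failure_pct, should_warn, should_error = conn.execute(query).fetchone()
print(failures, failure_pct, should_warn, should_error)
```

The subquery on `orders` is exactly the part the comment calls tricky: inside the test materialization, the model the test is defined on is not directly available, so the table name here is hard-coded.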
Hey all - I was looking to do this, and saw that there's a WIP at #5172 based on statically checking for a literal percent sign and requerying the raw model to compute the percentage when that's the case. However my first instinct when I envisioned this feature was that Would there be a straightforward way to refactor default dbt test queries so that:
? |
Hmm though that breaks backcompat -- maybe add a test config called |
@vergenzt If I understand you right, you're thinking that the
select * from {{ model }}
where {{ column_name }} is null
To:
select
case when {{ column_name }} is null then true else false end as dbt_test_failed
from {{ model }}
And the In order to calculate percentages, the That's a neat idea! I think the SQL may be a bit trickier to write for
select * from (
{{ test_query }}
)
where dbt_test_failed = true
@jaypeedevlin @ehmartens Curious to hear your thoughts here as well! |
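To make the flag-column idea concrete, here is a minimal sketch using Python's built-in `sqlite3`, not a dbt implementation. The `customers` table and its data are hypothetical, and `1`/`0` stand in for `true`/`false` so the sketch does not depend on boolean literal support. The same test query feeds both the backwards-compatible "failing rows only" wrapper and a percentage calculation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customers (id integer, email text)")
conn.executemany(
    "insert into customers values (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None), (5, "e@example.com")],
)

# Rewritten not_null test: one output row per input row, with a pass/fail flag.
test_query = """
select
  *,
  case when email is null then 1 else 0 end as dbt_test_failed
from customers
"""

# Backwards-compatible wrapper: only the failing rows, as tests return today.
legacy_failures = conn.execute(
    f"select * from ({test_query}) where dbt_test_failed = 1"
).fetchall()

# Percentage: failing rows over all rows, both taken from the same query.
failed, total = conn.execute(
    f"select sum(dbt_test_failed), count(*) from ({test_query})"
).fetchone()
failure_pct = failed * 100.0 / total
print(len(legacy_failures), failed, total, failure_pct)
```

Because the denominator comes from the test query itself, this shape does not need a second query against the model.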
@jtcohen6 yep that's what I'm thinking! Though the example new SQL could be even simpler: If there were a way to isolate tests' condition expressions then this could be even more generic:
{%- set test_condition_expr = ... %} {#- assume condition sql is in scope; maybe via macro arg somehow? 🤷 #}
...
{%- if config.get('fail_calc_includes_passing_rows') %}
select *, ({{ test_condition_expr }}) as dbt_test_failed from {{ model }}
{%- else %}
select * from {{ model }} where ({{ test_condition_expr }})
{%- endif %}
Not sure how we'd get access to that expression in a backwards-compatible way though, given it looks like tests are currently written as full SQL select statements. 🤷 Also
select
{{ column_name }} as unique_field,
- count(*) as n_records
+ count(*) as n_records,
+ n_records > 1 as dbt_test_failed
from {{ model }}
where {{ column_name }} is not null
group by {{ column_name }}
-having count(*) > 1 |
Heh, I opted for verbose-and-clear over nice-and-concise, but I agree that I like yours better :) Another way toward backwards compatibility: as each generic test converts its SQL, it reconfigures its
{% macro default__test_unique(model, column_name) %}
{{ config(fail_calc = 'count(dbt_test_failed)') }}
select
{{ column_name }} as unique_field,
count(*) as n_records,
n_records > 1 as dbt_test_failed
from {{ model }}
where {{ column_name }} is not null
group by {{ column_name }}
{% endmacro %}
At some point (dbt v2), we reset the default I have a conjecture that it would be possible to do this in a backwards-compatible way, without requiring changes to the many, many custom generic tests already out in the world, which this issue is too narrow to contain. I'll leave it as an exercise to someone else, for now, to find a way to write some even-cleverer SQL, perhaps using (gulp) a correlated subquery. Doesn't mean that's the way we ought to go. |
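One detail worth checking when choosing a `fail_calc` over a flag column: in standard SQL, `count(expr)` counts every non-null value, so `count(dbt_test_failed)` returns the number of rows rather than the number of failures. The sketch below (Python's built-in `sqlite3`, with a hypothetical `users` table) compares the two aggregates on the flag-style `unique` test. It repeats `count(*)` instead of reusing the `n_records` alias, since not every database allows a select alias to be referenced in the same select list, and it relies on SQLite representing comparison results as `1`/`0`; other warehouses may need a `case` expression inside the `sum`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (email text)")
conn.executemany(
    "insert into users values (?)",
    [("a",), ("a",), ("b",), ("c",), ("d",), (None,)],
)

# Flag-style unique test: one row per distinct non-null value.
test_query = """
select
  email as unique_field,
  count(*) as n_records,
  count(*) > 1 as dbt_test_failed
from users
where email is not null
group by email
"""

counted, summed, total = conn.execute(
    f"select count(dbt_test_failed), sum(dbt_test_failed), count(*) from ({test_query})"
).fetchone()
failure_pct = summed * 100.0 / total
print(counted, summed, total, failure_pct)
```

Here `counted` equals `total` (every group has a non-null flag), while `summed` is the number of duplicated values, which is the figure a percentage threshold would need.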
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Still interested in this! (Just commenting to dismiss Stalebot) |
I'm curious if there's been any updates on this functionality? |
This would be very useful. Is it still being worked on? |
Agreed that this would be very useful! |
Is this still being worked on? Would be a great feature to have. |
I abandoned work on this, unfortunately |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Still think this would be a very convenient feature |
Bumping the thread, because this would be extremely useful. |
Bumping again, would be a great feature to have! |
Hi All, I found a workaround for this by writing a generic test in dbt and then referencing it in the yml files like any other ordinary test: This one works for a relationships test, has a 1% error threshold, and works for BigQuery. Feel free to change as you see fit.
|
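The workaround's code did not survive in this copy of the thread. As an illustration only, and not the commenter's macro, here is a minimal sketch of the logic a percentage-based relationships check might use, written against Python's built-in `sqlite3` rather than BigQuery. The `parents` and `children` tables and the `error_pct` value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table parents (id integer)")
conn.execute("create table children (id integer, parent_id integer)")
conn.executemany("insert into parents values (?)", [(i,) for i in range(1, 11)])
# 200 child rows; the first 3 reference a parent id (999) that does not exist.
children = [(i, 999 if i <= 3 else (i % 10) + 1) for i in range(1, 201)]
conn.executemany("insert into children values (?, ?)", children)

error_pct = 1.0  # hypothetical threshold: fail when more than 1% of rows are orphaned

query = """
with orphans as (
  select c.id
  from children c
  left join parents p on c.parent_id = p.id
  where c.parent_id is not null and p.id is null
)
select
  (select count(*) from orphans) as failures,
  (select count(*) from orphans) * 100.0 / (select count(*) from children) as failure_pct
"""
failures, failure_pct = conn.execute(query).fetchone()
test_fails = failure_pct > error_pct
print(failures, failure_pct, test_fails)
```

In a dbt generic test, a query like this would typically return a row only when the threshold is exceeded, so that the default row-count-based pass/fail logic still applies.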
Adding to the bump - we've custom implemented a generic "percent-based" test macro in our shop, but having this supported by the language (or giving us additional hooks/a more standard generic interface!) would also do wonders |
Bumping again, hopefully someone is working on it |
Also bumping, it's very frustrating to have to write a custom test when we need the error threshold to be based on a percentage of the total number of rows in the model. This seems like something that should be baked-in. |
Bumping! Does anyone have updates on this? |
Bumping this - would appreciate this feature existing! |
Is there an existing feature request for this?
Describe the Feature
Add the ability for warn/fail thresholds to be a percentage instead of only a fixed number. We will likely need to add a definition of the total count against which the percentage is calculated. Original request: #4334
Describe alternatives you've considered
No response
Who will this benefit?
No response
Are you interested in contributing this feature?
No response
Anything else?
No response