Replies: 2 comments 1 reply
- Thanks @jtcohen6 for the details. Here is my wish list of features which I would love to see in the next version:
  - Not much to add, but 1.2 sounds amazing, as adapter improvements would be a huge boost. We use Snowflake, and some of those would be amazing, especially around the dry run, as we can run a "full refresh" dry run and also an incremental dry run to make sure the SQL inside the …
---
In our post-1.0 world, we're aiming for predictability and consistency with new `dbt-core` releases. That means a new minor ("feature") release every three months: https://docs.getdbt.com/docs/core-versions

Each minor release is organized around a "theme." This helps us plan our work, coordinate our efforts, and talk about the new version when it's ready. We should expect about half of the changes in a minor version to be "thematic." The other half is accoutrements: bug fixes, community contributions, performance improvements, and straightforward extensions of existing capabilities.
v1.1 is arriving very soon (end of April), and it's currently available as a release candidate. (Please try it out!) The theme for v1.1 was testing—not in the sense of `dbt test`, but rather in the sense of `pytest test_make_sure_dbt_core_works.py`. We totally rebuilt our testing harness for `dbt-core` and adapter plugins, we converted a bunch of our legacy tests, and we're documenting the components of the new framework for the benefit of external adapter maintainers: https://docs.getdbt.com/docs/contributing/testing-a-new-adapter

The next minor version, v1.2, is targeted for July. We're organizing it around a theme of "adapter ergonomics," meaning straightforward quality-of-life improvements for how dbt interacts with and across different databases, platforms, query engines, etc. I'll aim to keep the v1.2 milestone up to date as we scope and tackle specific issues. For now, take the set of things described below as an illustration of what we're thinking about for v1.2.
## Materializations

Materializations are the most important macros we have—they do a lot of the heavy lifting in `dbt run`. They also haven't gotten a lot of love in a long time. Before we can make sweeping, nonlinear improvements to how dbt materializes models/snapshots/seeds/tests, there are a handful of straightforward, linear improvements that have been staring us in the face for some time.

### Initial scope
### Stretch goals

- Beyond `grants`, support more "advanced" permission features on available platforms: secure/authorized views, `row access policy`, column-level security via masking policies / policy tags

### Down the line
These aren't in scope for the work we're planning for v1.2, but they're things I always have in mind:

- Dry runs: `explain`, or database-specific validation when available (dbt should have a Dry-Run mode #4456)

The linear improvements in scope above will help us reexamine and strengthen the foundations of dbt materializations. It's those same foundations that will enable us to tackle these bigger-swing initiatives head-on in the not-so-distant future. In the meantime, if you find them compelling, please continue to weigh in on each discussion and share your experimentation and findings.
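For context on the "heavy lifting" mentioned above: a materialization is itself a macro with a special signature. A stripped-down table materialization, following dbt's documented materialization syntax (simplified—real materializations also handle pre-existing relations, transactions, and documentation), looks roughly like:

```jinja
{% materialization simple_table, default %}

  {#- The relation this model will build -#}
  {%- set target_relation = api.Relation.create(
        identifier=model['alias'],
        schema=schema,
        database=database,
        type='table') -%}

  {{ run_hooks(pre_hooks) }}

  {#- 'main' is the statement dbt reports timing and status for -#}
  {% call statement('main') -%}
    {{ create_table_as(False, target_relation, sql) }}
  {%- endcall %}

  {{ run_hooks(post_hooks) }}

  {#- Tell dbt which relations this materialization produced -#}
  {{ return({'relations': [target_relation]}) }}

{% endmaterialization %}
```

It's exactly this contract—relation setup, hooks, the `main` statement, the returned relations—that the linear improvements above would strengthen before any bigger rework.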
I'm open to your feedback! Are there other quality-of-life improvements that we're missing here? Other long-shots that we should be thinking about, even if we're not ready to write the code for them just yet?
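To make the `grants` stretch goal above concrete: one possible shape (a sketch, not a committed design—keys and role names here are illustrative) would be a model-level config that dbt applies after materializing:

```yaml
# models/schema.yml (hypothetical configuration shape)
models:
  - name: fct_orders
    config:
      grants:
        select: ['reporter', 'bi_tool_role']
```

The appeal of a config-driven approach is that adapters could each translate the same declaration into their platform's `GRANT` statements.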
## Cross-database macros
Today, most of the macros defined within `dbt-core` and adapter plugins are the ones needed for low-level tasks (caching, cataloging) and materializations (`create_table_as`, `create_view_as`). These are "internal" macros, used indirectly every time you type `dbt run`. It's rare as an end user to call them by name.

There are some more visible "global" macros: the built-in generic tests (`unique`, `not_null`, `relationships`, `accepted_values`), or macros that control configurable behavior (`generate_schema_name`).

Meanwhile, as the `dbt_utils` package has grown over the past few years, it's naturally found itself developing a set of reusable building blocks: "cross-database macros." These handle everything from data types to `datediff`. So long as these building blocks are implemented for a given dbt adapter, higher-order macros such as `dbt_utils.date_spine` just work on that database/engine/etc.

Following the excellent discussion in dbt-labs/dbt-utils#487, it's become clear that:
- `dbt_utils` is a monolith that needs splitting up
- cross-database building blocks belong in `dbt-core` and adapter plugins, where they can be more easily maintained and tested (see: new testing framework in v1.1!)

### Who benefits?
### Initial scope

### Considerations
- Should these macros live within the `dbt` macro namespace, or within another built-in / internal namespace (`dbt_utils`)?
- … `dbt_utils` and shimmed those utilities across a dozen popular databases.
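As background on how these cross-database building blocks work today: a package defines a wrapper macro that uses `adapter.dispatch` to resolve to a database-specific implementation, falling back to a default. A simplified sketch (the package name and Postgres shim here are illustrative):

```jinja
{#- Wrapper: resolves to snowflake__*, postgres__*, etc., else default__* -#}
{% macro datediff(first_date, second_date, datepart) %}
  {{ return(adapter.dispatch('datediff', 'my_package')(first_date, second_date, datepart)) }}
{% endmacro %}

{#- Default implementation, used when no adapter-specific override exists -#}
{% macro default__datediff(first_date, second_date, datepart) %}
    datediff({{ datepart }}, {{ first_date }}, {{ second_date }})
{% endmacro %}

{#- Adapter-specific override ("shim") for Postgres -#}
{% macro postgres__datediff(first_date, second_date, datepart) %}
    {#- Postgres has no datediff(); this sketch handles only datepart='day' -#}
    (({{ second_date }})::date - ({{ first_date }})::date)
{% endmacro %}
```

Moving this dispatch pattern into `dbt-core` and adapter plugins is what would let higher-order macros "just work" without every project depending on a monolithic utility package.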