From 75cce0d41e15ef7d3fa4382ac4a76bdeba84c484 Mon Sep 17 00:00:00 2001 From: Anders Swanson Date: Wed, 19 Jul 2023 13:00:55 -0400 Subject: [PATCH] lint --- .../resource-configs/bigquery-configs.md | 53 ++++++++++--------- .../resource-configs/postgres-configs.md | 24 +++++---- .../resource-configs/redshift-configs.md | 12 +++-- .../resource-configs/snowflake-configs.md | 8 +-- 4 files changed, 53 insertions(+), 44 deletions(-) diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 6352fd05195..7e386352b37 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -14,7 +14,7 @@ To-do: - `schema` is interchangeable with the BigQuery concept `dataset` - `database` is interchangeable with the BigQuery concept of `project` -For our reference documentation, you can declare `project` in place of `database.` +For our reference documentation, you can declare `project` in place of `database.` This will allow you to read and write from multiple BigQuery projects. Same for `dataset`. ## Using table partitioning and clustering @@ -61,6 +61,7 @@ The `partition_by` config can be supplied as a dictionary with the following for ``` #### Partitioning by a date or timestamp + Partitioning by hour, month or year is new in v0.19.0 When using a `datetime` or `timestamp` column to partition data, you can create partitions with a granularity of hour, day, month, or year. A `date` column supports granularity of day, month and year. Daily partitioning is the default for all column types. @@ -268,13 +269,13 @@ as ( - - **v0.20.0:** Introduced `require_partition_filter` and `partition_expiration_days` +- **v0.20.0:** Introduced `require_partition_filter` and `partition_expiration_days` If your model has `partition_by` configured, you may optionally specify two additional configurations: -- `require_partition_filter` (boolean): If set to `true`, anyone querying this model _must_ specify a partition filter, otherwise their query will fail. This is recommended for very large tables with obvious partitioning schemes, such as event streams grouped by day. Note that this will affect other dbt models or tests that try to select from this model, too. +- `require_partition_filter` (boolean): If set to `true`, anyone querying this model *must* specify a partition filter, otherwise their query will fail. This is recommended for very large tables with obvious partitioning schemes, such as event streams grouped by day. Note that this will affect other dbt models or tests that try to select from this model, too. - `partition_expiration_days` (integer): If set for date- or timestamp-type partitions, the partition will expire that many days after the date it represents. E.g. A partition representing `2021-01-01`, set to expire after 7 days, will no longer be queryable as of `2021-01-08`, its storage costs zeroed out, and its contents will eventually be deleted. Note that [table expiration](#controlling-table-expiration) will take precedence if specified. @@ -369,7 +370,7 @@ The `labels` config can be provided in a model config, or in the `dbt_project.ym - - **v1.5.0:** BigQuery key-value pair entries for labels larger than 63 characters are truncated. +- **v1.5.0:** BigQuery key-value pair entries for labels larger than 63 characters are truncated. 
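For example, a minimal sketch of a model-level `labels` config (the model name and label key-value pairs here are placeholders, not values from this patch):

```sql
{{
  config(
    materialized = 'table',
    labels = {'contains_pii': 'yes', 'contains_pie': 'no'}
  )
}}

select * from {{ ref('another_model') }}
```

The same labels could instead be set for a whole path of models in `dbt_project.yml` using `+labels:`.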
@@ -408,11 +409,10 @@ models: - - ### Specifying tags + BigQuery table and view *tags* can be created by supplying an empty string for the label value. @@ -431,9 +431,10 @@ select * from {{ ref('another_model') }} ### Policy tags + BigQuery enables [column-level security](https://cloud.google.com/bigquery/docs/column-level-security-intro) by setting [policy tags](https://cloud.google.com/bigquery/docs/best-practices-policy-tags) on specific columns. -dbt enables this feature as a column resource property, `policy_tags` (_not_ a node config). +dbt enables this feature as a column resource property, `policy_tags` (*not* a node config). @@ -457,8 +458,9 @@ Please note that in order for policy tags to take effect, [column-level `persist The [`incremental_strategy` config](/docs/build/incremental-models#about-incremental_strategy) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables. The `incremental_strategy` config can be set to one of two values: - - `merge` (default) - - `insert_overwrite` + +- `merge` (default) +- `insert_overwrite` ### Performance and cost @@ -470,6 +472,7 @@ model configuration. See [this guide](https://discourse.getdbt.com/t/benchmarkin built with either the `merge` or the `insert_overwrite` incremental strategy. ### The `merge` strategy + The `merge` incremental strategy will generate a `merge` statement that looks something like: @@ -491,7 +494,7 @@ strategy is selected. - - **v0.16.0:** Introduced `insert_overwrite` incremental strategy +- **v0.16.0:** Introduced `insert_overwrite` incremental strategy @@ -583,13 +586,13 @@ with events as ( This example model serves to replace the data in the destination table for both -_today_ and _yesterday_ every day that it is run. It is the fastest and cheapest +*today* and *yesterday* every day that it is run. It is the fastest and cheapest way to incrementally update a table using dbt. If we wanted this to run more dynamically— let’s say, always for the past 3 days—we could leverage dbt’s baked-in [datetime macros](https://github.com/dbt-labs/dbt-core/blob/dev/octavius-catto/core/dbt/include/global_project/macros/etc/datetime.sql) and write a few of our own. - - **v0.19.0:** With the advent of truncated timestamp partitions in BigQuery, `timestamp`-type partitions are now treated as timestamps instead of dates for the purposes of filtering. Update `partitions_to_replace` accordingly. +- **v0.19.0:** With the advent of truncated timestamp partitions in BigQuery, `timestamp`-type partitions are now treated as timestamps instead of dates for the purposes of filtering. Update `partitions_to_replace` accordingly. @@ -601,10 +604,10 @@ If no `partitions` configuration is provided, dbt will instead: 1. Create a temporary table for your model SQL 2. Query the temporary table to find the distinct partitions to be overwritten -3. Query the destination table to find the _max_ partition in the database +3. Query the destination table to find the *max* partition in the database When building your model SQL, you can take advantage of the introspection performed -by dbt to filter for only _new_ data. The max partition in the destination table +by dbt to filter for only *new* data. The max partition in the destination table will be available using the `_dbt_max_partition` BigQuery scripting variable. 
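A minimal sketch of a model that uses `_dbt_max_partition` to filter for only new data (the source model, column name, and granularity are placeholders, and the note below explains why the variable is referenced without Jinja brackets):

```sql
{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = {'field': 'session_start', 'data_type': 'timestamp', 'granularity': 'day'}
  )
}}

select * from {{ ref('events') }}

{% if is_incremental() %}
-- _dbt_max_partition is populated by the introspection step described above
where session_start >= _dbt_max_partition
{% endif %}
```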
**Note:** this is a BigQuery SQL variable, not a dbt Jinja variable, so no jinja brackets are required to access this variable. @@ -685,6 +688,7 @@ from {{ ref('events') }} ## Controlling table expiration + New in v0.18.0 By default, dbt-created tables never expire. You can configure certain model(s) @@ -766,24 +770,25 @@ The `grant_access_to` config is not thread-safe when multiple views need to be a - ## Materialized view The BigQuery adapter supports [materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) and refreshes them for every subsequent `dbt run` you execute. For more information, see [Refresh Materialized Views](https://cloud.google.com/bigquery/docs/materialized-views-manage#refresh) in the Google docs. -Materialized views support the optional configuration `on_configuration_change` with the following values: +Materialized views support the optional configuration `on_configuration_change` with the following values: + - `apply` (default) — attempts to update the existing database object if possible, avoiding a complete rebuild. The following changes can be applied without the need to rebuild the materialized view: - - enable_refresh - - refresh_interval_minutes - - max_staleness + - enable_refresh + - refresh_interval_minutes + - max_staleness - `skip` — allows runs to continue while also providing a warning that the model was skipped -- `fail` — forces runs to fail if a change is detected in a materialized view +- `fail` — forces runs to fail if a change is detected in a materialized view + +You can create a materialized view by editing *one* of these files: -You can create a materialized view by editing _one_ of these files: - the SQL file for your model - the `dbt_project.yml` configuration file -The following examples create a materialized view: +The following examples create a materialized view: @@ -798,14 +803,14 @@ The following examples create a materialized view: - -```yaml +```yaml models: path: materialized: materialized_view ``` + diff --git a/website/docs/reference/resource-configs/postgres-configs.md b/website/docs/reference/resource-configs/postgres-configs.md index eb9108ad431..cd45e02560c 100644 --- a/website/docs/reference/resource-configs/postgres-configs.md +++ b/website/docs/reference/resource-configs/postgres-configs.md @@ -12,14 +12,13 @@ In dbt-postgres, the following incremental materialization strategies are suppor - `merge` - `delete+insert` - ## Performance Optimizations ### Unlogged - - **v0.14.1:** Introduced native support for `unlogged` config +- **v0.14.1:** Introduced native support for `unlogged` config @@ -50,11 +49,12 @@ While Postgres works reasonably well for datasets smaller than about 10m rows, d - - **v0.20.0:** Introduced native support for `indexes` config +- **v0.20.0:** Introduced native support for `indexes` config Table models, incremental models, seeds, and snapshots may have a list of `indexes` defined. 
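For orientation before the component descriptions that follow, a minimal sketch of an `indexes` config on a Postgres table model (the column names are placeholders):

```sql
{{
  config(
    materialized = 'table',
    indexes = [
      {'columns': ['column_a'], 'type': 'hash'},
      {'columns': ['column_a', 'column_b'], 'unique': true}
    ]
  )
}}

select ...
```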
Each Postgres index can have three components: + - `columns` (list, required): one or more columns on which the index is defined - `unique` (boolean, optional): whether the index should be [declared unique](https://www.postgresql.org/docs/9.4/indexes-unique.html) - `type` (string, optional): a supported [index type](https://www.postgresql.org/docs/current/indexes-types.html) (B-tree, Hash, GIN, etc) @@ -111,19 +111,21 @@ models: The Postgres adapter supports [materialized views](https://www.postgresql.org/docs/current/rules-materializedviews.html) and refreshes them for every subsequent `dbt run` you execute. For more information, see [Refresh Materialized Views](https://www.postgresql.org/docs/15/sql-refreshmaterializedview.html) in the Postgres docs. -Materialized views support the optional configuration `on_configuration_change` with the following values: +Materialized views support the optional configuration `on_configuration_change` with the following values: + - `apply` (default) — attempts to update the existing database object if possible, avoiding a complete rebuild. The following index action can be applied without the need to rebuild the materialized view: - - Added - - Dropped - - Updated + - Added + - Dropped + - Updated - `skip` — allows runs to continue while also providing a warning that the model was skipped -- `fail` — forces runs to fail if a change is detected in a materialized view +- `fail` — forces runs to fail if a change is detected in a materialized view You can create a materialized view by editing _one_ of these files: + - the SQL file for your model - the `dbt_project.yml` configuration file -The following examples create a materialized view: +The following examples create a materialized view: @@ -138,14 +140,14 @@ The following examples create a materialized view: - -```yaml +```yaml models: path: materialized: materialized_view ``` + diff --git a/website/docs/reference/resource-configs/redshift-configs.md b/website/docs/reference/resource-configs/redshift-configs.md index a0ebf7e88df..8f6a238e567 100644 --- a/website/docs/reference/resource-configs/redshift-configs.md +++ b/website/docs/reference/resource-configs/redshift-configs.md @@ -102,16 +102,18 @@ models: The Redshift adapter supports [materialized views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html) and refreshes them for every subsequent `dbt run` that you execute. For more information, see [Refresh Materialized Views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-refresh.html) in the Redshift docs. -Materialized views support the optional configuration `on_configuration_change` with the following values: +Materialized views support the optional configuration `on_configuration_change` with the following values: + - `apply` (default) — attempts to update the existing database object if possible, avoiding a complete rebuild. The `auto_refresh` action can applied without the need to rebuild the materialized view. 
- `skip` — allows runs to continue while also providing a warning that the model was skipped -- `fail` — forces runs to fail if a change is detected in a materialized view +- `fail` — forces runs to fail if a change is detected in a materialized view You can create a materialized view by editing _one_ of these files: + - the SQL file for your model - the `dbt_project.yml` configuration file -The following examples create a materialized view: +The following examples create a materialized view: @@ -126,14 +128,14 @@ The following examples create a materialized view: - -```yaml +```yaml models: path: materialized: materialized_view ``` + diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md index bbef655d097..882c4dc3129 100644 --- a/website/docs/reference/resource-configs/snowflake-configs.md +++ b/website/docs/reference/resource-configs/snowflake-configs.md @@ -79,7 +79,7 @@ select ... In this example, you can set up a query tag to be applied to every query with the model's name. -```sql +```sql {% macro set_query_tag() -%} {% set new_query_tag = model.name %} @@ -183,7 +183,7 @@ models: ## Configuring virtual warehouses -The default warehouse that dbt uses can be configured in your [Profile](/docs/core/connect-data-platform/profiles.yml) for Snowflake connections. To override the warehouse that is used for specific models (or groups of models), use the `snowflake_warehouse` model configuration. This configuration can be used to specify a larger warehouse for certain models in order to control Snowflake costs and project build times. +The default warehouse that dbt uses can be configured in your [Profile](/docs/core/connect-data-platform/profiles.yml) for Snowflake connections. To override the warehouse that is used for specific models (or groups of models), use the `snowflake_warehouse` model configuration. This configuration can be used to specify a larger warehouse for certain models in order to control Snowflake costs and project build times.
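For example, a minimal sketch of the `snowflake_warehouse` config applied in a model file (the warehouse name is a placeholder and must already exist in your Snowflake account):

```sql
{{ config(snowflake_warehouse = 'EXTRA_LARGE') }}

select * from {{ ref('some_model') }}
```

The same config can be applied to groups of models under a path in `dbt_project.yml` using `+snowflake_warehouse:`.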