lint
dataders committed Jul 19, 2023
1 parent 1dd774b commit 75cce0d
Showing 4 changed files with 53 additions and 44 deletions.
53 changes: 29 additions & 24 deletions website/docs/reference/resource-configs/bigquery-configs.md
@@ -14,7 +14,7 @@ To-do:
- `schema` is interchangeable with the BigQuery concept `dataset`
- `database` is interchangeable with the BigQuery concept of `project`

For our reference documentation, you can declare `project` in place of `database`. This will allow you to read and write from multiple BigQuery projects. The same applies to `dataset`.

## Using table partitioning and clustering
@@ -61,6 +61,7 @@ The `partition_by` config can be supplied as a dictionary with the following for
```

#### Partitioning by a date or timestamp

<Changelog>Partitioning by hour, month or year is new in v0.19.0</Changelog>

When using a `datetime` or `timestamp` column to partition data, you can create partitions with a granularity of hour, day, month, or year. A `date` column supports granularity of day, month and year. Daily partitioning is the default for all column types.
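
For instance, a minimal sketch of a model partitioned by month on a timestamp column might look like the following (the model and column names are illustrative):

```sql
{{
  config(
    materialized = 'table',
    partition_by = {
      "field": "created_at",
      "data_type": "timestamp",
      "granularity": "month"
    }
  )
}}

-- one partition per calendar month of created_at
select
  user_id,
  event_name,
  created_at
from {{ ref('events') }}
```
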
@@ -268,13 +269,13 @@ as (

<Changelog>

- **v0.20.0:** Introduced `require_partition_filter` and `partition_expiration_days`

</Changelog>

If your model has `partition_by` configured, you may optionally specify two additional configurations (see the sketch after this list):

- `require_partition_filter` (boolean): If set to `true`, anyone querying this model *must* specify a partition filter, otherwise their query will fail. This is recommended for very large tables with obvious partitioning schemes, such as event streams grouped by day. Note that this will affect other dbt models or tests that try to select from this model, too.

- `partition_expiration_days` (integer): If set for date- or timestamp-type partitions, the partition will expire that many days after the date it represents. For example, a partition representing `2021-01-01`, set to expire after 7 days, will no longer be queryable as of `2021-01-08`; its storage costs will be zeroed out, and its contents will eventually be deleted. Note that [table expiration](#controlling-table-expiration) will take precedence if specified.
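
For example, a minimal sketch combining both options on a date-partitioned incremental model (the column name `created_date` is illustrative):

```sql
{{
  config(
    materialized = 'incremental',
    partition_by = {
      "field": "created_date",
      "data_type": "date"
    },
    require_partition_filter = true,
    partition_expiration_days = 7
  )
}}

-- downstream queries (including dbt models and tests selecting from this
-- table) must now filter on created_date, and partitions expire after 7 days
select * from {{ ref('events') }}
```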

@@ -369,7 +370,7 @@ The `labels` config can be provided in a model config, or in the `dbt_project.ym

<Changelog>

- **v1.5.0:** BigQuery key-value pair entries for labels larger than 63 characters are truncated.

</Changelog>

@@ -408,11 +409,10 @@ models:

</File>



<Lightbox src="/img/docs/building-a-dbt-project/building-models/73eaa8a-Screen_Shot_2020-01-20_at_12.12.54_PM.png" title="Viewing labels in the BigQuery console"/>

### Specifying tags

BigQuery table and view *tags* can be created by supplying an empty string for the label value.

<File name='model.sql'>
@@ -431,9 +431,10 @@ select * from {{ ref('another_model') }}
</File>

### Policy tags

BigQuery enables [column-level security](https://cloud.google.com/bigquery/docs/column-level-security-intro) by setting [policy tags](https://cloud.google.com/bigquery/docs/best-practices-policy-tags) on specific columns.

dbt enables this feature as a column resource property, `policy_tags` (*not* a node config).

<File name='models/<filename>.yml'>

@@ -457,8 +458,9 @@ Please note that in order for policy tags to take effect, [column-level `persist
The [`incremental_strategy` config](/docs/build/incremental-models#about-incremental_strategy) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables.

The `incremental_strategy` config can be set to one of two values (see the sketch after this list):

- `merge` (default)
- `insert_overwrite`
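
As a minimal sketch, the strategy is set in the model config (the `unique_key` and model body are illustrative):

```sql
{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'merge',
    unique_key = 'id'
  )
}}

-- dbt will merge new and changed rows into the existing table on `id`
select * from {{ ref('events') }}
```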

### Performance and cost

@@ -470,6 +472,7 @@ model configuration. See [this guide](https://discourse.getdbt.com/t/benchmarkin
built with either the `merge` or the `insert_overwrite` incremental strategy.

### The `merge` strategy

The `merge` incremental strategy will generate a `merge` statement that looks
something like:

@@ -491,7 +494,7 @@ strategy is selected.

<Changelog>

- **v0.16.0:** Introduced `insert_overwrite` incremental strategy

</Changelog>

@@ -583,13 +586,13 @@ with events as (
</File>

This example model serves to replace the data in the destination table for both
*today* and *yesterday* every day that it is run. It is the fastest and cheapest
way to incrementally update a table using dbt. If we wanted this to run more dynamically—
let’s say, always for the past 3 days—we could leverage dbt’s baked-in [datetime macros](https://github.com/dbt-labs/dbt-core/blob/dev/octavius-catto/core/dbt/include/global_project/macros/etc/datetime.sql) and write a few of our own.

<Changelog>

- **v0.19.0:** With the advent of truncated timestamp partitions in BigQuery, `timestamp`-type partitions are now treated as timestamps instead of dates for the purposes of filtering. Update `partitions_to_replace` accordingly.

</Changelog>

Expand All @@ -601,10 +604,10 @@ If no `partitions` configuration is provided, dbt will instead:

1. Create a temporary table for your model SQL
2. Query the temporary table to find the distinct partitions to be overwritten
3. Query the destination table to find the *max* partition in the database

When building your model SQL, you can take advantage of the introspection performed
by dbt to filter for only *new* data. The max partition in the destination table
will be available using the `_dbt_max_partition` BigQuery scripting variable. **Note:**
this is a BigQuery SQL variable, not a dbt Jinja variable, so no jinja brackets are
required to access this variable.
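
A minimal sketch of this pattern (the `session_start` column and `events` model are illustrative):

```sql
{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = {"field": "session_start", "data_type": "timestamp"}
  )
}}

select * from {{ ref('events') }}

{% if is_incremental() %}
  -- _dbt_max_partition is a BigQuery scripting variable set by dbt,
  -- not a Jinja variable, so it is referenced without curly brackets
  where session_start >= _dbt_max_partition
{% endif %}
```
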
@@ -685,6 +688,7 @@ from {{ ref('events') }}
</VersionBlock>

## Controlling table expiration

<Changelog>New in v0.18.0</Changelog>

By default, dbt-created tables never expire. You can configure certain model(s)
@@ -766,24 +770,25 @@ The `grant_access_to` config is not thread-safe when multiple views need to be a

<VersionBlock firstVersion="1.7">


## Materialized view

The BigQuery adapter supports [materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) and refreshes them for every subsequent `dbt run` you execute. For more information, see [Refresh Materialized Views](https://cloud.google.com/bigquery/docs/materialized-views-manage#refresh) in the Google docs.

Materialized views support the optional configuration `on_configuration_change` with the following values:

- `apply` (default) &mdash; attempts to update the existing database object if possible, avoiding a complete rebuild. The following changes can be applied without the need to rebuild the materialized view:
  - enable_refresh
  - refresh_interval_minutes
  - max_staleness
- `skip` &mdash; allows runs to continue while also providing a warning that the model was skipped
- `fail` &mdash; forces runs to fail if a change is detected in a materialized view
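
As a sketch only (assuming the `enable_refresh` and `refresh_interval_minutes` model configs listed above), a materialized view model opting into `apply` might look like:

```sql
{{
  config(
    materialized = 'materialized_view',
    on_configuration_change = 'apply',
    enable_refresh = true,
    refresh_interval_minutes = 30
  )
}}

-- a simple aggregation over a base model; names are illustrative
select
  timestamp_trunc(created_at, day) as event_day,
  count(*) as event_count
from {{ ref('events') }}
group by 1
```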

You can create a materialized view by editing *one* of these files:

- the SQL file for your model
- the `dbt_project.yml` configuration file

The following examples create a materialized view:

<File name='models/YOUR_MODEL_NAME.sql'>

@@ -798,14 +803,14 @@

</File>


<File name='dbt_project.yml'>

```yaml
models:
path:
materialized: materialized_view
```
</File>
</VersionBlock>
24 changes: 13 additions & 11 deletions website/docs/reference/resource-configs/postgres-configs.md
@@ -12,14 +12,13 @@ In dbt-postgres, the following incremental materialization strategies are suppor
- `merge`
- `delete+insert`


## Performance Optimizations

### Unlogged

<Changelog>

- **v0.14.1:** Introduced native support for `unlogged` config

</Changelog>

@@ -50,11 +49,12 @@ While Postgres works reasonably well for datasets smaller than about 10m rows, d
<Changelog>

- **v0.20.0:** Introduced native support for `indexes` config

</Changelog>

Table models, incremental models, seeds, and snapshots may have a list of `indexes` defined. Each Postgres index can have three components:

- `columns` (list, required): one or more columns on which the index is defined
- `unique` (boolean, optional): whether the index should be [declared unique](https://www.postgresql.org/docs/9.4/indexes-unique.html)
- `type` (string, optional): a supported [index type](https://www.postgresql.org/docs/current/indexes-types.html) (B-tree, Hash, GIN, etc)
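
For example, a minimal sketch of a table model declaring two indexes (column names are illustrative):

```sql
{{
  config(
    materialized = 'table',
    indexes = [
      {'columns': ['column_a'], 'type': 'hash'},
      {'columns': ['column_a', 'column_b'], 'unique': true}
    ]
  )
}}

-- a trivial model body so the indexed columns exist
select 1 as column_a, 2 as column_b
```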
@@ -111,19 +111,21 @@ models:

The Postgres adapter supports [materialized views](https://www.postgresql.org/docs/current/rules-materializedviews.html) and refreshes them for every subsequent `dbt run` you execute. For more information, see [Refresh Materialized Views](https://www.postgresql.org/docs/15/sql-refreshmaterializedview.html) in the Postgres docs.

Materialized views support the optional configuration `on_configuration_change` with the following values:

- `apply` (default) &mdash; attempts to update the existing database object if possible, avoiding a complete rebuild. The following index actions can be applied without the need to rebuild the materialized view:
  - Added
  - Dropped
  - Updated
- `skip` &mdash; allows runs to continue while also providing a warning that the model was skipped
- `fail` &mdash; forces runs to fail if a change is detected in a materialized view

You can create a materialized view by editing _one_ of these files:

- the SQL file for your model
- the `dbt_project.yml` configuration file

The following examples create a materialized view:

<File name='models/YOUR_MODEL_NAME.sql'>

@@ -138,14 +140,14 @@

</File>


<File name='dbt_project.yml'>

```yaml
models:
path:
materialized: materialized_view
```

</File>

</VersionBlock>
12 changes: 7 additions & 5 deletions website/docs/reference/resource-configs/redshift-configs.md
@@ -102,16 +102,18 @@ models:
The Redshift adapter supports [materialized views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html) and refreshes them for every subsequent `dbt run` that you execute. For more information, see [Refresh Materialized Views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-refresh.html) in the Redshift docs.

Materialized views support the optional configuration `on_configuration_change` with the following values:

- `apply` (default) &mdash; attempts to update the existing database object if possible, avoiding a complete rebuild. The `auto_refresh` action can be applied without the need to rebuild the materialized view.
- `skip` &mdash; allows runs to continue while also providing a warning that the model was skipped
- `fail` &mdash; forces runs to fail if a change is detected in a materialized view

You can create a materialized view by editing _one_ of these files:

- the SQL file for your model
- the `dbt_project.yml` configuration file

The following examples create a materialized view:

<File name='models/YOUR_MODEL_NAME.sql'>

@@ -126,14 +128,14 @@

</File>


<File name='dbt_project.yml'>

```yaml
models:
path:
materialized: materialized_view
```

</File>

</VersionBlock>
8 changes: 4 additions & 4 deletions website/docs/reference/resource-configs/snowflake-configs.md
@@ -79,7 +79,7 @@ select ...

In this example, you can set up a query tag that applies the model's name to every query.

```sql
{% macro set_query_tag() -%}
{% set new_query_tag = model.name %}
@@ -183,7 +183,7 @@ models:

## Configuring virtual warehouses

The default warehouse that dbt uses can be configured in your [Profile](/docs/core/connect-data-platform/profiles.yml) for Snowflake connections. To override the warehouse that is used for specific models (or groups of models), use the `snowflake_warehouse` model configuration. This configuration can be used to specify a larger warehouse for certain models in order to control Snowflake costs and project build times.

<Tabs
defaultValue="dbt_project.yml"
@@ -303,9 +303,9 @@ models:

## Temporary Tables

Beginning in dbt version 1.3, incremental table merges for Snowflake prefer to use a `view` rather than a `temporary table`. This avoids the database write step that a temporary table would require and saves compile time.

However, some situations remain where a temporary table would achieve results faster or more safely. dbt v1.4 adds the `tmp_relation_type` configuration to allow you to opt in to temporary tables for incremental builds. This is defined as part of the model configuration.

To guarantee accuracy, an incremental model using the `delete+insert` strategy with a `unique_key` defined requires a temporary table; trying to change this to a view will result in an error.
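
A minimal sketch of opting a model back into a temporary table (the `unique_key` and model body are illustrative):

```sql
{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    tmp_relation_type = 'table'
  )
}}

-- with tmp_relation_type = 'table', dbt stages this query in a temporary
-- table instead of a view before merging into the existing relation
select * from {{ ref('events') }}
```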
