Release 1 6 0 #215

Merged
merged 5 commits into from
Nov 30, 2023
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
### Release [1.6.0], 2023-11-30
#### Improvements
- Compatible with dbt 1.6.x. Note that dbt's new `clone` feature is not supported, as ClickHouse has no native "lightweight"
clone functionality, and copying tables without actual data transfer is not possible in ClickHouse (barring file manipulation
outside ClickHouse itself).
- A new ClickHouse-specific Materialized View materialization contributed by [Rory Sawyer](https://github.com/SoryRawyer).
This creates a ClickHouse materialized view using the `TO` form with the name `<model_name>_mv` and the associated target
table `<model_name>`. It's highly recommended to fully understand how ClickHouse materialized views work before using
this materialization.

### Release [1.5.2], 2023-11-28
#### Bug Fixes
- The `ON CLUSTER` clause was in the incorrect place for legacy incremental materializations. This has been fixed. Thanks to
31 changes: 21 additions & 10 deletions README.md
@@ -22,6 +22,7 @@ pip install dbt-clickhouse
- [x] Table materialization
- [x] View materialization
- [x] Incremental materialization
- [x] Materialized View materializations (uses the `TO` form of MATERIALIZED VIEW, experimental)
- [x] Seeds
- [x] Sources
- [x] Docs generate
@@ -102,16 +103,9 @@ your_profile_name:
| settings | A map/dictionary of "TABLE" settings to be used in DDL statements like 'CREATE TABLE' with this model | |
| query_settings | A map/dictionary of ClickHouse user level settings to be used with `INSERT` or `DELETE` statements in conjunction with this model | |

## A Note on Model Settings
ClickHouse has several types/levels of "settings". In the model configuration above, two types of these are configurable. `settings` means the `SETTINGS`
clause used in `CREATE TABLE/VIEW` types of DDL statements, so these are generally settings specific to the ClickHouse table engine in use. The new
`query_settings` is used to add a `SETTINGS` clause to the `INSERT` and `DELETE` queries used for model materialization (including incremental materializations).
There are hundreds of ClickHouse settings, and it's not always clear which is a "table" setting and which is a "user" setting (although the latter are generally
available in the `system.settings` table). In general the defaults are recommended, and any use of these properties should be carefully researched and tested.

## ClickHouse Cluster

`cluster` setting in profile enables dbt-clickhouse to run against a ClickHouse cluster.
The `cluster` setting in profile enables dbt-clickhouse to run against a ClickHouse cluster.

### Effective Scope

@@ -130,6 +124,15 @@ table and incremental materializations with non-replicated engine will not be af
If a model has been created without a `cluster` setting, dbt-clickhouse will detect the situation and run all DDL/DML without `on cluster` clause for this model.


## A Note on Model Settings

ClickHouse has several types/levels of "settings". In the model configuration above, two types of these are configurable. `settings` means the `SETTINGS`
clause used in `CREATE TABLE/VIEW` types of DDL statements, so these are generally settings specific to the ClickHouse table engine in use. The new
`query_settings` is used to add a `SETTINGS` clause to the `INSERT` and `DELETE` queries used for model materialization (including incremental materializations).
There are hundreds of ClickHouse settings, and it's not always clear which is a "table" setting and which is a "user" setting (although the latter are generally
available in the `system.settings` table). In general the defaults are recommended, and any use of these properties should be carefully researched and tested.
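
As a rough illustration of the distinction, the sketch below renders a `SETTINGS` clause from a dictionary and appends it once to a `CREATE TABLE` statement (the `settings` case) and once to an `INSERT` (the `query_settings` case). The helper `render_settings` and the table name `example_model` are hypothetical and are not the adapter's own code; `index_granularity` and `max_insert_threads` are used only as examples of a table-level and a user-level setting.

```python
def render_settings(settings: dict) -> str:
    """Render a dict as a ClickHouse SETTINGS clause, or '' when there is nothing to render."""
    if not settings:
        return ''
    # repr() single-quotes string values; other values are emitted as-is.
    pairs = ', '.join(
        f'{name} = {value!r}' if isinstance(value, str) else f'{name} = {value}'
        for name, value in settings.items()
    )
    return f' SETTINGS {pairs}'


# `settings`: table-level, attached to the CREATE TABLE statement.
table_settings = {'index_granularity': 8192}
# `query_settings`: user-level, attached to the INSERT that populates the model.
query_settings = {'max_insert_threads': 4}

# Hypothetical statements for illustration only.
create_sql = (
    'CREATE TABLE example_model (id UInt64) ENGINE = MergeTree ORDER BY id'
    + render_settings(table_settings)
)
insert_sql = (
    'INSERT INTO example_model SELECT number FROM numbers(10)'
    + render_settings(query_settings)
)

print(create_sql)
print(insert_sql)
```

The two dictionaries are kept separate on purpose: a setting placed in the wrong one would end up in a statement where ClickHouse may reject it.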


## Known Limitations

* Ephemeral models/CTEs don't work if placed before the "INSERT INTO" in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This
@@ -192,17 +195,25 @@ keys used to populate the parameters of the S3 table function:
| fmt | The expected ClickHouse input format (such as `TSV` or `CSVWithNames`) of the referenced S3 objects. |
| structure | The column structure of the data in bucket, as a list of name/datatype pairs, such as `['id UInt32', 'date DateTime', 'value String']`. If not provided, ClickHouse will infer the structure. |
| aws_access_key_id | The S3 access key id. |
| aws_secret_access_key | The S3 secrete key. |
| aws_secret_access_key | The S3 secret key. |
| compression | The compression method used with the S3 objects. If not provided ClickHouse will attempt to determine compression based on the file name. |

See the [S3 test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/test_s3.py) for examples of how to use this macro.
See the [S3 test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/clickhouse/test_clickhouse_s3.py) for examples of how to use this macro.

# Contracts and Constraints

Only exact column type contracts are supported. For example, a contract with a UInt32 column type will fail if the model returns a UInt64 or other integer type.
ClickHouse also supports _only_ `CHECK` constraints on the entire table/model. Primary key, foreign key, unique, and column-level CHECK constraints are not supported.
(See ClickHouse documentation on primary/order by keys.)
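
The exact-match rule can be pictured with a small sketch. The function `contract_violations` below is a hypothetical helper written for illustration, not dbt's or the adapter's contract enforcement code; it only shows that the comparison is on the type name itself, with no widening between integer types.

```python
from typing import Dict, List


def contract_violations(contract: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return one message per column whose type is not an exact string match."""
    violations = []
    for column, expected in contract.items():
        found = actual.get(column)
        # No implicit widening: UInt64 does not satisfy a UInt32 contract.
        if found != expected:
            violations.append(f'{column}: contract says {expected}, model returns {found}')
    return violations


contract = {'id': 'UInt32', 'name': 'String'}
model_columns = {'id': 'UInt64', 'name': 'String'}

print(contract_violations(contract, model_columns))
```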

# Materialized Views (Experimental)
A `materialized_view` materialization should be a `SELECT` from an existing (source) table. The adapter will create a target table with the model name
and a ClickHouse MATERIALIZED VIEW with the name `<model_name>_mv`. Unlike PostgreSQL, a ClickHouse materialized view is not "static" (and has
no corresponding REFRESH operation). Instead, it acts as an "insert trigger": when rows are inserted into the source table, it applies the `SELECT`
"transformation" from the view definition to them and inserts the results into the target table. See the
[test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/materialized_view/test_materialized_view.py) for an introductory example
of how to use this functionality.
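
To make the naming convention concrete, the sketch below builds the two statements implied by the `TO` form: a target table named after the model and a materialized view named `<model_name>_mv` that writes into it. The function `materialized_view_statements`, the table and column names, and the choice of `MergeTree` are assumptions for illustration; the DDL the adapter actually emits may differ.

```python
from typing import List


def materialized_view_statements(
    model_name: str, column_defs: str, order_by: str, select_sql: str
) -> List[str]:
    """Build illustrative DDL for a target table plus a `TO`-form materialized view."""
    mv_name = f'{model_name}_mv'
    create_target = (
        f'CREATE TABLE {model_name} ({column_defs}) ENGINE = MergeTree ORDER BY {order_by}'
    )
    # The view stores no data itself; rows produced by the SELECT go TO the target table.
    create_mv = f'CREATE MATERIALIZED VIEW {mv_name} TO {model_name} AS {select_sql}'
    return [create_target, create_mv]


statements = materialized_view_statements(
    model_name='people_upper',
    column_defs='id UInt32, name String',
    order_by='id',
    select_sql='SELECT id, upper(name) AS name FROM raw_people',
)

for statement in statements:
    print(statement)
```

In this hypothetical setup, each insert into `raw_people` would cause the transformed rows to appear in `people_upper`; rows that were already in `raw_people` before the view was created are not picked up by the view.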

# Distributed materializations

Notes:
2 changes: 1 addition & 1 deletion dbt/adapters/clickhouse/__version__.py
@@ -1 +1 @@
version = '1.5.2'
version = '1.6.0'
2 changes: 1 addition & 1 deletion dbt/adapters/clickhouse/connections.py
@@ -73,7 +73,7 @@ def get_table_from_response(cls, response, column_names) -> agate.Table:
return dbt.clients.agate_helper.table_from_data_flat(data, column_names)

def execute(
self, sql: str, auto_begin: bool = False, fetch: bool = False
self, sql: str, auto_begin: bool = False, fetch: bool = False, limit: Optional[int] = None
) -> Tuple[AdapterResponse, agate.Table]:
# Don't try to fetch result of clustered DDL responses, we don't know what to do with them
if fetch and ddl_re.match(sql):
4 changes: 3 additions & 1 deletion dbt/adapters/clickhouse/dbclient.py
@@ -169,7 +169,9 @@ def _ensure_database(self, database_engine, cluster_name) -> None:
if cluster_name is not None and cluster_name.strip() != ''
else ''
)
self.command(f'CREATE DATABASE {self.database}{cluster_clause}{engine_clause}')
self.command(
f'CREATE DATABASE IF NOT EXISTS {self.database}{cluster_clause}{engine_clause}'
)
db_exists = self.command(check_db)
if not db_exists:
raise FailedToConnectError(
@@ -7,7 +7,7 @@

{%- set target_relation = this.incorporate(type='table') -%}
{%- set mv_name = target_relation.name + '_mv' -%}
{%- set target_mv = api.Relation.create(identifier=mv_name, schema=schema, database=database, type='materializedview') -%}
{%- set target_mv = api.Relation.create(identifier=mv_name, schema=schema, database=database, type='materialized_view') -%}
{%- set cluster_clause = on_cluster_clause(target_relation) -%}

{# look for an existing relation for the target table and create backup relations if necessary #}
8 changes: 4 additions & 4 deletions dev_requirements.txt
@@ -1,16 +1,16 @@
dbt-core~=1.5.8
dbt-core~=1.6.9
clickhouse-connect>=0.6.21
clickhouse-driver>=0.2.6
pytest>=7.2.0
pytest-dotenv==0.5.2
dbt-tests-adapter~=1.5.8
black==22.3.0
dbt-tests-adapter~=1.6.9
black==23.11.0
isort==5.10.1
mypy==0.991
yamllint==1.26.3
flake8==4.0.1
types-requests==2.27.29
agate~=1.6.3
agate~=1.7.1
requests~=2.27.1
setuptools~=65.3.0
types-setuptools==67.1.0.0
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[tool.black]
line-length = 100
skip-string-normalization = true
target-version = ['py38', 'py39']
target-version = ['py310', 'py311']
exclude = '(\.eggs|\.git|\.mypy_cache|\.venv|venv|env|_build|build|build|dist|)'

[tool.isort]
4 changes: 2 additions & 2 deletions setup.py
@@ -25,7 +25,7 @@ def _dbt_clickhouse_version():
package_version = _dbt_clickhouse_version()
description = '''The Clickhouse plugin for dbt (data build tool)'''

dbt_version = '1.5.0'
dbt_version = '1.6.0'
dbt_minor = '.'.join(dbt_version.split('.')[0:2])

if not package_version.startswith(dbt_minor):
@@ -58,7 +58,7 @@ def _dbt_clickhouse_version():
'clickhouse-connect>=0.6.21',
'clickhouse-driver>=0.2.6',
],
python_requires=">=3.7",
python_requires=">=3.8",
platforms='any',
classifiers=[
'Development Status :: 5 - Production/Stable',
7 changes: 7 additions & 0 deletions tests/integration/adapter/dbt_clone/test_dbt_clone.py
@@ -0,0 +1,7 @@
import pytest
from dbt.tests.adapter.dbt_clone.test_dbt_clone import BaseClonePossible


@pytest.mark.skip("clone not supported")
class TestBaseClonePossible(BaseClonePossible):
    pass
@@ -1,5 +1,6 @@
"""
test materialized view creation
Test materialized view creation. This is ClickHouse-specific: ClickHouse implements materialized views
very differently from PostgreSQL or Oracle.
"""

import json