ClickHouse supports accessing remote clusters via `ON CLUSTER` statements for any cluster configured in the `remote_servers` section, and Distributed tables can be repurposed to point to tables in any cluster. For a multi-cluster architecture, I would like dbt-clickhouse to be able to create distributed tables on a remote cluster that point to the actual tables in the local cluster.
An example distributed table could look as follows:
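A minimal sketch of such a table (all cluster, database, and table names here are hypothetical placeholders): a Distributed table created across a remote cluster whose engine points back at a table on the local cluster.

```sql
-- Hypothetical names: 'remote_analytics' is a remote cluster known to every
-- node via remote_servers, 'local_cluster' is the cluster holding the data,
-- and dbt.events_local is the actual table produced by the model.
CREATE TABLE dbt.events_distributed ON CLUSTER 'remote_analytics'
(
    event_id   UInt64,
    event_time DateTime
)
-- The Distributed engine routes queries to the shards of the cluster
-- named in its first argument, i.e. back to the local cluster.
ENGINE = Distributed('local_cluster', 'dbt', 'events_local', rand());
```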
I would suggest making this additional layer independent of `materialized` and `incremental_strategy`, so it applies to both distributed and non-distributed materializations. That would mean that for the `distributed_table` materialization there would be two distributed tables (one on the local cluster and one on the remote cluster) as well as one "local" table (on the local cluster).
For usage, I propose adding `remote_clusters` as an optional list parameter to profiles, and an `add_to_remote_clusters` boolean flag as a model configuration.
Functional requirements could be the following:

- when no `remote_clusters` are configured, current functionality is unchanged
- materialization will fail when `remote_clusters` are configured, but the current ClickHouse host doesn't know all of the clusters
- materialization will fail when `add_to_remote_clusters` is set to true, but no `remote_clusters` are configured
- when `remote_clusters` are configured and `add_to_remote_clusters` is set to true:
  - materializations will create additional distributed tables on the remote clusters, pointing to the local tables
  - databases are created correctly on the remote clusters
  - schemas are updated consistently on local and remote clusters
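Under this proposal, the profile configuration could look roughly like the sketch below. The `remote_clusters` key is the proposed addition and does not exist in dbt-clickhouse today; all names are hypothetical.

```yaml
# profiles.yml -- sketch of the proposed optional list parameter.
# 'remote_clusters' is hypothetical; every cluster listed here must be
# defined in the remote_servers section of the ClickHouse server config.
clickhouse_profile:
  target: dev
  outputs:
    dev:
      type: clickhouse
      host: localhost
      cluster: local_cluster
      remote_clusters:
        - remote_analytics
        - remote_reporting
```

A model would then opt in per the proposal by setting the (equally hypothetical) `add_to_remote_clusters=true` flag in its `config()` block.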
Feel free to assign me directly to it.
Hi!
I like the idea and I'd love to work on it.
Do you think this configuration inside the model is enough, or should it be inherited from profiles.yml? Do the tests need to be modified, or will testing with the two clusters `test_shard` and `test_replica` suffice?