diff --git a/.github/workflows/autoupdate.yml b/.github/workflows/autoupdate.yml index 4105ec6b902..f26abcb6802 100644 --- a/.github/workflows/autoupdate.yml +++ b/.github/workflows/autoupdate.yml @@ -2,11 +2,11 @@ name: Auto Update on: # This will trigger on all pushes to all branches. - push: {} +# push: {} # Alternatively, you can only trigger if commits are pushed to certain branches, e.g.: - # push: - # branches: - # - current + push: + branches: + - current # - unstable jobs: autoupdate: diff --git a/website/blog/2023-07-17-GPT-and-dbt-test.md b/website/blog/2023-07-17-GPT-and-dbt-test.md new file mode 100644 index 00000000000..84f756919a5 --- /dev/null +++ b/website/blog/2023-07-17-GPT-and-dbt-test.md @@ -0,0 +1,213 @@ +--- +title: "Create dbt Documentation and Tests 10x faster with ChatGPT" +description: "You can use ChatGPT to infer the context of verbosely named fields from database table schemas." +slug: create-dbt-documentation-10x-faster-with-ChatGPT + +authors: [pedro_brito_de_sa] + +tags: [analytics craft, data ecosystem] +hide_table_of_contents: true + +date: 2023-07-18 +is_featured: true +--- + +Whether you are creating your pipelines into dbt for the first time or just adding a new model once in a while, **good documentation and testing should always be a priority** for you and your team. Why do we avoid it like the plague then? Because it’s a hassle having to write down each individual field, its description in layman terms and figure out what tests should be performed to ensure the data is fine and dandy. How can we make this process faster and less painful? + +By now, everyone knows the wonders of the GPT models for code generation and pair programming so this shouldn’t come as a surprise. But **ChatGPT really shines** at inferring the context of verbosely named fields from database table schemas. So in this post I am going to help you 10x your documentation and testing speed by using ChatGPT to do most of the leg work for you. 
+ + + +As a one-person Analytics team at [Sage](http://www.hellosage.com/) I had to create our dbt pipelines from the ground up. This meant 30+ tables of internal facts and dimensions + external data into a Staging Layer, plus all of the following layers of augmented models and Mart tables. After the fact, we are talking about 3500+ lines of YAML that I was NOT excited to get started on. Fortunately for me, this was February 2023 and ChatGPT had just come out. And boy, was I glad to have it. After a good dose of “prompt engineering” I managed to get most of my documentation and tests written out, only needing a few extra tweaks. + +Writing this article as of July 2023, and now powered by GPT-4 and not GPT 3.5, it is already easier to get the same results I did, so here are my learnings that I hope everyone can replicate. + +## Use verbose tables with verbose fields + +ChatGPT can only infer so much, so tables with names and fields that resemble encryption keys are unlikely to be good for this approach. In this example we are going to use this table: + +```sql +create or replace TRANSIENT TABLE STAGING.BASE.STG_STAFF_MEMBER ( + ID NUMBER(38,0), + CREATEDATETIME TIMESTAMP_NTZ(9), + UPDATEDATETIME TIMESTAMP_NTZ(9), + VERSION NUMBER(38,0), + FIRSTNAME VARCHAR(16777216), + JOBTITLE VARCHAR(16777216), + LASTNAME VARCHAR(16777216), + MIDDLENAME VARCHAR(16777216), + ISCAREADMIN BOOLEAN, + ISARCHIVED BOOLEAN, + ADDRESSID VARCHAR(16777216), + ENTERPRISEID VARCHAR(16777216), + ISDELETED BOOLEAN +); + +``` + +--- + +I copied this definition from Snowflake, and I would recommend that you always include both the name and type of the fields to get better results. This is especially helpful if you want to make sure ChatGPT correctly distinguishes between text, number, boolean or array fields, which may not be clear from the name alone. 
+ +## Start out with simple prompts + +We all know how ChatGPT can digest very complex prompts, but as this is a tool for you to get the best results for your use case, I would recommend starting with simple, but specific, prompts. + +Opening ChatGPT with GPT4, my first prompt is usually along these lines: + + + +And the output of this simple prompt is nothing short of amazing: + +```yaml +version: 2 + +models: + - name: STAFF_MEMBER + description: This table contains information about the staff members. + columns: + - name: ID + description: The unique identifier for the staff member. + tests: + - unique + - not_null + - name: CREATEDATETIME + description: The timestamp when the record was created. + tests: + - not_null + - name: UPDATEDATETIME + description: The timestamp when the record was last updated. + - name: VERSION + description: Version number of the record. + - name: FIRSTNAME + description: The first name of the staff member. + tests: + - not_null + - name: JOBTITLE + description: The job title of the staff member. + tests: + - not_null + - name: LASTNAME + description: The last name of the staff member. + tests: + - not_null + - name: MIDDLENAME + description: The middle name of the staff member. + - name: ISCARADMIN + description: Boolean value indicating if the staff member is a care administrator. + tests: + - accepted_values: + values: ['true', 'false'] + - name: ISARCHIVED + description: Boolean value indicating if the staff member record is archived. + tests: + - accepted_values: + values: ['true', 'false'] + - name: COMMUNITYID + description: Identifier for the community of the staff member. + - name: ENTERPRISEID + description: Identifier for the enterprise of the staff member. + - name: ISDELETED + description: Boolean value indicating if the staff member record is deleted. 
+ tests: + - accepted_values: + values: ['true', 'false'] +``` + +Back in my day (5 months ago), ChatGPT with GPT 3.5 didn’t have much context on what a dbt-style YAML file was, so the result was a funky, non-dbt YAML format. If you are using GPT 3.5 and the results you get from the first prompt don’t compile in dbt, I would recommend an extra step I took which is to pass your understanding of what a dbt-style YAML file is. You can do this simply by copy-pasting a piece of a publicly available YAML file like [this one from the Jaffle Shop example](https://github.com/dbt-labs/jaffle_shop/blob/main/models/schema.yml), or a piece of YAML from your own code base. + +A prompt for it would look something like: + + + +## Specify details on generic tests in your prompts + +dbt has four built-in generic tests: unique, not_null, accepted_values and relationships. My approach to adding details on these to the prompts is one of ‘more is better’, as in it’s faster to just delete tests that I don’t need in certain fields than having to add them manually. + +ChatGPT will throw unique and not_null tests for mostly all columns that have ‘id’ in their name, and in my experience it adds them to a few more columns than it should, which is usually quick to fix. + +Accepted_values and relationships are slightly trickier but the model can be adjusted to account for these in fairly simple ways. My tips are: + +- accepted_values: add extra detail to the schema like an extra keyword like ‘CATEGORICAL’ after the data type +- relationships: bind the relationships to fields with specific names. For better results, pass on a list of valid table names from your schema that it can refer to. + +One way of doing this is with a prompt like this: + + + +Which results in the following output: + +```yaml +version: 2 + +models: + - name: STAFF_MEMBER + description: This table contains information about the staff members. + columns: + - name: ID + description: The unique identifier for the staff member. 
+ tests: + - unique + - not_null + - name: CREATEDATETIME + description: The timestamp when the record was created. + tests: + - not_null + - name: UPDATEDATETIME + description: The timestamp when the record was last updated. + - name: VERSION + description: Version number of the record. + - name: FIRSTNAME + description: The first name of the staff member. + tests: + - not_null + - name: JOBTITLE + description: The job title of the staff member. This is a categorical field. + tests: + - not_null + - accepted_values: + values: ['Job Title 1', 'Job Title 2', 'Job Title 3'] # replace these with actual job titles + - name: LASTNAME + description: The last name of the staff member. + tests: + - not_null + - name: MIDDLENAME + description: The middle name of the staff member. + - name: ISCARADMIN + description: Boolean value indicating if the staff member is a care administrator. + tests: + - accepted_values: + values: ['true', 'false'] + - name: ISARCHIVED + description: Boolean value indicating if the staff member record is archived. + tests: + - accepted_values: + values: ['true', 'false'] + - name: COMMUNITYID + description: Identifier for the community of the staff member. + tests: + - relationships: + to: STAGING.COMMUNITY.ID # replace with actual reference table + field: ID + - name: ENTERPRISEID + description: Identifier for the enterprise of the staff member. + tests: + - relationships: + to: STAGING.ENTERPRISE.ID # replace with actual reference table + field: ID + - name: ISDELETED + description: Boolean value indicating if the staff member record is deleted. + tests: + - accepted_values: + values: ['true', 'false'] +``` + +Even though it doesn’t magically guess the categories of your variables or your relationships, it is definitely helpful to have some placeholders in the right places. + +As an add-on, giving the model a short description of the data models and the tables you are working with will help it fine tune your results. 
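The tips in this section (a table definition with names and types, a `CATEGORICAL` keyword, a list of valid table names, and a short description of the data model) can be gathered into one reusable prompt. The following is a minimal Python sketch of that idea, not the prompt used in this post: the `build_prompt` helper, its wording, and the `STG_ENTERPRISE` and `STG_ADDRESS` table names are assumptions made for illustration.

```python
def build_prompt(ddl, table_description, valid_tables):
    """Assemble a single prompt asking for a dbt-style YAML file for one table.

    The instruction wording is illustrative only; adjust it to your own use case.
    """
    table_list = "\n".join(f"- {name}" for name in valid_tables)
    sections = [
        "Write a dbt-style YAML file with a description and tests for every "
        "column of the table below.",
        "Fields marked CATEGORICAL should get an accepted_values test with "
        "placeholder values.",
        "Fields ending in ID should get a relationships test that refers only "
        "to one of the valid tables listed.",
        f"Context: {table_description.strip()}",
        f"Valid tables:\n{table_list}",
        f"Table definition:\n{ddl.strip()}",
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Shortened version of the example table, with a CATEGORICAL hint added.
    ddl = """
    create or replace TRANSIENT TABLE STAGING.BASE.STG_STAFF_MEMBER (
        ID NUMBER(38,0),
        JOBTITLE VARCHAR(16777216) CATEGORICAL,
        ENTERPRISEID VARCHAR(16777216),
        ISDELETED BOOLEAN
    );
    """
    # Hypothetical table names; replace with tables from your own schema.
    print(build_prompt(ddl, "One row per staff member.", ["STG_ENTERPRISE", "STG_ADDRESS"]))
```

Keeping the prompt in a function like this makes it easy to paste the same instructions for each table, while you still review and adjust the generated YAML by hand.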
+ +## Wrap-Up + +Creating documentation is still a very manual job, and this approach only works for one table at a time (maybe you can be the one leveraging the OpenAI API and creating a webapp that processes multiple tables at once?). However, ChatGPT can clearly cut a lot of time in these tasks. + +I hope that these simple tips help you be more motivated and efficient in creating documentation and tests for your data models. And remember: verbosity in - verbosity out! diff --git a/website/blog/authors.yml b/website/blog/authors.yml index 72e747cc577..6d222e8a543 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -373,6 +373,16 @@ pat_kearns: name: Pat Kearns organization: dbt Labs +pedro_brito_de_sa: + image_url: /img/blog/authors/pedro_brito.jpeg + job_title: Product Analyst + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/pbritosa/ + name: Pedro Brito de Sa + organization: Sage + + rastislav_zdechovan: image_url: /img/blog/authors/rastislav-zdechovan.png job_title: Analytics Engineer diff --git a/website/blog/2021-11-23-on-the-importance-of-naming.md b/website/blog/src.md similarity index 100% rename from website/blog/2021-11-23-on-the-importance-of-naming.md rename to website/blog/src.md diff --git a/website/docs/docs/build/about-metricflow.md b/website/docs/docs/build/about-metricflow.md index 6ec7ecfe4b5..2a5e750aea3 100644 --- a/website/docs/docs/build/about-metricflow.md +++ b/website/docs/docs/build/about-metricflow.md @@ -33,7 +33,9 @@ There are a few key principles: - MetricFlow, as a part of the dbt Semantic Layer, allows organizations to define company metrics logic through YAML abstractions, as described in the following sections. -- You can install MetricFlow via PyPI as an extension of your [dbt adapter](/docs/supported-data-platforms) in the CLI. To install the adapter, run `pip install "dbt-metricflow[your_adapter_name]"` and add the adapter name at the end of the command. 
For example, for a Snowflake adapter run `pip install "dbt-metricflow[snowflake]"`. +- You can install MetricFlow using PyPI as an extension of your [dbt adapter](/docs/supported-data-platforms) in the CLI. To install the adapter, run `pip install "dbt-metricflow[your_adapter_name]"` and add the adapter name at the end of the command. For example, for a Snowflake adapter run `pip install "dbt-metricflow[snowflake]"`. + +- To query metrics, dimensions, and dimension values, and to validate your configurations, install the [MetricFlow CLI](/docs/build/metricflow-cli). ### Semantic graph @@ -60,6 +62,7 @@ Metrics, which is a key concept, are functions that combine measures, constraint MetricFlow supports different metric types: +- [Cumulative](/docs/build/cumulative) — Aggregates a measure over a given window. - [Derived](/docs/build/derived) — An expression of other metrics, which allows you to do calculations on top of metrics. - [Ratio](/docs/build/ratio) — Create a ratio out of two measures, like revenue per customer. - [Simple](/docs/build/simple) — Metrics that refer directly to one measure. diff --git a/website/docs/docs/build/build-metrics-intro.md b/website/docs/docs/build/build-metrics-intro.md index e98ee013d0b..a87d4567a2b 100644 --- a/website/docs/docs/build/build-metrics-intro.md +++ b/website/docs/docs/build/build-metrics-intro.md @@ -18,7 +18,7 @@ To fully experience the dbt Semantic Layer, including the ability to query dbt m ::: Before you start, keep the following considerations in mind: -- Use the CLI to define metrics in YAML and query them using the [new metric specifications](https://github.com/dbt-labs/dbt-core/discussions/7456). +- Define metrics in YAML and query them using the [MetricFlow CLI](/docs/build/metricflow-cli). - You must be on dbt Core v1.6 beta or higher to use MetricFlow. [Upgrade your dbt version](/docs/core/pip-install#change-dbt-core-versions) to get started. 
* Note: Support for dbt Cloud and querying via external integrations coming soon. - MetricFlow currently only supports Snowflake and Postgres. diff --git a/website/docs/docs/build/cumulative-metrics.md b/website/docs/docs/build/cumulative-metrics.md index 77d23d32dce..efdde600635 100644 --- a/website/docs/docs/build/cumulative-metrics.md +++ b/website/docs/docs/build/cumulative-metrics.md @@ -8,6 +8,12 @@ tags: [Metrics, Semantic Layer] Cumulative metrics aggregate a measure over a given window. If no window is specified, the window is considered infinite and accumulates values over all time. +:::info MetricFlow time spine required + +You will need to create the [time spine model](/docs/build/metricflow-time-spine) before you add cumulative metrics. + +::: + ```yaml # Cumulative metrics aggregate a measure over a given window. The window is considered infinite if no window parameter is passed (accumulate the measure over all time) metrics: @@ -24,7 +30,7 @@ metrics: ### Window options -This section details examples for when you specify and don't specify window options. +This section details examples of when you specify and don't specify window options. @@ -56,7 +62,7 @@ metrics: window: 7 days ``` -From the sample yaml above, note the following: +From the sample YAML above, note the following: * `type`: Specify cumulative to indicate the type of metric. * `type_params`: Specify the measure you want to aggregate as a cumulative metric. You have the option of specifying a `window`, or a `grain to date`. 
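To make the difference between a windowed and an unbounded cumulative metric concrete, here is a small Python sketch of the arithmetic. It is not how MetricFlow computes the metric (MetricFlow generates SQL against your data platform). The `cumulative` helper, the sample revenue values, and the choice to count the current day as part of the window are assumptions made for illustration.

```python
from datetime import date, timedelta


def cumulative(daily, window_days=None):
    """Return {day: total}, summing the measure over the trailing window.

    window_days=None means an infinite window (accumulate over all time).
    Assumption: the window includes the current day.
    """
    totals = {}
    for day in sorted(daily):
        if window_days is None:
            in_window = [v for d, v in daily.items() if d <= day]
        else:
            start = day - timedelta(days=window_days - 1)
            in_window = [v for d, v in daily.items() if start <= d <= day]
        totals[day] = sum(in_window)
    return totals


# Hypothetical daily revenue: 10 per day for 10 days.
revenue = {date(2023, 1, 1) + timedelta(days=i): 10 for i in range(10)}

all_time = cumulative(revenue)                     # no window: keeps growing
last_7_days = cumulative(revenue, window_days=7)   # comparable to `window: 7 days`

print(all_time[date(2023, 1, 10)])     # 100
print(last_7_days[date(2023, 1, 10)])  # 70
```

With no window the total keeps growing with every day of data, while the 7-day window levels off once seven days are available.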
@@ -142,7 +148,7 @@ metrics: ```yaml metrics: name: revenue_monthly_grain_to_date #For this metric, we use a monthly grain to date - description: Monthly revenue using a grain to date of 1 month (think of this as a monthly resetting point) + description: Monthly revenue using grain to date of 1 month (think of this as a monthly resetting point) type: cumulative type_params: measures: diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md index 28345ba1873..29c7c8c585f 100644 --- a/website/docs/docs/build/incremental-models.md +++ b/website/docs/docs/build/incremental-models.md @@ -57,6 +57,7 @@ from raw_app_data.events {% if is_incremental() %} -- this filter will only be applied on an incremental run + -- (uses > to include records whose timestamp occurred since the last run of this model) where event_time > (select max(event_time) from {{ this }}) {% endif %} @@ -137,6 +138,7 @@ from raw_app_data.events {% if is_incremental() %} -- this filter will only be applied on an incremental run + -- (uses >= to include records arriving later on the same day as the last run of this model) where date_day >= (select max(date_day) from {{ this }}) {% endif %} diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index f85d21e3dcb..3483e4d823d 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -15,6 +15,27 @@ Measures are aggregations performed on columns in your model. They can be used a | [`agg`](#aggregation) | dbt supports the following aggregations: `sum`, `max`, `min`, `count_distinct`, and `sum_boolean`. | Required | | [`expr`](#expr) | You can either reference an existing column in the table or use a SQL expression to create or derive a new one. 
| Optional | | [`non_additive_dimension`](#non-additive-dimensions) | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | +| `agg_params` | Specific aggregation properties, such as a percentile. | Optional | +| `agg_time_dimension` | The time field. Defaults to the default agg time dimension for the semantic model. | Optional | +| `non_additive_dimension` | Use these configs when you need non-additive dimensions. | Optional | +| `label` | How the metric appears in project docs and downstream integrations. | Required | + + +## Measure spec + +An example of the complete YAML measures spec is below. The actual configuration of your measures will depend on the aggregation you're using. + +```yaml +measures: + - name: The name of the measure # think transaction_total. If `expr` isn't present then this is the expected name of the column [Required] + description: same as always [Optional] + agg: the aggregation type. # think average, sum, max, min, etc. [Required] + expr: the field # think transaction_total or some other name you might want to alias [Optional] + agg_params: specific aggregation properties such as a percentile [Optional] + agg_time_dimension: The time field. Defaults to the default agg time dimension for the semantic model. [Optional] + non_additive_dimension: Use these configs when you need non-additive dimensions. [Optional] + label: How the metric appears in project docs and downstream integrations. [Required] +``` ### Name @@ -62,7 +83,7 @@ If you use the `dayofweek` function in the `expr` parameter with the legacy Snow ```yaml semantic_models: - name: transactions - description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU. + description: A record of every transaction that takes place. Carts are considered multiple transactions for each SKU. 
model: ref('schema.transactions') defaults: agg_time_dimensions: @@ -190,14 +211,14 @@ semantic_models: name: metric_time window_choice: min - name: mrr_end_of_month - description: Aggregate by summing all users active subscription plans at end of month + description: Aggregate by summing all users' active subscription plans at the end of the month expr: subscription_value agg: sum non_additive_dimension: name: metric_time window_choice: max - name: mrr_by_user_end_of_month - description: Group by user_id to achieve each users MRR at the end of the month + description: Group by user_id to achieve each user's MRR at the end of the month expr: subscription_value agg: sum non_additive_dimension: diff --git a/website/docs/docs/build/metricflow-cli.md b/website/docs/docs/build/metricflow-cli.md new file mode 100644 index 00000000000..9e934e9ccca --- /dev/null +++ b/website/docs/docs/build/metricflow-cli.md @@ -0,0 +1,367 @@ +--- +title: MetricFlow CLI +id: metricflow-cli +description: "Query metrics and metadata in your dbt project with the MetricFlow CLI" +sidebar_label: "MetricFlow CLI commands" +tags: [Metrics, Semantic Layer] +--- + +Once you define metrics in your dbt project, you can query metrics, dimensions, and dimension values, and validate your configs using the MetricFlow command line interface (CLI). + +# Installation + +You can install the [MetricFlow CLI](https://github.com/dbt-labs/metricflow#getting-started) from [PyPI](https://pypi.org/project/dbt-metricflow/). You need to use `pip` to install the MetricFlow CLI on Windows or Linux operating systems: + +1. Create or activate your virtual environment. To create one, run `python -m venv venv`. +2. Run `pip install dbt-metricflow`. + +The MetricFlow CLI is compatible with Python versions 3.8, 3.9, 3.10, and 3.11. + +# CLI commands + +The MetricFlow CLI provides the following commands to retrieve metadata and query metrics. + +To execute the commands, use the `mf` prefix before the command name. 
For example, to list all metrics, run `mf list metrics`: + +- [`list`](#list) — Retrieves metadata values. +- [`list metrics`](#list-metrics) — Lists metrics with dimensions. +- [`list dimensions`](#list-dimensions) — Lists unique dimensions for metrics. +- [`list dimension-values`](#list-dimension-values) — Lists dimension values with the corresponding metrics. +- [`list entities`](#list-entities) — Lists all unique entities. +- [`validate-configs`](#validate-configs) — Validates semantic model configurations. +- [`health-checks`](#health-checks) — Performs a data platform health check. +- [`tutorial`](#tutorial) — Dedicated MetricFlow tutorial to help get you started. +- [`query`](#query) — Query metrics and dimensions you want to see in the CLI. Refer to [query examples](#query-examples) to help you get started. + +## List + +This command retrieves metadata values related to [Metrics](/docs/build/metrics-overview), [Dimensions](/docs/build/dimensions), and [Entities](/docs/build/entities): + +```bash +mf list +``` + +## List metrics + +This command lists the metrics with their available dimensions: + +```bash +mf list metrics +Options: + --search TEXT Filter available metrics by this search term + --show-all-dimensions Show all dimensions associated with a metric. + --help Show this message and exit. +``` + +## List dimensions + +This command lists all unique dimensions for a metric or multiple metrics. It displays only common dimensions when querying multiple metrics: + +```bash +mf list dimensions --metrics +Options: + --metrics SEQUENCE List dimensions by given metrics (intersection). Ex. + --metrics bookings,messages + --help Show this message and exit. 
+``` + +## List dimension-values + +This command lists all dimension values with the corresponding metric: + +```bash +mf list dimension-values --metrics --dimension +Options: + --dimension TEXT Dimension to query values from [required] + --metrics SEQUENCE Metrics that are associated with the dimension + [required] + --end-time TEXT Optional iso8601 timestamp to constraint the end time of + the data (inclusive) + --start-time TEXT Optional iso8601 timestamp to constraint the start time + of the data (inclusive) + --help Show this message and exit. +``` +## List entities + +This command lists all unique entities: + +```bash +mf list entities --metrics +Options: + --metrics SEQUENCE List entities by given metrics (intersection). Ex. + --metrics bookings,messages + --help Show this message and exit. +``` + +## Validate-configs + +This command performs validations against the defined semantic model configurations: + +```bash +mf validate-configs +Options: + --dw-timeout INTEGER Optional timeout for data warehouse + validation steps. Default None. + --skip-dw If specified, skips the data warehouse + validations + --show-all If specified, prints warnings and future- + errors + --verbose-issues If specified, prints any extra details + issues might have + --semantic-validation-workers INTEGER + Optional. Uses the number of workers + specified to run the semantic validations. + Should only be used for exceptionally large + configs + --help Show this message and exit. 
+``` + +## Health checks + +This command performs a health check against the data platform you provided in the configs: + +```bash +mf health-checks +``` + +## Tutorial + +Follow the dedicated MetricFlow tutorial to help you get started: + +```bash +mf tutorial +``` + +## Query + +This command creates a new query with MetricFlow, executes it against your data platform, and returns the result: + +```bash +mf query --metrics --group-by +Options: + --metrics SEQUENCE Metrics to query for: syntax is --metrics bookings + or for multiple metrics --metrics bookings,messages + --group-by SEQUENCE Dimensions and/or entities to group by: syntax is + --group-by ds or for multiple group bys --group-by + ds,org + --end-time TEXT Optional iso8601 timestamp to constraint the end + time of the data (inclusive) + --start-time TEXT Optional iso8601 timestamp to constraint the start + time of the data (inclusive) + --where TEXT SQL-like where statement provided as a string. For + example: --where "revenue > 100" + --limit TEXT Limit the number of rows out using an int or leave + blank for no limit. For example: --limit 100 + --order SEQUENCE Metrics or group bys to order by ("-" prefix for + DESC). For example: --order -ds or --order + ds,-revenue + --csv FILENAME Provide filepath for data frame output to csv + --explain In the query output, show the query that was + executed against the data warehouse + --show-dataflow-plan Display dataflow plan in explain output + --display-plans Display plans (e.g. metric dataflow) in the browser + --decimals INTEGER Choose the number of decimal places to round for + the numerical values + --show-sql-descriptions Shows inline descriptions of nodes in displayed SQL + --help Show this message and exit. + ``` + + +## Query examples + +The following tabs present different types of query examples that you can use to query metrics and dimensions. 
Select the tab that best suits your needs: + + + + + +**Example 1** — Use this example to query metrics by dimension and return the `order_amount` metric by `metric_time`. + +**Query** +```bash +mf query --metrics order_amount --group-by metric_time +``` + +**Result** +```bash +✔ Success 🦄 - query completed after 1.24 seconds +| METRIC_TIME | ORDER_AMOUNT | +|:--------------|---------------:| +| 2017-06-16 | 792.17 | +| 2017-06-17 | 458.35 | +| 2017-06-18 | 490.69 | +| 2017-06-19 | 749.09 | +| 2017-06-20 | 712.51 | +| 2017-06-21 | 541.65 | +``` + + + + +**Example 2** — You can include multiple dimensions in a query. For example, you can group by the `is_food_order` dimension to confirm if orders were for food or not. + +**Query** +```bash +mf query --metrics order_amount --group-by metric_time,is_food_order +``` + +**Result** +```bash +✔ Success 🦄 - query completed after 1.70 seconds +| METRIC_TIME | IS_FOOD_ORDER | ORDER_AMOUNT | +|:--------------|:----------------|---------------:| +| 2017-06-16 | True | 499.27 | +| 2017-06-16 | False | 292.90 | +| 2017-06-17 | True | 431.24 | +| 2017-06-17 | False | 27.11 | +| 2017-06-18 | True | 466.45 | +| 2017-06-18 | False | 24.24 | +| 2017-06-19 | False | 300.98 | +| 2017-06-19 | True | 448.11 | +``` + + + + + + +**Example 3** — You can add order and limit functions to filter and present the data in a readable format. The following query limits the data set to 10 records and orders them by `metric_time`, descending. 
+ +**Query** +```bash +mf query --metrics order_amount --group-by metric_time,is_food_order --limit 10 --order -metric_time +``` + +**Result** +```bash +✔ Success 🦄 - query completed after 1.41 seconds +| METRIC_TIME | IS_FOOD_ORDER | ORDER_AMOUNT | +|:--------------|:----------------|---------------:| +| 2017-08-31 | True | 459.90 | +| 2017-08-31 | False | 327.08 | +| 2017-08-30 | False | 348.90 | +| 2017-08-30 | True | 448.18 | +| 2017-08-29 | True | 479.94 | +| 2017-08-29 | False | 333.65 | +| 2017-08-28 | False | 334.73 | +``` + + + + +**Example 4** — You can further filter the data set by adding a `where` clause to your query. + +**Query** +```bash + mf query --metrics order_amount --group-by metric_time,is_food_order --limit 10 --order -metric_time --where "is_food_order = True" +``` + +**Result** +```bash + ✔ Success 🦄 - query completed after 1.06 seconds +| METRIC_TIME | IS_FOOD_ORDER | ORDER_AMOUNT | +|:--------------|:----------------|---------------:| +| 2017-08-31 | True | 459.90 | +| 2017-08-30 | True | 448.18 | +| 2017-08-29 | True | 479.94 | +| 2017-08-28 | True | 513.48 | +| 2017-08-27 | True | 568.92 | +| 2017-08-26 | True | 471.95 | +| 2017-08-25 | True | 452.93 | +| 2017-08-24 | True | 384.40 | +| 2017-08-23 | True | 423.61 | +| 2017-08-22 | True | 401.91 | +``` + + + + + +**Example 5** — To filter by time, there are dedicated start and end time options. Using these options to filter by time allows MetricFlow to further optimize query performance by pushing down the where filter when appropriate. 
+ +**Query** +```bash + mf query --metrics order_amount --group-by metric_time,is_food_order --limit 10 --order -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' +``` + + **Result** +```bash +✔ Success 🦄 - query completed after 1.53 seconds +| METRIC_TIME | IS_FOOD_ORDER | ORDER_AMOUNT | +|:--------------|:----------------|---------------:| +| 2017-08-27 | True | 568.92 | +| 2017-08-26 | True | 471.95 | +| 2017-08-25 | True | 452.93 | +| 2017-08-24 | True | 384.40 | +| 2017-08-23 | True | 423.61 | +| 2017-08-22 | True | 401.91 | +``` + + + + + + + +### Additional query examples + +The following tabs present additional query examples, like exporting to a CSV. Select the tab that best suits your needs: + + + + + + + +**Example 6** — Add `--explain` to your query to view the SQL generated by MetricFlow. + +**Query** + +```bash + mf query --metrics order_amount --group-by metric_time,is_food_order --limit 10 --order -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' --explain +``` + + **Result** + ```bash + ✔ Success 🦄 - query completed after 0.28 seconds +🔎 SQL (remove --explain to see data or add --show-dataflow-plan to see the generated dataflow plan): +SELECT + metric_time + , is_food_order + , SUM(order_cost) AS order_amount +FROM ( + SELECT + cast(ordered_at as date) AS metric_time + , is_food_order + , order_cost + FROM ANALYTICS.js_dbt_sl_demo.orders orders_src_1 + WHERE cast(ordered_at as date) BETWEEN CAST('2017-08-22' AS TIMESTAMP) AND CAST('2017-08-27' AS TIMESTAMP) +) subq_3 +WHERE is_food_order = True +GROUP BY + metric_time + , is_food_order +ORDER BY metric_time DESC +LIMIT 10 +``` + + + + + +**Example 7** — Add the `--csv file_name.csv` flag to export the results of your query to a CSV file. 
+ +**Query** + +```bash +mf query --metrics order_amount --group-by metric_time,is_food_order --limit 10 --order -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' --csv query_example.csv +``` + +**Result** +```bash +✔ Success 🦄 - query completed after 0.83 seconds +🖨 Successfully written query output to query_example.csv +``` + + + diff --git a/website/docs/docs/build/metrics-overview.md b/website/docs/docs/build/metrics-overview.md index e7271ecf417..6375054ed5c 100644 --- a/website/docs/docs/build/metrics-overview.md +++ b/website/docs/docs/build/metrics-overview.md @@ -16,6 +16,21 @@ The keys for metrics definitions are: * `constraint`: For any type of metric, you may optionally include a constraint string, which applies a dimensional filter when computing the metric. You may think of this as your WHERE clause. * `meta`: Additional metadata you want to add to your metric. +Here's a complete example of the metrics spec configuration: + +```yaml +metrics: + - name: metric name + description: same as always + type: the type of the metric + type_params: + - specific properties for the metric type + configs: here for `enabled` + label: The display name for your metric. This value will be shown in downstream tools. + filter: | + {{ dimension('name') }} > 0 and {{ dimension('another name') }} is not null + +``` This page explains the different supported metric types you can add to your dbt project. - ### Derived metrics [Derived metrics](/docs/build/derived) are defined as an expression of other metrics. Derived metrics allow you to do calculations on top of metrics. 
@@ -145,7 +159,9 @@ You can set more metadata for your metrics, which can be used by other tools lat ## Related docs - [Semantic models](/docs/build/semantic-models) +- [Cumulative](/docs/build/cumulative) - [Derived](/docs/build/derived) + diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md index 28fccaddb72..a304944a440 100644 --- a/website/docs/docs/build/semantic-models.md +++ b/website/docs/docs/build/semantic-models.md @@ -140,11 +140,11 @@ You can refer to entities (join keys) in a semantic model using the `name` param MetricFlow simplifies this by allowing you to query all metric groups and construct the join during the query. To specify dimensions parameters, include the `name` (either a column or SQL expression) and `type` (`categorical` or `time`). Categorical groups represent qualitative values, while time groups represent dates of varying granularity. -dimensions are identified using the name parameter, just like identifiers. The naming of groups must be unique within a semantic model, but not across semantic models since MetricFlow, uses entities to determine the appropriate groups. +Dimensions are identified using the `name` parameter, just like identifiers. The naming of groups must be unique within a semantic model, but not across semantic models, since MetricFlow uses entities to determine the appropriate groups. :::info For time groups -For semantic models with a measure, you must have a primary time group. +For semantic models with a measure, you must have a [primary time group](/docs/build/dimensions#time).
::: diff --git a/website/docs/docs/build/sl-getting-started.md b/website/docs/docs/build/sl-getting-started.md index ff0e6006921..5b8a40904f5 100644 --- a/website/docs/docs/build/sl-getting-started.md +++ b/website/docs/docs/build/sl-getting-started.md @@ -20,7 +20,7 @@ This getting started page recommends a workflow to help you get started creating - Have a dbt project connected to Snowflake or Postgres. * Note: Support for BigQuery, Databricks, and Redshift coming soon. - Have an understanding of key concepts in [MetricFlow](/docs/build/about-metricflow), which powers the revamped dbt Semantic Layer. -- Recommended — dbt Labs recommends you install the [MetricFlow CLI package](https://github.com/dbt-labs/metricflow) to test your metrics. +- Recommended — Install the [MetricFlow CLI](/docs/build/metricflow-cli) to query and test your metrics. :::tip New to dbt or metrics? Try our [Jaffle shop example project](https://github.com/dbt-labs/jaffle-sl-template) to help you get started! @@ -108,7 +108,7 @@ Interact and test your metric using the CLI before committing it to your MetricF Follow these steps to test and query your metrics using MetricFlow: -1. If you haven't done so already, make sure you [install MetricFlow](#install-metricflow). +1. If you haven't done so already, make sure you [install MetricFlow](#install-metricflow). Refer to [MetricFlow CLI](/docs/build/metricflow-cli) for more info on commands and how to install the CLI. 2. Run `mf --help` to confirm you have MetricFlow installed, and to see the available commands. If you don't have the CLI installed, run `pip install --upgrade "dbt-metricflow[your_adapter_name]"`. For example, if you have a Snowflake adapter, run `pip install --upgrade "dbt-metricflow[snowflake]"`. 
@@ -130,3 +130,4 @@ ANY COMMON TROUBLESHOOTING QUESTIONS?--> - [About MetricFlow](/docs/build/about-metricflow) - [Semantic models](/docs/build/semantic-models) - [Metrics](/docs/build/metrics-overview) +- [MetricFlow CLI](/docs/build/metricflow-cli) diff --git a/website/docs/docs/cloud/about-cloud/browsers.md b/website/docs/docs/cloud/about-cloud/browsers.md index 4a04f70171b..2fc5a8b4b4d 100644 --- a/website/docs/docs/cloud/about-cloud/browsers.md +++ b/website/docs/docs/cloud/about-cloud/browsers.md @@ -22,3 +22,8 @@ You may still be able to access and use dbt Cloud even without using the latest To improve your experience using dbt Cloud, we suggest that you turn off ad blockers. ::: +### Browser sessions + +A session is a period of time during which you’re signed in to a dbt Cloud account from a browser. If you close your browser, it will end your session and log you out. You'll need to log in again the next time you try to access dbt Cloud. + +If you've logged in using [SSO](/docs/cloud/manage-access/sso-overview) or [OAuth](/docs/cloud/git/connect-github#personally-authenticate-with-github), you can customize your maximum session duration, which might vary depending on your identity provider (IdP). diff --git a/website/docs/faqs/Docs/modify-owner-column.md b/website/docs/faqs/Docs/modify-owner-column.md index db06e5af6cf..8395a182bb9 100644 --- a/website/docs/faqs/Docs/modify-owner-column.md +++ b/website/docs/faqs/Docs/modify-owner-column.md @@ -8,7 +8,7 @@ id: modify-owner-column Due to the nature of the field, you won't be able to change the owner column in your generated documentation. -The _owner_ field in `dbt-docs` is pulled from database metdata (`catalog.json`), meaning the owner of that table in the database. With the exception of exposures, it's not pulled from an `owner` field set within dbt. +The _owner_ field in `dbt-docs` is pulled from database metadata (`catalog.json`), meaning the owner of that table in the database. 
With the exception of exposures, it's not pulled from an `owner` field set within dbt. Generally, dbt's database user owns the tables created in the database. Source tables are usually owned by the service responsible for ingesting/loading them. diff --git a/website/docs/guides/legacy/debugging-schema-names.md b/website/docs/guides/legacy/debugging-schema-names.md index 12daacb1f2d..6c869b5f8af 100644 --- a/website/docs/guides/legacy/debugging-schema-names.md +++ b/website/docs/guides/legacy/debugging-schema-names.md @@ -16,7 +16,7 @@ You can also follow along via this video: Do a file search to check if you have a macro named `generate_schema_name` in the `macros` directory of your project. #### I do not have a macro named `generate_schema_name` in my project -This means that you are using dbt's default implementation of the macro, as defined [here](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/get_custom_name/get_custom_schema.sql#L17-L30) +This means that you are using dbt's default implementation of the macro, as defined [here](https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/get_custom_name/get_custom_schema.sql#L47C1-L60) ```sql {% macro generate_schema_name(custom_schema_name, node) -%} diff --git a/website/docs/reference/snowflake-permissions.md b/website/docs/reference/snowflake-permissions.md index 80dbec25cc8..6a469d12230 100644 --- a/website/docs/reference/snowflake-permissions.md +++ b/website/docs/reference/snowflake-permissions.md @@ -15,9 +15,11 @@ grant usage on schema database.an_existing_schema to role role_name; grant create table on schema database.an_existing_schema to role role_name; grant create view on schema database.an_existing_schema to role role_name; grant usage on future schemas in database database_name to role role_name; +grant monitor on future schemas in database database_name to role role_name; grant select on future tables in database database_name to 
role role_name; grant select on future views in database database_name to role role_name; grant usage on all schemas in database database_name to role role_name; +grant monitor on all schemas in database database_name to role role_name; grant select on all tables in database database_name to role role_name; grant select on all views in database database_name to role role_name; ``` diff --git a/website/sidebars.js b/website/sidebars.js index bf992619dbc..c09e7b784c4 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -257,6 +257,7 @@ const sidebarSettings = { "docs/build/join-logic", "docs/build/validation", "docs/build/metricflow-time-spine", + "docs/build/metricflow-cli", ] }, "docs/build/sl-getting-started", @@ -275,6 +276,7 @@ const sidebarSettings = { label: "Metrics", link: { type: "doc", id: "docs/build/metrics-overview"}, items: [ + "docs/build/cumulative", "docs/build/derived", "docs/build/ratio", "docs/build/simple", diff --git a/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image1.png b/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image1.png new file mode 100644 index 00000000000..687bdef7568 Binary files /dev/null and b/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image1.png differ diff --git a/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image2.png b/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image2.png new file mode 100644 index 00000000000..658e4c0cfb5 Binary files /dev/null and b/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image2.png differ diff --git a/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image3.png b/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image3.png new file mode 100644 index 00000000000..fa4b837a82f Binary files /dev/null and b/website/static/img/blog/2023-07-17-GPT-and-dbt-test/image3.png differ diff --git a/website/static/img/blog/authors/pedro_brito.jpeg b/website/static/img/blog/authors/pedro_brito.jpeg new file mode 100644 index 00000000000..9f163a431f3 Binary 
files /dev/null and b/website/static/img/blog/authors/pedro_brito.jpeg differ