Add how to measure the savings.
lucia-vargas-a committed Jun 20, 2024
1 parent 27dd18f commit 7cf8b76
Showing 2 changed files with 28 additions and 5 deletions.
33 changes: 28 additions & 5 deletions src/cookbooks/data_modeling/using_aggregates.md
@@ -10,7 +10,10 @@ This doc is about when to use different options to aggregate data, their limitat

1. **[BigQuery Materialized Views](https://cloud.google.com/bigquery/docs/materialized-views-intro).** These are views defined by the developer and then created, managed and scheduled by BigQuery. Materialized views are updated periodically by reading only the changes in the base table to compute results. Materialized view definitions [_do not support_ certain BigQuery features and expressions](https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations), such as UDFs, certain aggregate functions, backfilling or nesting.

- Example to create a Materialized view in BigQuery.
- [Template to create a Materialized View](https://console.cloud.google.com/bigquery?ws=!1m7!1m6!12m5!1m3!1smozdata!2sus-central1!3s8403c62c-e243-4e57-8d91-5c1fcdf26828!2e1).

- Example.

```
CREATE MATERIALIZED VIEW `moz-fx-data-shared-prod.monitoring_derived.suggest_click_rate_live_v1`
OPTIONS
```

@@ -37,8 +40,8 @@ This doc is about when to use different options to aggregate data, their limitat
1. **[Looker PDTs & aggregate awareness](https://cloud.google.com/looker/docs/aggregate_awareness).** These are aggregations that a developer defines in an Explore file (`explore.lkml`). From this definition, Looker creates a table in BigQuery's `mozdata.tmp` using the naming convention `scratch schema + table status code + hash value + view name` and runs the scheduled update of the data.
Looker's PDTs and aggregate awareness are **only** referenced in Looker when at least one of the columns is used in a Looker object. These aggregates can be particularly beneficial to avoid having to rebuild dashboards after a schema change.
- Template to create aggregate awareness in a Looker Explore:
- Template to create aggregate awareness in a Looker Explore, replacing the text inside <> with the actual values:
```
aggregate_table: <aggregate_name: Descriptive name of this aggregation.> {
query: {
@@ -52,8 +55,8 @@ This doc is about when to use different options to aggregate data, their limitat
increment_offset: <INT: number of periods to update, recommended is 1.> }
}
```
- An [Example of aggregate awareness in a Looker Explore](https://mozilla.cloud.looker.com/projects/spoke-default/files/combined_browser_metrics/explores/active_users_aggregates.explore.lkml)
- [Example of aggregate awareness in a Looker Explore](https://mozilla.cloud.looker.com/projects/spoke-default/files/combined_browser_metrics/explores/active_users_aggregates.explore.lkml)
## Important considerations:
@@ -99,3 +102,23 @@ This doc is about when to use different options to aggregate data, their limitat
- The metrics defined in the Explore use only these data types: NUMBER, DATE, STRING or YESNO.
- The aggregate uses a DISTINCT COUNT **and** the query matches exactly the Explore query.
- The base table for the Explore is expected to change with added columns, and the Looker Explore will require modifications that would otherwise also require re-creating the dashboards. When using aggregate awareness, re-creating the dashboards is **not** necessary.
## What is the actual benefit?
- Looker displays, in the top right corner of a view or explore, the amount of data that will be processed with and without using the aggregates.
![Looker cost saving](looker_cost_saving.png)
- BigQuery also displays, in the top right corner of the window, the amount of data to be scanned by a query. Alternatively, it's possible to query the information schema to return the bytes processed and the cost. With this information, it is possible to compare queries and calculate the savings that result from using a materialized view.
```
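-- Data processed and estimated cost of the jobs that wrote to a given table.
-- Replace the text inside <> with the actual values.
SELECT destination_table.project_id AS project_id,
destination_table.dataset_id AS dataset,
-- Keep only the part of the table ID before the partition decorator ('$').
SUBSTR(destination_table.table_id, 0, INSTR(destination_table.table_id, '$') -1) AS table_id,
-- Dividing bytes by 1024^3 yields gigabytes, not terabytes.
SUM(total_bytes_processed/(1024*1024*1024)) AS GB_processed,
-- Slot milliseconds converted to slot hours, times the rate constants used in this doc.
SUM((total_slot_ms * 0.06) / (60 * 60 * 1000)) * 0.76 AS cost
FROM `moz-fx-data-shared-prod`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE EXTRACT(DATE FROM creation_time) BETWEEN <ini_date> AND CURRENT_DATE
AND destination_table.dataset_id = <dataset_name>
AND user_email = <user_email>
AND destination_table.table_id = <table_name>
GROUP BY ALL;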
```
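
Once the figures with and without the aggregate are known, the comparison is simple arithmetic. The following is a minimal sketch in Python, not part of the existing tooling: the function name `savings` and the sample figures are hypothetical, and the inputs can be any pair of values in the same unit (bytes, GB or cost) taken from the Looker or BigQuery estimates.

```python
from typing import Tuple


def savings(baseline: float, with_aggregate: float) -> Tuple[float, float]:
    """Return (absolute saving, saving as a percentage of the baseline).

    Both arguments must use the same unit, e.g. GB processed or cost.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    saved = baseline - with_aggregate
    return saved, 100.0 * saved / baseline


if __name__ == "__main__":
    # Hypothetical figures: 120 GB scanned without the aggregate, 30 GB with it.
    saved_gb, saved_pct = savings(baseline=120.0, with_aggregate=30.0)
    print(f"Saved {saved_gb:.1f} GB ({saved_pct:.1f}%)")
```

A negative result means the aggregate processed more data than the baseline query, which suggests the aggregate is not helping for that query.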
