Add how to measure the savings.

mozilla · Jun 20, 2024 · 7cf8b76
1 parent 27dd18f
commit 7cf8b76
Showing 2 changed files with 28 additions and 5 deletions.
diff --git a/src/cookbooks/data_modeling/looker_cost_saving.png b/src/cookbooks/data_modeling/looker_cost_saving.png
diff --git a/src/cookbooks/data_modeling/using_aggregates.md b/src/cookbooks/data_modeling/using_aggregates.md
@@ -10,7 +10,10 @@ This doc is about when to use different options to aggregate data, their limitat
 
 1. **[BigQuery Materialized Views](https://cloud.google.com/bigquery/docs/materialized-views-intro).** These are views defined by the developer and then created, managed and scheduled by BigQuery. Materialized views are periodically updated by reading only the changes from the base table to compute results. Materialized view definitions [_do not support_ certain BigQuery features and expressions](https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations), such as UDFs, certain aggregate functions, backfilling or nesting.
 
-   - Example to create a Materialized view in BigQuery.
+   - [Template to create a Materialized View](https://console.cloud.google.com/bigquery?ws=!1m7!1m6!12m5!1m3!1smozdata!2sus-central1!3s8403c62c-e243-4e57-8d91-5c1fcdf26828!2e1).
+
+   - Example.
+
      ```
      CREATE MATERIALIZED VIEW `moz-fx-data-shared-prod.monitoring_derived.suggest_click_rate_live_v1`
      OPTIONS
@@ -37,8 +40,8 @@ This doc is about when to use different options to aggregate data, their limitat
 1. **[Looker PDTs & aggregate awareness](https://cloud.google.com/looker/docs/aggregate_awareness).** These are aggregations that a developer defines in an Explore file (`explore.lkml`). From this definition, Looker creates a table in BigQuery's `mozdata.tmp` using the naming convention `scratch schema + table status code + hash value + view name` and runs the scheduled update of the data.
    Looker's PDTs and aggregate awareness are **only** referenced in Looker when at least one of the columns is used in a Looker object. These aggregates can be particularly beneficial to avoid having to rebuild dashboards after a schema change.
 
-   - Template to create aggregate awareness in a Looker Explore:
-
+   - Template to create aggregate awareness in a Looker Explore, replacing the text inside <> with the actual values:
+     
      ```
      aggregate_table: <aggregate_name: Descriptive name of this aggregation.> {
        query: {
@@ -52,8 +55,8 @@ This doc is about when to use different options to aggregate data, their limitat
        increment_offset: <INT: number of periods to update, recommended is 1.> }
        }
      ```
-
-   - An [Example of aggregate awareness in a Looker Explore](https://mozilla.cloud.looker.com/projects/spoke-default/files/combined_browser_metrics/explores/active_users_aggregates.explore.lkml)
+     
+   - [Example of aggregate awareness in a Looker Explore](https://mozilla.cloud.looker.com/projects/spoke-default/files/combined_browser_metrics/explores/active_users_aggregates.explore.lkml)
 
 ## Important considerations:
 
@@ -99,3 +102,23 @@ This doc is about when to use different options to aggregate data, their limitat
 - The metrics defined in the Explore use only any of these data types: NUMBER, DATE, STRING or YESNO.
 - The aggregate uses a DISTINCT COUNT **and** the query matches exactly the Explore query.
 - The base table for the Explore is expected to change with added columns, and the Looker Explore will require modifications that would otherwise require re-creating the dashboards. When using aggregate awareness, this re-creation is **not** necessary.
+
+## What is the actual benefit?
+
+- Looker displays, in the top right corner of a view or Explore, the amount of data that will be processed with and without using the aggregates.
+![Looker cost saving](looker_cost_saving.png)
+- BigQuery also displays, in the top right corner of the window, the amount of data to be scanned by a query. Alternatively, it is possible to query the information schema to return the bytes processed and the cost. With this information, it is possible to compare the two and calculate the savings that result from using a materialized view.
+
+   ```
+    SELECT destination_table.project_id AS project_id,
+         destination_table.dataset_id AS dataset,
+         SUBSTR(destination_table.table_id, 0, INSTR(destination_table.table_id, '$') -1) AS table_id,
+         SUM(total_bytes_processed/(1024*1024*1024)) AS GB_processed,
+         SUM((total_slot_ms * 0.06) / (60 * 60 * 1000)) * 0.76 AS cost
+    FROM `moz-fx-data-shared-prod`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
+    WHERE EXTRACT(DATE FROM  creation_time) BETWEEN <ini_date> AND CURRENT_DATE
+         AND destination_table.dataset_id = <dataset_name>
+         AND user_email = <user_email>
+         AND destination_table.table_id = <table_name>
+    GROUP BY ALL;
+   ```
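
The comparison described in the added section comes down to simple arithmetic once the two byte counts are known. The following is a minimal sketch, not part of the commit: the function name `savings` and the figures used are hypothetical, and the inputs are assumed to be the bytes processed without and with the aggregate, as read from the Looker or BigQuery UI or from the information schema query.

```python
GIB = 1024 ** 3  # bytes in one GiB


def savings(bytes_without, bytes_with):
    """Return (bytes saved, percentage saved) when the aggregate is used.

    bytes_without: bytes processed when querying the base table.
    bytes_with: bytes processed when the aggregate or materialized view is used.
    """
    if bytes_without <= 0:
        raise ValueError("bytes_without must be positive")
    saved = bytes_without - bytes_with
    return saved, 100.0 * saved / bytes_without


if __name__ == "__main__":
    # Hypothetical figures: 2048 GiB scanned on the base table, 50 GiB on the aggregate.
    saved, pct = savings(2048 * GIB, 50 * GIB)
    print(f"{saved / GIB:.0f} GiB saved ({pct:.1f}%)")
```

A negative result means the aggregate processed more data than the base table, which suggests it is not being matched or is not a good fit for that query.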