From 3de6663699cd32410a5992ce9c95a3ae2e2fb6de Mon Sep 17 00:00:00 2001
From: lucia-vargas-a <alvargasa@gmail.com>
Date: Mon, 18 Mar 2024 21:41:00 +0100
Subject: [PATCH] Update Retention docs.

---
 src/cookbooks/retention.md | 79 ++++++++++++--------------------------
 1 file changed, 25 insertions(+), 54 deletions(-)

diff --git a/src/cookbooks/retention.md b/src/cookbooks/retention.md
index ff9169b83..1488d088b 100644
--- a/src/cookbooks/retention.md
+++ b/src/cookbooks/retention.md
@@ -2,56 +2,55 @@ _Authored by the Data Science Team. Please direct questions/concerns to Jesse Mc
 
 # Retention
 
-Retention measures the proportion of users that are _continuing_ to use Firefox, making it one of the more important metrics we track - we generally expect that the more value our users receive from the product, the more likely they are to be retained. We commonly measure retention between releases, experiment cohorts, and various Firefox subpopulations to better understand how a change to the user experience or use of a specific feature affects retention behavior.
+Retention measures the proportion of users who _continue_ to use any of the browsers after sending their first ping, making it one of the most important metrics we track at Mozilla: we expect that the more value our users receive from a product, the more likely they are to be retained. We commonly measure retention between releases, experiment cohorts, and various Firefox subpopulations to better understand how a change to the user experience or the use of a specific feature affects retention behavior.
 
-## State of Retention
+### References
+- Initial [Proposed Metric Definition: Retention](https://docs.google.com/document/d/1VtqNFQFB9eJNr57h3Mz-lldMcpSYQKHVn2jzMMjPFYY/).
 
-There is currently active research into retention metrics and we expect our conceptual and data model around retention to evolve in the coming months. This page will be updated. However, in the meantime, we want to be clear about our standard retention metrics for use right now.
+- Proposed [Revision of Funnel Metrics](https://docs.google.com/document/d/18QGa4JYbDP35IywH3zGftCgejrd8aGnbft_L2oxp1Ao/edit#heading=h.rl9rub54j6oj) implemented in H2-2023.
 
-You can also see the [_Proposed Metric Definition: Retention_ Google Doc](https://docs.google.com/document/d/1VtqNFQFB9eJNr57h3Mz-lldMcpSYQKHVn2jzMMjPFYY/) for a summary of our conceptual thinking about retention metrics.
+## Metric definitions
 
-## Standard Retention metrics
+### Repeat First Month Users
 
-Note that the definitions below refer to "usage criterion". See the [GUD Data Model documentation](https://docs.google.com/document/d/1sIHCCaJhtfxj-dnbInfuIjlMRhCFbEhFiBESaezIRwM/edit#heading=h.ysqpvceb7pgt) for more information. For normal Firefox Desktop retention, the usage criterion refers to simply sending any main ping.
+New profiles that used the browser on more than one day in their first 28-day window. Including this metric guarantees a monotonically decreasing flow (week 4 retention is always smaller than or equal to the number of multi-day users) and keeps backward compatibility with week 4 retention (every member of the week 4 retention cohort is also a member of the multi-day users cohort). It also covers more users than activation, which gives it a larger opportunity size.
 
-### 1-Week Retention
+### Week 4 Retention
 
-Among profiles that were active in the specified usage criterion at least once in the week starting on the specified day (day 0), what proportion (out of 1) meet the usage criterion during the following week (days 7 through 13).
+New profiles that used the browser at least once between days 22 and 28.
 
-### 1-Week New Profile Retention
+### Activated User
 
-Among new profiles created on the day specified, what proportion (out of 1) meet the usage criterion during the week beginning one week after the day specified.
-
-Note that we use a new profile definition that relies on the `profile_creation_date` and requires that a main ping be sent within one week of the `profile_creation_date`. This differs from analysis using new profile pings, but allows valid comparison over time. New profile pings do not allow comparison over time due to the increased adoption of versions of the browser recent enough to send new profile pings.
+For Desktop, a new user becomes an activated user when they have used the browser at least 5 times in their first 7 days. For mobile, a new user becomes an activated user when they have used the browser for at least 3 days (including the first day of app open) in their first week and performed a search in the latter half of their first week.
 
 ## Accessing Retention Metrics
 
 There are three standard methods for accessing retention metrics. These methods trade off between simplicity and flexibility.
 
-### Mozilla Growth & Usage Dashboard (GUD)
+### Funnel Dashboards with Retention analysis
+
+- Mobile retention can be analyzed in the [Fenix Funnel](https://mozilla.cloud.looker.com/dashboards/1470) and [iOS Funnel](https://mozilla.cloud.looker.com/dashboards/1314?Country=&YoY+control+%28Do+not+change%29+=before+28+days+ago&Date=2023%2F01%2F01+to+2023%2F10%2F18) dashboards.
 
-The [GUD](https://gud.telemetry.mozilla.org/) provides plots and exportable tables of both retention metrics over time. Metrics are available for most products and can be sliced by OS, language, country, and channel.
+- Desktop retention is available in the [Desktop Moz.org Funnel (Windows)](https://mozilla.cloud.looker.com/dashboards/duet::desktop_moz_org_funnel_windows?Analysis%20Period=90%20day&Countries=US,GB,DE,FR,CA,BR,MX,CN,IN,AU,NL,ES,RU,ROW&Include%20Dates%20Where=data%20complete).
 
-### Querying Smoot Usage Tables
+### Querying Aggregate Tables
 
-For programmatic access, the tables underlying GUD can be queried directly. For example:
+For programmatic access, the views underlying the dashboards can be queried directly. For example:
 
 ```sql
 SELECT
-  `date`,
-  SAFE_DIVIDE(SUM(new_profile_active_in_week_1), SUM(new_profiles)) AS one_week_new_profile_retention,
-  SAFE_DIVIDE(SUM(active_in_weeks_0_and_1), SUM(active_in_week_0)) AS one_week_retention
-FROM `moz-fx-data-shared-prod.telemetry.smoot_usage_day_13`
-WHERE
-  usage = 'Any Firefox Desktop Activity'
-  AND country IN ('US', 'GB', 'CA', 'FR', 'DE')
-  AND `date` BETWEEN "2019-11-01" AND "2019-11-07"
-GROUP BY `date` ORDER BY `date`
+  first_seen_date AS submission_date,
+  country_code,
+  COUNTIF(qualified_week4) AS retained_week4
+FROM `mozdata.telemetry.clients_first_seen_28_days_later`
+WHERE first_seen_date >= '2024-01-01'
+  AND DATE_DIFF(current_date(), first_seen_date, DAY) > 1
+GROUP BY 1, 2;
 ```
 
 ### Querying Clients Daily Tables
 
-For more custom access, use the `clients_last_seen tables`. You can restrict to an arbitrary population of users by joining the `base` table below against a table containing the `client_id`s of interest.
+For more custom access, for example to see retention in the first week, use the `clients_last_seen` tables. You can restrict to an arbitrary population of users by joining the `base` table below against a table containing the `client_id`s of interest.
 
 ```sql
 WITH base AS (
@@ -103,34 +102,6 @@ When performing retention analysis it is important to understand that there are
 
 It is good practice to always compute confidence intervals for retention metrics, especially when looking at specific slices of users or when making comparisons between different groups.
 
-The [Growth and Usage Dashboard](https://gud.telemetry.mozilla.org/) provides confidence intervals automatically using a jackknife resampling method over `client_id` buckets. This confidence intervals generated using this method should be considered the "standard". We show below how to compute them using the data sources described above. These methods use UDFs [defined in bigquery-etl](https://github.com/mozilla/bigquery-etl/blob/master/sql/moz-fx-data-shared-prod/udf_js/jackknife_ratio_ci/udf.sql).
-
-We also note that it is fairly simple to calculate a confidence interval using any statistical method appropriate for proportions. The queries given above provide both numerators and denominators, so feel free to calculate confidence intervals in the manner you prefer. However, if you want to replicate the standard confidence intervals, please work from the example queries below.
-
-### Querying Smoot Usage Tables
-
-```sql
-WITH bucketed AS (
-  SELECT
-    `date`,
-    id_bucket,
-    SUM(new_profile_active_in_week_1) AS new_profile_active_in_week_1,
-    SUM(new_profiles) AS new_profiles
-  FROM `moz-fx-data-shared-prod.telemetry.smoot_usage_day_13`
-  WHERE
-    usage = 'Any Firefox Desktop Activity'
-    AND country IN ('US', 'GB', 'CA', 'FR', 'DE')
-    AND `date` BETWEEN "2019-11-01" AND "2019-11-07"
-  GROUP BY `date`, id_bucket
-)
-
-SELECT
-  `date`,
-  udf_js.jackknife_ratio_ci(20, ARRAY_AGG(STRUCT(CAST(new_profile_active_in_week_1 AS float64), CAST(new_profiles as FLOAT64)))) AS one_week_new_profile_retention
-FROM bucketed
-GROUP BY `date` ORDER BY `date`
-```
-
 ### Querying Clients Daily Tables
 
 ```sql