---
title: "Visualising CHD Pathway Inequalities by ethnicity"
lang: en-GB
author: "Jacqueline Grout"
date: last-modified
date-format: "YYYY-MM-DD"
title-block-banner: "#f9bf07"
title-block-banner-color: "#333739"
format:
  html:
    self-contained: true
    grid:
      sidebar-width: 200px
      body-width: 950px
      margin-width: 150px
      gutter-width: 1.5rem
    embed-resources: true
    smooth-scroll: true
    theme: cosmo
    fontcolor: black
    toc: true
    toc-location: left
    toc-title: Contents
    toc-depth: 3
editor: visual
execute:
  echo: false
  message: false
  warning: false
  freeze: auto
editor_options: 
  chunk_output_type: console
  
css: styles.css
---

```{r}
#| echo: false
 
library(targets)
library(gt)
library(tidyverse)
library(grid)
library(gridExtra)
library(downloadthis)

```

# Introduction

In 2022, the Strategy Unit worked with the British Heart Foundation (BHF) to explore ways of visualising socio-economic inequalities as they emerge through the coronary heart disease (CHD) pathway. We made these visualisations available so that health and care staff can understand where on the pathway socio-economic inequalities emerge and at which points they are moderated or exacerbated. The report[^1] and web-based tool[^2] are available via the BHF and Strategy Unit websites.

[^1]: <https://www.strategyunitwm.nhs.uk/publications/socio-economic-inequalities-coronary-heart-disease>

[^2]: <https://www.bhf.org.uk/icb-tool>

Whilst this initial analysis was focused on inequalities between socio-economic groups, BHF asked the Strategy Unit if we might explore the feasibility of assessing inequalities across other dimensions of inequality. This report explores inequalities by ethnicity along the coronary heart disease (CHD) pathway.

The objectives of the report are to:

1.  Set out the methods by which the Strategy Unit have sought to represent CHD pathway inequalities by ethnicity.

2.  Quantify and illustrate the inequalities by ethnicity over the disease progression and treatment pathway for CHD.

The analysis has been conducted by the Strategy Unit on behalf of the British Heart Foundation.

# Inequities in healthcare

The term ‘inequities’ is used to describe unjustifiable differences in rates of access between subgroups. An equity analysis must control for levels of need within each population subgroup. Having done this, an equitable distribution of services is one where rates of access to a service or population follow the distribution of need, such that a patient with a given level of need in one subgroup has the same chance of accessing a service as their counterparts with a similar level of need in other subgroups. This is the standard that the NHS seeks to achieve. Assessing equity is challenging. Further detail about inequalities and inequities in healthcare can be found in a previous Strategy Unit report[^3].

[^3]: <https://www.strategyunitwm.nhs.uk/publications/socio-economic-inequalities-access-planned-hospital-care-causes-and-consequences>

In our previous work, visualising socio-economic inequalities, our units of analysis were GP practices. The metrics of interest (risk factors, primary prevention interventions, secondary care procedures, outcomes, etc.) are readily available at GP practice level and there are established methods of assigning GP practices to deciles of deprivation. Having calculated a metric value for each decile of GP practices, we used the relative index of inequality (RII) to estimate the scale and direction of inequality. The RII can only be used when the dimension of inequality can be expressed as a set of ordered groups.

Assessing inequalities across ethnic groups is more challenging since, unlike deprivation, ethnicity cannot be expressed as a set of ordered groups. The distribution of patients across ethnic groups is unique to each practice. There is no meaningful way to numerically aggregate these distributions into a single variable (e.g. % BME) without significant loss of information. An alternative method of grouping practices according to the distribution of their patients over ethnic groups was required. We have used k-medoids clustering to assign practices to one of a small number of groups such that practices within a group have similar distributions of patients over ethnic groups. K-medoids is an established and commonly used unsupervised machine learning technique. This clustering approach required data on the ethnicity of each practice's registered population. We imputed it based on the Lower Super Output Area (LSOA) of residence of GP registrants and the ethnicity distribution of patients living in these LSOAs using the Census 2021 data.

Ethnicity, unlike deprivation, is not an ordered variable. The approach in our previous work used the relative index of inequality to measure the degree of inequality. This measure relies on the ordered quality of the socio-economic deprivation variable. To handle the categorical nature of the ethnicity variable we have used the relative index of disparity[^4] to indicate the extent to which the rate of an activity or event varies across groups. The index estimates the proportion of events (e.g., admissions) that would need to be redistributed between clusters in order that event rates follow levels of need. Further detailed explanations of the methods used in the analysis can be found in the appendix.

[^4]: *Pearcy J, Keppel K, A Summary Measure of Health Disparity, Public Health Reports, Vol 117, May-June 2002*

    <https://open.umich.edu/sites/default/files/downloads/PublicHealthRep-Pearcy.pdf>

# CHD Metrics

This report quantifies and illustrates levels of inequity across 33 metrics at various points along the continuum of coronary heart disease progression and over a typical treatment pathway. They are shown in the table below grouped by domain (risk factors, risk factor identification, primary prevention, disease identification, secondary prevention, tertiary prevention, intermediate and full outcomes), which represent the various stages along the pathway. Full definitions and data sources for each metric are included in the appendix.

*Table 1 - coronary heart disease pathway metrics*

```{r}
library(gt)
metrics_table <- tibble(metric_domain = c("Need","Risk factors","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Primary prevention","Primary prevention","Primary prevention","Primary prevention","Disease identification","Disease identification","Disease identification","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Intermediate outcome","Intermediate outcome","Intermediate outcome","Full outcomes","Full outcomes","Full outcomes","Full outcomes"),
  name_metric=c("CHD synthetic prevalence estimates",
                                      "Smoking synthetic prevalence estimates",
                                      "Smoking register",
                                      "Obesity register",
                                      "Diabetes register",
                                      "Depression register",
                                      "The percentage of patients aged 45 or over who have a record of blood pressure in the preceding 5 years",
                                      "CVD risk register",
                                      "Percentage of patients aged 18 and over with GP recorded CVD (narrow definition), who are currently treated with lipid lowering therapy",
                                      "Percentage of patients aged 18 and over, with no GP recorded CVD and a GP recorded QRISK score of 10% or more, CKD (G3a to G5), T1 diabetes (aged 40 and over) or T2 diabetes aged 60 and over, who are currently treated with lipid lowering therapy.",
                                      "Smoking cessation support offered",
                                      "Exception reporting for Smoking cessation support offered",

                                      "CHD register",
                                      "CT angiography",
                                      "Electrocardiography",
                                      "Aspirin, anti-platelet or anti-coagulant",
                                      "Exception reporting for Aspirin, anti-platelet or anti-coagulant",
                                      "Flu vaccination",
                                      "Exception reporting for Flu vaccination",
                                      "Percentage of patients aged 65 or over who received a seasonal influenza vaccination between 1 September 2022 and 31 March 2023",
                                      "Percentage of patients aged 18 to 64 years and in a clinical at-risk group who received a seasonal influenza vaccination between 1 September 2022 and 31 March 2023",
"Referral to cardiology (First outpatient)",
                                      "Cardiology outpatient DNAs",
"Elective PCI","Elective CABG","Waiting time for elective PCI / CABG","Elective PCI / CABG patients discharged before trimpoint",
"Cardiac rehabilitation - Started", "Cardiac rehabilitation - Completed","BP reading < 140/90","Readmission within 30 days of elective PCI / CABG","Emergency admissions for CHD","Deaths in hospital from CHD","Deaths in hospital from CHD <75","Deaths from CHD","Deaths from CHD <75"
                                      )
                        )

metrics_table |>
    gt()|>
  cols_hide(metric_domain)|>
 # tab_stubhead(label = md("**Domain**"))|>
    tab_header(
    title = md("**Table 1** - *Coronary heart disease pathway metrics*")
  )|>
   tab_row_group(
     id="Need",
    label = md("**Need**"),
    rows = 1
  ) |>
  tab_row_group(
    id="Risk factors",
    label = md("**Risk factors**"),
    rows = 2
  ) |>
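  # Row ranges follow metric_domain: rows 3 to 8 are risk factor identification
  # (including the CVD risk register) and rows 9 to 12 are primary prevention.
  tab_row_group(
    id="Risk factor identification",
    label = md("**Risk factor identification**"),
    rows = 3:8
  )|>
  tab_row_group(
    id="Primary prevention",
    label = md("**Primary prevention**"),
    rows = 9:12
  )|>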
  tab_row_group(
    id="Disease identification",
    label = md("**Disease identification**"),
    rows = 13:15
  )|>
  tab_row_group(
    id="Secondary prevention",
    label = md("**Secondary prevention**"),
    rows = 16:23
  )|>
    tab_row_group(
    id="Tertiary prevention",
    label = md("**Tertiary prevention**"),
    rows = 24:29
  )|>
    tab_row_group(
    id="Intermediate outcome",
    label = md("**Intermediate outcome**"),
    rows = 30:32
  )|>
    tab_row_group(
    id="Full outcomes",
    label = md("**Full outcomes**"),
    rows = 33:36
  )|>
  row_group_order(groups=c("Need","Risk factors","Risk factor identification","Primary prevention","Disease identification","Secondary prevention","Tertiary prevention","Intermediate outcome","Full outcomes"))|>
  cols_label(name_metric="")
    

```

# Clustering

In this analysis we used the k-medoids clustering method to group practices. Each practice was assigned to one of a small number of groups such that practices within the group have similar distributions of patients over ethnic groups. In order to obtain data on the ethnicity of a practice's registered population we imputed it based on the Lower Super Output Area (LSOA) of residence of GP registrants and the ethnicity distribution of patients living in those LSOAs using the Census 2021 data. Further detailed explanations of the method can be found in the appendix.
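
The imputation step can be sketched as a registrant-weighted average of LSOA ethnicity profiles. This is a minimal illustration in base R rather than the project pipeline: the practice codes, LSOA labels, column names and all figures are made up, and three ethnic groups stand in for the full Census breakdown.

```r
# Hypothetical inputs: registrants per practice and LSOA, and the Census
# ethnicity proportions of each LSOA (each row of proportions sums to 1).
registrants <- data.frame(
  gp_practice_code = c("P1", "P1", "P2", "P2"),
  lsoa             = c("A", "B", "B", "C"),
  patients         = c(300, 100, 200, 200)
)
lsoa_ethnicity <- data.frame(
  lsoa  = c("A", "B", "C"),
  white = c(0.90, 0.60, 0.30),
  asian = c(0.05, 0.30, 0.50),
  black = c(0.05, 0.10, 0.20)
)

groups <- c("white", "asian", "black")
joined <- merge(registrants, lsoa_ethnicity, by = "lsoa")

# Expected number of patients in each ethnic group for every practice-LSOA pair
expected <- as.matrix(joined[groups]) * joined$patients

# Sum expected counts and registrants to practice level, then convert to shares
practice_counts <- rowsum(expected, joined$gp_practice_code)
practice_totals <- rowsum(joined$patients, joined$gp_practice_code)
practice_ethnicity <- practice_counts / as.vector(practice_totals)

round(practice_ethnicity, 3)
```

Each row of `practice_ethnicity` sums to 1, so practices of different sizes can be compared on the same scale.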

The clustering analysis resulted in 5 groups (clusters) of practices which are illustrated in the charts and map below.
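
As an illustration of the technique, k-medoids clustering is available in R through `pam()` in the `cluster` package. The sketch below uses six made-up practices and `k = 2`; the report's analysis produced 5 clusters. The default Euclidean distance is an assumption here, and the settings actually used are those described in the appendix.

```r
library(cluster)  # provides pam(), a k-medoids implementation

# Hypothetical practice-level ethnicity shares (each row sums to 1)
practice_ethnicity <- rbind(
  P1 = c(white = 0.97, asian = 0.01, black = 0.01, mixed = 0.01),
  P2 = c(white = 0.95, asian = 0.02, black = 0.01, mixed = 0.02),
  P3 = c(white = 0.93, asian = 0.03, black = 0.02, mixed = 0.02),
  P4 = c(white = 0.40, asian = 0.45, black = 0.10, mixed = 0.05),
  P5 = c(white = 0.35, asian = 0.48, black = 0.12, mixed = 0.05),
  P6 = c(white = 0.42, asian = 0.40, black = 0.12, mixed = 0.06)
)

# k = 2 suits this toy data; it is not the value used in the report
fit <- pam(practice_ethnicity, k = 2)

fit$clustering          # cluster label for each practice
rownames(fit$medoids)   # the practices chosen as cluster medoids

# Median share of each ethnic group within each cluster
cluster_medians <- apply(practice_ethnicity, 2, function(share) {
  tapply(share, fit$clustering, median)
})
round(cluster_medians, 2)
```

Unlike k-means, each cluster centre (medoid) is an actual practice, which makes the clusters easier to describe in terms of real ethnicity profiles.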

## Cluster descriptions

The median percentage of each ethnic group for each cluster was calculated and is presented in these charts to describe their respective diversity.

### Cluster 1 - Least diverse

This cluster is the least diverse of the five and the median percentage of White patients is 97% (94% White British).

::: panel-tabset
## Five ethnic groups

```{r}
tar_read(cluster2_treemap_1)
```

## Fourteen ethnic groups

94% White British

```{r}
tar_read(cluster2_eth_chart_1)
```
:::

### Cluster 2

In this cluster the median percentage of White patients is 93% (87% White British). Those patients whose ethnicity is White are more likely to be White Irish or other White ethnicities (0.7% and 4% respectively) when compared to cluster 1. The median percentage of patients with a mixed ethnicity in this cluster is 2%.

::: panel-tabset
## Five ethnic groups

```{r}
tar_read(cluster2_treemap_2)
```

## Fourteen ethnic groups

87% White British

```{r}
tar_read(cluster2_eth_chart_2)
```
:::

### Cluster 3

In this cluster the median percentage of White patients is 78% (69% White British). The median percentage of patients with a mixed ethnicity in this cluster is 3.8%, which is higher than in cluster 2. Indian patients make up 3.3% of the patients, and 2.6% are patients whose ethnicity is Black African.

::: panel-tabset
## Five ethnic groups

```{r}
tar_read(cluster2_treemap_3)
```

## Fourteen ethnic groups

69% White British

```{r}
tar_read(cluster2_eth_chart_3)
```
:::

### Cluster 4

In cluster 4 just over half the patients are White, with a median percentage of 55% (35% White British). A smaller share of the White group is White British than in clusters 1 to 3, with 17% of patients having an ethnicity of White Other. The median percentage of patients whose ethnicity is Black African in this cluster is 8.3%, which is higher than in cluster 3, and the median percentage of Black Caribbean patients is 3.9%. The median percentage of Black and Asian patients is 29% in this cluster.

::: panel-tabset
## Five ethnic groups

```{r}
tar_read(cluster2_treemap_4)
```

## Fourteen ethnic groups

35% White British

```{r}
tar_read(cluster2_eth_chart_4)
```
:::

### Cluster 5 - Most diverse

This cluster is the most diverse. Overall, the median percentage of Asian patients in this cluster is 45%, including 13.3% Indian, 9.7% Pakistani and 2.9% Bangladeshi patients. Mixed and other ethnic groups form a large proportion of the patients in this cluster.

::: panel-tabset
## Five ethnic groups

```{r}
tar_read(cluster2_treemap_5)
```

## Fourteen ethnic groups

22% White British

```{r}
tar_read(cluster2_eth_chart_5)
```
:::

## Age, Sex and Deprivation Profile of Clusters

Table 2 shows that the most diverse clusters are younger and the least diverse clusters are older. In particular, almost one quarter of patients in cluster 5 are under 18, compared to 19% in cluster 1, while 12% of patients in cluster 1 are aged 75+, compared to only 4% in clusters 4 and 5.

```{r}
gp_reg_pat_prac_sing_age_female<- tar_read(gp_reg_pat_prac_sing_age_female)|>as_tibble()
gp_reg_pat_prac_sing_age_male<- tar_read(gp_reg_pat_prac_sing_age_male)|>as_tibble()
clusters_for_nacr <- tar_read(clusters_for_nacr)|>as_tibble()

# Band single-year-of-age counts into the age groups used in Table 2 and tag
# each row with a sex code (1 = male, 2 = female). Ages are held as text, with
# "95+" as the top band and "ALL" as a total row that is dropped.
band_ages <- function(gp_reg_pat, sex_code) {
  gp_reg_pat |>
    filter(age != "ALL") |>
    mutate(age = if_else(age == "95+", 95, suppressWarnings(as.numeric(age))),
           age_group = case_when(age < 18 ~ "<18",
                                 age < 45 ~ "18-44",
                                 age < 65 ~ "45-64",
                                 age < 75 ~ "65-74",
                                 age >= 75 ~ "75+"),
           sex = sex_code) |>
    select(-age)
}

females_gp_reg_pat <- band_ages(gp_reg_pat_prac_sing_age_female, sex_code = 2)
males_gp_reg_pat <- band_ages(gp_reg_pat_prac_sing_age_male, sex_code = 1)

all_gp_reg_pat <- males_gp_reg_pat|>
  rbind(females_gp_reg_pat)|>
  rename(gp_practice_code=org_code)|>
  group_by(sex,age_group,gp_practice_code)|>
  summarise(number_of_patients=sum(number_of_patients))|>
  ungroup()|>
  left_join(clusters_for_nacr|>select(gp_practice_code,cluster))|>
  group_by(sex,age_group,cluster)|>
  summarise(number_of_patients=sum(number_of_patients))|>
  ungroup()|>
  mutate(sex_name=case_when(sex==1~'Male',
                            sex==2~'Female'))|>
  select(-sex)|>
  group_by(cluster)|>
  mutate(patients_total_cluster=sum(number_of_patients))|>
  mutate(perc=number_of_patients/patients_total_cluster)|>
  select(-number_of_patients,-patients_total_cluster)

all_gp_reg_pat |>
    pivot_wider(names_from=c(cluster,sex_name),
                names_sep = "_",
                values_from = perc
                )|>
    select(age_group, starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")) |>
  arrange(factor(age_group, 
                 levels = c("<18","18-44","45-64","65-74","75+")))|>
    gt(rowname_col = "age_group")|>
    tab_spanner_delim(delim = "_") |>
    tab_spanner(label = "Cluster", columns = c(starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")))|>
    tab_stubhead(label=md("**Age Group**"))|>
    tab_options(heading.title.font.size = 18,
                heading.title.font.weight = "bolder",
                column_labels.font.weight = "bold")|>
    fmt_percent(decimals = 2)|>
    tab_header(title = md("**Table 2** - *Age / Sex Breakdown by Cluster*"))


```

Table 3 shows that the most diverse cluster (cluster 5) has the greatest proportion of patients in practices in the most deprived quintile. Cluster 2 has the greatest proportion of patients in practices in the least deprived quintile.

```{r}
source("R/get_imd_data_by_gpprac.R")

patientweighted_practice_imd |>
  mutate(gp_imd_quantile = case_when(gp_imd_decile<=2 ~ "1 - most deprived",
                                     gp_imd_decile==3 ~"2",
                                     gp_imd_decile==4 ~"2",
                                     gp_imd_decile==5 ~"3",
                                     gp_imd_decile==6 ~"3",
                                     gp_imd_decile==7 ~"4",
                                     gp_imd_decile==8 ~"4",
                                     gp_imd_decile>=9 ~"5 - least deprived",
                                     .default = "0"
  ))|>
  group_by(gp_imd_quantile,cluster)|>
  summarise(patients=sum(patients))|>
  ungroup()|>
  group_by(cluster)|>
  mutate(cluster_total=sum(patients),
         perc=patients/cluster_total)|>
  select(-patients,-cluster_total)|>
  pivot_wider(names_from=c(cluster),
              names_sep = "_",
              values_from = perc
  )|>
  select(gp_imd_quantile, starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")) |>
  arrange(factor(gp_imd_quantile, 
                 levels = c("1 - most deprived","2","3","4","5 - least deprived")))|>
  
  gt(rowname_col = "gp_imd_quantile")|>
  tab_stubhead(label=md("**IMD Quintile**"))|>
  tab_options(heading.title.font.size = 18,
              heading.title.font.weight = "bolder",
              column_labels.font.weight = "bold")|>
  tab_spanner(label = "Cluster", columns = c(starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")))|>
  fmt_percent(columns=c(starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")),decimals = 2)|>
  tab_header(title = md("**Table 3** - *Index of Multiple Deprivation Breakdown by Cluster*"))
```

## Clustered GP Practices

The interactive map below presents each GP practice coloured according to its assigned cluster. Cluster 1 (red) is the least diverse and cluster 5 (orange) is the most diverse.

```{r}
tar_read(cluster2_map)
```

# Rate of activity

The charts below show the activity rates by cluster for each metric along the CHD pathway. The horizontal line is the overall rate, and the 95% confidence intervals are shown at the top of each bar. These activity rates take account of need by using CHD prevalence (or list size for risk factors) as the denominator of the calculation. Further detail on the methodology can be found in the appendix.

In some cases the gap between the upper and lower limits is so narrow as to not be visible (e.g. Obesity register), whereas for other metrics (e.g. Readmission within 30 days of a PCI or CABG) it is much wider. Metrics with larger volumes of activity data give greater confidence in the calculated rate (i.e. the limits are closer together) than metrics with smaller volumes of activity data, where there is less confidence in the calculated rate (i.e. the limits are further apart).
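
The effect of activity volume on interval width can be illustrated with a short sketch. It assumes event counts follow a Poisson distribution and uses exact (chi-squared based) limits; this is an assumption made for illustration, and the interval method actually used for the charts is the one set out in the appendix. The function name and all figures are hypothetical.

```r
# Rate per unit of need with an exact Poisson confidence interval.
# events: count of activity; need: denominator (e.g. CHD prevalence).
rate_with_ci <- function(events, need, level = 0.95) {
  alpha <- 1 - level
  lower <- ifelse(events == 0, 0, qchisq(alpha / 2, 2 * events) / 2)
  upper <- qchisq(1 - alpha / 2, 2 * (events + 1)) / 2
  data.frame(
    rate     = events / need,
    lower_ci = lower / need,
    upper_ci = upper / need
  )
}

# Hypothetical figures: the same underlying rate at two different volumes
large <- rate_with_ci(events = 50000, need = 100000)
small <- rate_with_ci(events = 50, need = 100)

# The interval is far narrower for the metric with the larger volume
large$upper_ci - large$lower_ci
small$upper_ci - small$lower_ci
```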

In calculating these rates for each cluster within each metric and then comparing the rates by cluster to the global rate for the metric it is possible to identify how activity rates vary across the 5 clusters. For there to be equity between the clusters the rates would need to be the same. The charts below present the metrics grouped according to pathway domain.

### Risk factors

::: callout-tip
## Key Findings

-   Activity rates vary very little for the risk factor metric % patients 45+ with a record of BP \< 5 years

-   Rates of patients on the diabetes register are highest in the most diverse cluster

-   Rates of patients on the depression register are highest in the least diverse cluster and lowest in the most diverse cluster
:::

#### Risk factors

```{r}
tar_read(rate_chart_risk_fact)
```

#### Risk factor identification

```{r}
tar_read(rate_chart_risk_fact_ident)
```

### Primary prevention

::: callout-tip
## Key Findings

-   The activity rate for CVD patients treated with LLT is highest for those patients whose practice is in the least diverse cluster and lowest for patients whose practice is in cluster 4.

-   A similar pattern applies for at risk patients treated with LLT, activity rates being highest for those in the least diverse cluster.
:::

```{r}
tar_read(rate_chart_prim_prevent)

```

### Disease identification

::: callout-tip
## Key Findings

-   Activity rates vary considerably across the clusters in the disease identification metrics, with differing patterns for each metric

-   Patients from clusters 4 and 5 (most diverse) are less likely to be recorded on the CHD register

-   Clusters 5 and 4 have lower rates of CT angiography and electrocardiography
:::

```{r}
tar_read(rate_chart_disease_ident)
```

### Secondary prevention

::: callout-tip
## Key Findings

-   There is little variation between clusters 1 to 4 in referral rates to outpatient cardiology, although the referrals are lower for the most diverse cluster (cluster 5)

-   Flu vaccination rates for 65+ patients are highest among patients whose practice is in the least diverse cluster, with the two most diverse clusters having the lowest rates

-   Amongst patients under 65 who are at risk, flu vaccination rates are less varied between the clusters than for 65+ patients. Cluster 4 has the lowest rate of vaccination

-   Cardiology outpatient DNA rates are higher the more diverse the cluster
:::

```{r}
tar_read(rate_chart_second_prevent)
```

### Tertiary prevention

::: callout-tip
## Key Findings

-   All the tertiary prevention metrics, except waiting time for elective PCI/CABG, follow broadly the same pattern of activity rates with the highest rates being in the least diverse cluster (cluster 1), followed by cluster 2 and then clusters 5 (the most diverse) and 3. Cluster 4 has the lowest activity rates.
-   The rate for waiting time for elective PCI/CABG is greatest in the most diverse cluster (cluster 5) and lowest in cluster 4, although overall there is little variation between the clusters.
:::

```{r}
tar_read(rate_chart_tert_prevent)
```

### Intermediate outcome

::: callout-tip
## Key Findings

-   The least diverse cluster has the highest rates of BP readings \< 140/90.

-   Emergency admissions for CHD and readmissions within 30 days of a PCI or CABG are highest for the least diverse cluster and lowest for cluster 4.
:::

```{r}
tar_read(rate_chart_int_out)
```

### Full outcomes

::: callout-tip
## Key Findings

-   The least diverse cluster has the highest rate of CHD deaths and premature CHD deaths. Cluster 4 has the lowest rate of CHD deaths.

-   CHD hospital deaths and premature CHD hospital deaths also have the lowest rate in cluster 4. Premature CHD hospital deaths have a similar rate for the other clusters including the most diverse cluster.
:::

```{r}
tar_read(rate_chart_full_out)
```

::: {.callout-note appearance="minimal"}
Work led by Professor Sir Michael Marmot and colleagues in the late 1980s revealed that first-generation South Asians living in the UK have a higher rate of coronary heart disease and diabetes compared to White Europeans[^5]. This gives rise to the question: why are the death rates lower in the more diverse clusters? There are a number of factors to consider in understanding this, including: the age/sex profile of the clusters, the methodology for calculating the CHD synthetic prevalence (the measure of need in this project), and age and sex standardised mortality rates (ASMR) compared to the project CHD mortality rates. These are examined further in the appendix.
:::

[^5]: <https://www.bhf.org.uk/what-we-do/our-research/research-successes/ethnicity-and-heart-disease>

::: {.callout-tip collapse="true" appearance="minimal" icon="false"}
## Activity rates data

```{r}
rate_chart_data <- tar_read(rate_chart_data) |> as_tibble()

rate_chart_data |> 
  dplyr::rename(cluster=cluster2)|>
  dplyr::select(-name) |>
  gt::gt() |>
  gt::cols_move(columns = c(rate,lower_ci,upper_ci, global_rate),
                after = metric_name) |>
  gt::fmt_number(columns=c(rate,lower_ci,upper_ci, global_rate),
                 decimals=3) |>
  gt::cols_label(cluster =gt::md("**Cluster**"),
                 pathway = gt::md("**Pathway**"),
                 metric_name = gt::md("**Metric**"),
                 rate= gt::md("**Rate**"),
                 lower_ci = gt::md("**Lower CI**"),
                 upper_ci = gt::md("**Upper CI**"),
                 global_rate = gt::md("**Global Rate**")) |>
  gt::tab_row_group(
    label = gt::md("**Full outcomes**"),
    rows = pathway == "Full outcomes"
  ) |>
  gt::tab_row_group(
    label = gt::md("**Intermediate outcome**"),
    rows = pathway == "Intermediate outcome"
  ) |>
  gt::tab_row_group(
    label = gt::md("**Tertiary prevention**"),
    rows = pathway == "Tertiary prevention"
  ) |>
  gt::tab_row_group(
    label = gt::md("**Secondary prevention**"),
    rows = pathway == "Secondary prevention"
  ) |>
  gt::tab_row_group(
    label = gt::md("**Disease identification**"),
    rows = pathway == "Disease identification"
  ) |>
  gt::tab_row_group(
    label = gt::md("**Primary prevention**"),
    rows = pathway == "Primary prevention"
  ) |>
  gt::tab_row_group(
    label = gt::md("**Risk factor identification**"),
    rows = pathway == "Risk factor identification"
  ) |>
  gt::tab_row_group(
    label = gt::md("**Risk factors**"),
    rows = pathway == "Risk factors"
  ) 
```
:::

```{r}
rate_chart_data %>%
  download_this(
    output_name = "rate_data",
    output_extension = ".xlsx",
    button_label = "Download activity rate data",
    button_type = "default",
    has_icon = TRUE,
    icon = "fa fa-save"
  )
```

# Relative Index of Disparity

The relative index of disparity for the metric is expressed here as a %, such that the % indicates the amount of activity that would need to be redistributed between clusters in order that event rates follow levels of need. The detail of the methodology used for its calculation can be found in the appendix.
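
One way to read this description is sketched below: work out the activity each cluster would have if its rate equalled the global rate, then take the share of all activity that would have to move to close the gaps. This is an interpretation of the wording above, not a reproduction of the appendix method, and the cluster figures are hypothetical.

```r
# Hypothetical events and need (e.g. CHD prevalence) for three clusters
clusters <- data.frame(
  cluster = 1:3,
  events  = c(600, 300, 100),
  need    = c(400, 400, 200)
)

global_rate <- sum(clusters$events) / sum(clusters$need)

# Events each cluster would have if its rate equalled the global rate
expected <- global_rate * clusters$need

# Surpluses and shortfalls cancel out, so half the total absolute gap is the
# number of events that would have to move between clusters.
to_move <- sum(abs(clusters$events - expected)) / 2
share_to_redistribute <- to_move / sum(clusters$events)

share_to_redistribute  # 0.2, i.e. 20% of events
```

A value of zero would mean every cluster's rate already matches the global rate.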

The chart below presents the metrics along the CHD pathway from risk factor and identification through to outcome measures of death from CHD. For each metric the index of disparity is shown as a point estimate (yellow dot) with the upper and lower confidence intervals in grey.

::: callout-tip
## Key Findings

-   The greatest disparity along the CHD pathway is in flu vaccinations for patients aged 65+, where the relative index of disparity is 17.14% (95% CI: 17.11% to 17.17%), indicating that 17.1% of activity would need to be redistributed for rates to reflect need.

-   The most equitably distributed metric is blood pressure checks within the last 5 years for patients aged 45+, with a relative index of disparity of 0.8% (95% CI: 0.78% to 0.82%).

-   There are 7 metrics where the index of disparity is less than 5%. As well as blood pressure checks, these include referrals to outpatient cardiology, outpatient DNAs and waiting times for elective procedures.

-   Out of the 33 metrics along the CHD pathway, 13 have an index of disparity of greater than 10%.

-   In general (but not exclusively), these greater levels of disparity occur in the disease identification, secondary prevention and tertiary prevention domains and, consequently, in the outcome metrics related to deaths from CHD.
:::

```{r}
#| out-height: 600px

tar_read(ci_iod_chart)
```

::: {.callout-tip icon="false"}
## Confidence Intervals

Confidence intervals are shown in grey. There is 95% confidence that the index of disparity is within this range. The confidence interval estimation methods are detailed in the appendix.
:::

::: {.callout-tip collapse="true" appearance="minimal" icon="false"}
## Index of Disparity data

```{r}
rate_chart_data <- tar_read(rate_chart_data) |> as_tibble()
iod_with_ci <- tar_read(iod_with_ci) |> as_tibble()

metric_names <- rate_chart_data |> 
  select(pathway,metric_name,name) |> 
  unique() |> 
  rename(metric_name_full=metric_name, metric_name=name)

disparity_data <- iod_with_ci |> left_join(metric_names)

disparity_data |> 
  select(-metric_name) |>
  gt() |>
  cols_move(columns = c(iod,lower_ci,upper_ci),
            after = metric_name_full) |>
  tab_row_group(
    label = md("**Full outcomes**"),
    rows = pathway == "Full outcomes"
  ) |>
  tab_row_group(
    label = md("**Intermediate outcome**"),
    rows = pathway == "Intermediate outcome"
  ) |>
  tab_row_group(
    label = md("**Tertiary prevention**"),
    rows = pathway == "Tertiary prevention"
  ) |>
  tab_row_group(
    label = md("**Secondary prevention**"),
    rows = pathway == "Secondary prevention"
  ) |>
  tab_row_group(
    label = md("**Disease identification**"),
    rows = pathway == "Disease identification"
  ) |>
  tab_row_group(
    label = md("**Primary prevention**"),
    rows = pathway == "Primary prevention"
  ) |>
  tab_row_group(
    label = md("**Risk factor identification**"),
    rows = pathway == "Risk factor identification"
  ) |>
  tab_row_group(
    label = md("**Risk factors**"),
    rows = pathway == "Risk factors"
  ) |>

  fmt_number(columns=c(iod,lower_ci,upper_ci),
             decimals=3) |>
  cols_label(pathway = md("**Pathway**"),
             metric_name_full = md("**Metric**"),
             iod= md("**IOD**"),
             lower_ci = md("**Lower CI**"),
             upper_ci = md("**Upper CI**"))
```
:::

```{r}
disparity_data %>%
  download_this(
    output_name = "iod_data",
    output_extension = ".xlsx",
    button_label = "Download iod data",
    button_type = "default",
    has_icon = TRUE,
    icon = "fa fa-save"
  )
```

## Routes to equity

A previous Strategy Unit report, "Strategies to reduce inequalities in access to planned hospital procedures"[^6], highlights in chapter 1 three of the many routes from inequity to equity: levelling-up, levelling-down and zero-sum redistribution. Using the zero-sum redistribution route as an example, along with the flu vaccinations for patients aged 65+ metric, the illustration below shows how equity could be achieved by increasing or decreasing the number of vaccinations delivered in each cluster so that every cluster has the same activity rate.

[^6]: <https://www.midlandsdecisionsupport.nhs.uk/knowledge-library/strategies-to-reduce-inequalities-in-access-to-planned-hospital-procedures/>

*Table 4 - achieving equity via zero-sum redistribution*

|  Cluster  | Number of vaccinations | CHD synthetic prevalence | Rate (No. of vaccinations / CHD synthetic prevalence) | Global rate | Change in number of vaccinations to match global rate |
|:---------:|----------:|----------:|:------------:|:---------:|:------------:|
|           |                    *n* |                      *d* |                        *r=n/d*                        | *gr=𝚺n/𝚺d*  |                     *(gr-r) x d*                      |
|     1     |              3,356,002 |                1,228,453 |                         2.73                          |    1.92     |                      -1,001,284                       |
|     2     |              2,638,698 |                1,124,040 |                         2.35                          |    1.92     |                       -484,121                        |
|     3     |              1,862,430 |                1,181,796 |                         1.58                          |    1.92     |                        402,855                        |
|     4     |                476,616 |                  571,049 |                         0.83                          |    1.92     |                        617,979                        |
|     5     |                332,728 |                  415,950 |                         0.80                          |    1.92     |                        464,571                        |
| **Total** |          **8,666,474** |            **4,521,288** |                         1.92                          |             |                         **0**                         |
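
The arithmetic behind Table 4 can be reproduced in a few lines of R. This is a minimal sketch using the figures from the table; the variable names are illustrative and are not taken from the project code.

```r
# Figures copied from Table 4: vaccinations (n) and CHD synthetic prevalence (d)
# for clusters 1 to 5. Variable names are illustrative only.
n <- c(3356002, 2638698, 1862430, 476616, 332728)
d <- c(1228453, 1124040, 1181796, 571049, 415950)

r  <- n / d             # rate per cluster
gr <- sum(n) / sum(d)   # global rate across all clusters

# Vaccinations to add (+) or remove (-) so each cluster matches the global rate
change <- (gr - r) * d

data.frame(cluster = 1:5, rate = round(r, 2), change = round(change))
sum(change)  # zero, up to floating point error
```

Because the global rate is the pooled rate across all clusters, the positive and negative changes cancel out, which is what makes this route zero-sum.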

In reality, in attempting to achieve equity for this metric, it is unlikely that the offer of vaccination would be removed from patients whose practice is in clusters 1 and 2. More likely would be an approach to focus public health and vaccination roll-out campaigns on practices in clusters 3, 4 and 5.

Each of the metrics along the CHD pathway with the higher relative indices of disparity would almost certainly require a variety of different routes from inequity to equity given their differing nature. The more effective interventions on the pathway might be levelled-up, whilst those of more limited value might be levelled-down.

# Regional Analysis

## Clusters

The charts below show the distribution of the patients in each NHS region according to the cluster of their GP practice.

::: panel-tabset
## London

```{r}
#1
region_cluster_charts <- tar_read(region_cluster_charts)
region_cluster_charts[[2]]
```

## South West

```{r}
#2
region_cluster_charts[[7]]
```

## South East

```{r}
#3
region_cluster_charts[[6]]
```

## Midlands

```{r}
#4
region_cluster_charts[[3]]
```

## East of England

```{r}
#5
region_cluster_charts[[1]]
```

## North West

```{r}
#6
region_cluster_charts[[5]]
```

## North East and Yorkshire

```{r}
#7
region_cluster_charts[[4]]
```
:::

## Relative Index of Disparity

::: callout-tip
## Key Findings

-   Looking at a regional level, the greatest disparity is in CT angiography in the South West, Midlands, and the North West.
:::

::: panel-tabset
## London

```{r}
#| out-height: 600px
regional_charts <- tar_read(regional_charts)
regional_charts[1]
```

## South West

```{r}
#| out-height: 600px
regional_charts[2]
```

## South East

```{r}
#| out-height: 600px
regional_charts[3]
```

## Midlands

```{r}
#| out-height: 600px
regional_charts[4]
```

## East of England

```{r}
#| out-height: 600px
regional_charts[5]
```

## North West

```{r}
#| out-height: 600px
regional_charts[6]
```

## North East and Yorkshire

```{r}
#| out-height: 600px
regional_charts[7]
```
:::

The appendix contains equivalent charts to those above, showing the Index of Disparity for each CHD pathway metric for each of the 42 ICBs.

# Conclusions

In conclusion, there are three main aspects to this analysis on which to reflect: methods, findings and next steps.

Assessing inequalities across ethnic groups was challenging on a number of levels: it was both experimental and complex. Ethnicity can’t be expressed as a set of ordered groups and there is no meaningful way to aggregate it numerically into a single variable. Consequently, the relative index of inequality can’t be used.

This analysis attempted to overcome these challenges using novel methods, creating 5 clusters of GP practices using K-Medoids clustering on ethnicity %, followed by use of the relative index of disparity to indicate the extent to which the rate of an activity or event on the CHD pathway varied across the clusters.

Further challenges arose during the analysis, which are detailed in the appendix. However, this analysis has made progress in the methods used, and they are useful in presenting a visualisation of the variation in disparity across the CHD pathway.

The results of the analysis should be viewed as tentative rather than actionable, adding to a fuller picture alongside other work and thinking on inequalities and CHD.

Early findings show that variation clearly exists across the pathway with some metrics showing less than 1% disparity (blood pressure checks within 5 years for 45+), whilst others have as much as 17% disparity (flu vaccinations for 65+).

The analysis found that risk factors varied little across clusters, although diabetes was identified more often in the most diverse cluster and depression more often in the least diverse cluster. Moving through the pathway, CVD patients and at-risk patients had higher rates of lipid lowering therapies in the least diverse clusters. Rates of diagnosis (on the CHD register, CT angiography, and electrocardiography) were all lower in the more diverse clusters.

Moving across the pathway towards treatment and care, inequity between clusters was more apparent in some of the secondary and tertiary prevention metrics, in particular CHD patients receiving aspirin, APT or ACT (12.5%), starting cardiac rehabilitation (11%) and elective CABG (10%).

Mortality (CHD deaths and premature CHD deaths, including in hospital) was higher in the least diverse cluster, which may reflect age and sex profiles of the population (and which is examined and discussed in further detail in the appendix).

Many risk factors had similar rates to those found when examining the pathway through a socio-economic lens, but many diagnostics, such as being on the CHD register or receiving CT angiography, were lower in the more diverse clusters. There are also parallels with the previous work examining socio-economic inequalities along the pathway for secondary and tertiary prevention metrics, with many of these metrics, such as elective PCI and CABG and receipt of aspirin, also showing that inequity favoured the least deprived.

Many of these findings give rise to further questions and therefore there are a number of further routes for analysis and possible next steps that could be taken to extend and follow up this project.

Whilst this project includes some regional and ICB level presentations, in some cases the majority of GP practices in a region or ICB belong to the same cluster. The analysis could be recreated at a regional level, whereby new unique clusters could be assigned separately for a region, potentially giving rise to more nuanced clusters from which local disparities could be more easily identified.

The methods used in this project could be transferred to other pathways and services, re-using the same clusters and then selecting different sets of metrics, for example those related to COPD, cancer or mental health.

The code for this project is available on GitHub, which enables the work to be reused and extended more easily.

Many of the metrics used in this analysis were only available at a GP practice level, necessitating some of the approaches taken. Where record level data is available this would allow for more detailed exploration and understanding of some of these initial findings. Of particular interest would be further exploration of the findings surrounding mortality rates and the inter-relationships between different factors of inequality such as ethnicity and deprivation, as well as better understanding at an individual ethnicity level.

# Appendix

## Definitions and data sources of pathway metrics

The table below sets out the sources of the various metrics as well as the time-period to which they relate, and details of the selection criteria used.

```{r}
library(gt)
metrics_table <- tibble(metric_domain = c("Need","Risk factors","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Primary prevention","Primary prevention","Primary prevention","Primary prevention","Disease identification","Disease identification","Disease identification","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Intermediate outcome","Intermediate outcome","Intermediate outcome","Full outcomes","Full outcomes","Full outcomes","Full outcomes"),
  name_metric=c("CHD synthetic prevalence estimates",
                                      "Smoking synthetic prevalence estimates",
                                      "Smoking register",
                                      "Obesity register",
                                      "Diabetes register",
                                      "Depression register",
                                      "The percentage of patients aged 45 or over who have a record of blood pressure in the preceding 5 years",
                                      "CVD risk register",
                                      "Percentage of patients aged 18 and over with GP recorded CVD (narrow definition), who are currently treated with lipid lowering therapy",
                                      "Percentage of patients aged 18 and over, with no GP recorded CVD and a GP recorded QRISK score of 10% or more, CKD (G3a to G5), T1 diabetes (aged 40 and over) or T2 diabetes aged 60 and over, who are currently treated with lipid lowering therapy",
                                      "Smoking cessation support offered",
                                      "Exception reporting for Smoking cessation support offered",
                                      "CHD register",
                                      "CT angiography",
                                      "Electrocardiography",
                                      "Aspirin, anti-platelet or anti-coagulant",
                                      "Exception reporting for Aspirin, anti-platelet or anti-coagulant",
                                      "Percentage of patients aged 65 or over who received a seasonal influenza vaccination between 1 September 2022 and 31 March 2023",
                                      "Percentage of patients aged 18 to 64 years and in a clinical at-risk group who received a seasonal influenza vaccination between 1 September 2022 and 31 March 2023",
"Referral to cardiology (First outpatient)",
                                      "Cardiology outpatient DNAs",
"Elective PCI","Elective CABG","Waiting time for elective PCI / CABG","Elective PCI / CABG patients discharged before trimpoint",
"Cardiac rehabilitation - Started", "Cardiac rehabilitation - Completed","BP reading < 140/90","Readmission within 30 days of elective PCI / CABG","Emergency admissions for CHD","Deaths in hospital from CHD","Deaths in hospital from CHD <75","Deaths from CHD","Deaths from CHD <75"
                                      ),
data_source=c("PHE","GP Survey","NHSD QOF","NHSD QOF","NHSD QOF","NHSD QOF","NHSD QOF","NHSD QOF","CVD Prevent","CVD Prevent",
              "NHSD QOF","NHSD QOF","NHSD QOF","SUS","SUS","NHSD QOF","NHSD QOF","IIF","IIF","SUS",
              "SUS","SUS","SUS","SUS","SUS","NACR","NACR","NHSD QOF","SUS","SUS",
              "SUS","SUS","ONS Death records","ONS Death records"),
year=c("2015","2023","2022/23","2022/23","2022/23","2022/23","2022/23","2019/20","2022/23","2022/23",
              "2022/23","2022/23","2022/23","2022/23","2022/23","2022/23","2022/23","March 2023","March 2023","2022/23",
              "2022/23","2022/23","2022/23","2022/23","2022/23","2021 & 2022","2021 & 2022","2022/23","2022/23 (Mar-Feb)","2022/23",
              "2022/23","2022/23","2022/23","2022/23"),
definition=c("Provided directly by OHID (formerly indicator 92847 in FingertipsR)","","Indicator 91280 extracted from FingertipsR","Indicator 93088 extracted from FingertipsR","Indicator 241 extracted from FingertipsR","Indicator 848 extracted from FingertipsR","Indicator 91262 extracted from FingertipsR","Formerly indicator 92589 extracted from FingertipsR","https://www.cvdprevent.nhs.uk/home","https://www.cvdprevent.nhs.uk/home",
              "Indicator 90619 extracted from FingertipsR","Indicator 90619 extracted from FingertipsR","Indicator 273 extracted from FingertipsR","OPCS Code: U102 - Cardiac computed tomography angiography. Elective only","OPCS: U19 & U34. Elective only","Indicator 90999 extracted from FingertipsR","Indicator 90999 extracted from FingertipsR","https://www.england.nhs.uk/primary-care/primary-care-networks/network-contract-des/iif/","https://www.england.nhs.uk/primary-care/primary-care-networks/network-contract-des/iif/","TFC = 320",
              "TFC = 320","OPCS Code: K49, K50, K75. FCE = 1","OPCS Code: K40-K46, FCE = 1","Elective admissions. OPCS Codes as per metrics 20 and 21","Main PCI and CABG Trim points for 2022/23 linked for HRG","Data supplied by ICB. Aggregated ACS and HF patients","Data supplied by ICB. Aggregated ACS and HF patients","CHD008","No. of emerg. spells up to 31/03/2023 within 0-29 days (inclusive) of the last, previous discharge from hospital / No. of finished spells with discharge date between 01/03/2022 and 28/02/2023 Exclude: TFC = 501, 560, 610, OPCS starting with O, Classpat = 1 (Ord.), Any diagnosis = C00*-C97*, D37*-D48*","PRIMARY diagnosis = I20 or I21 or I22 or I23 or I24 or I25",
              "Following any admission Elective or Emergency. PRIMARY diagnosis = I20 or I21 or I22 or I23 or I24 or I25","Following any admission Elective or Emergency. PRIMARY diagnosis = I20 or I21 or I22 or I23 or I24 or I25. Age <=75","Underlying Cause of Death = I20 or I21 or I22 or I23 or I24 or I25","Underlying Cause of Death = I20 or I21 or I22 or I23 or I24 or I25. Age <=75")
                        )

metrics_table |>
    gt()|>
    tab_header(
    title = md("**Table 5** - *Metric definitions and data sources*"))|>
    cols_label(name_metric=md("**Metric**"),
             metric_domain=md("**Domain**"),
             data_source=md("**Data Source**"),
             year=md("**Year**"),
             definition=md("**Definition and selection criteria/codes**"))
```

## Methods explained

### QOF data

Metrics taken from the QOF data via Fingertips have been extracted by GP practice and the counts (numerator of the measure) from the calculations have been used. Exception reporting metrics have used the Personalised Care Adjustment (PCA) data to obtain the number of patients by GP practice for whom a PCA has been recorded for the relevant metric. Possible reasons for a PCA include: newly diagnosed or registered, intervention is clinically unsuitable, patient choice, did not respond to offers of care, and service not available.

### SUS data

SUS data was extracted using Transact-SQL and using relevant OPCS, ICD10, HRG, and Treatment function codes as detailed in table 5 above.

### Cardiac rehabilitation

This data was supplied by NACR from an extract taken in March 2024. The data wasn’t available by GP practice, so an alternative method was followed. NACR was supplied with a list of practices, ICBs and the cluster to which each practice had been allocated by the Strategy Unit (SU). The data was then supplied to the SU aggregated into clusters and ICBs. The data relates to patients who started rehabilitation in the 2021 or 2022 calendar years and were ACS or HF patients according to the NHS England reporting. These two years are the most recent complete years and, following changes in the way rehabilitation has been delivered due to the Covid pandemic, most closely reflect the rehabilitation model now in place. Patients needed to have a GP practice code recorded to be matched to a cluster. Those without one amounted to 42,637 patients (48% of patient records), who had either no GP code completed, no matching GP code or GMC number added. Within the data included in this analysis, there are 11 clusters across 9 ICBs that have no rehabilitation data despite there being practices in the cluster. Due to these high levels of missing data, the two cardiac rehabilitation metrics have been included in the presentations of the index of disparity and activity rates at a national level, but excluded from the regional and ICB level presentations.

### Readmission within 30 days

The methodology used for this metric follows, as far as possible, that detailed by NHS Digital[^7]. The readmissions counted relate to those who had an elective PCI / CABG in the period 1/3/2022 to 28/2/2023.

[^7]: <https://digital.nhs.uk/data-and-information/publications/statistical/ccg-outcomes-indicator-set/specifications/3.2-emergency-readmissions-within-30-days-of-discharge-from-hospital_1_4>

### ONS Death records

ONS death records contain an encrypted HES ID which was used to link to outpatient, inpatient and A&E HES records, from which the latest GP practice recorded (on a spine traced record) for the patient was then assigned to the death record.

### CHD synthetic prevalence estimates

Public Health England (PHE) CHD synthetic prevalence estimates data was previously available from OHID via Fingertips. This data has now been removed from Fingertips and archived; however, OHID supplied it by email for this project on request. The data is from 2015 and covers 7564 practices, of which 6297 are current practices. To assign a prevalence to practices with missing data, a nearest neighbours methodology was used: the prevalence was imputed as the average prevalence of the practices within a 1.5km radius of the practice without data. Further details of this methodology, including code with sample data, are available in a blog post on the Strategy Unit Data Science website[^8].

[^8]: <https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-17_nearest_neighbour.html>
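
The imputation step described above can be sketched as follows. This is an illustrative base R example rather than the project code (which is in the linked blog post): the practice codes, coordinates and prevalence values are invented, and `haversine_km()` and `impute_prevalence()` are hypothetical helpers written for this sketch.

```r
# Hypothetical practices: longitude/latitude in degrees, prevalence in %.
# Practice "D" has no prevalence estimate and needs imputing.
practices <- data.frame(
  code       = c("A", "B", "C", "D"),
  lon        = c(-1.900, -1.905, -1.700, -1.902),
  lat        = c(52.480, 52.485, 52.600, 52.482),
  prevalence = c(3.0, 4.0, 9.0, NA)
)

# Great-circle distance in km between one point and a vector of points
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Mean prevalence of the practices with data that lie within `max_km`
impute_prevalence <- function(df, max_km = 1.5) {
  known <- df[!is.na(df$prevalence), ]
  for (i in which(is.na(df$prevalence))) {
    dist_km <- haversine_km(df$lon[i], df$lat[i], known$lon, known$lat)
    nearby <- known$prevalence[dist_km <= max_km]
    # Left as NA when no practice with data lies inside the radius
    if (length(nearby) > 0) df$prevalence[i] <- mean(nearby)
  }
  df
}

imputed <- impute_prevalence(practices)
imputed
```

In this invented example, practices A and B fall within 1.5km of practice D and practice C does not, so D receives the mean of A and B.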

### Ethnicity by GP practice

Ethnicity by GP practice was imputed based on the Lower Super Output Area (LSOA) of residence of GP registrants and the ethnicity distribution of patients living in these LSOAs using the Census 2021 data.

#### GP list size by LSOA

Since metric data is mainly from the 2022/23 year, the October 2022 list size data was taken from the NHS Digital Patients Registered at a GP practice dataset[^9] as this is the data closest to the mid point in the 22/23 year. The GP to ICB mapping file was also downloaded and joined to the list size to easily create analysis by ICB and region.

[^9]: <https://digital.nhs.uk/data-and-information/publications/statistical/patients-registered-at-a-gp-practice/october-2022>

#### Census 2021 Ethnicity by LSOA

The 2021 Census Ethnicity data was downloaded from the GOV.UK website.[^10] Summary details of the census ethnicity data are also available, including descriptions of the standardised list of ethnic groups and the differences from the 2011 ethnic groups.[^11]

[^10]: <https://www.ethnicity-facts-figures.service.gov.uk/uk-population-by-ethnicity/demographics/age-groups/latest/#download-the-data>

[^11]: <https://www.ethnicity-facts-figures.service.gov.uk/uk-population-by-ethnicity/demographics/age-groups/latest/>

The 2021 Census Ethnicity data by LSOA uses the 2021 LSOAs. The GP list size data uses the 2011 LSOAs. This results in gaps when the datasets are combined. To overcome this the following method was used:

-   An LSOA lookup showing 2011 and 2021 LSOA parent and child relationships was joined to the 2021 Census Ethnicity data

-   Where a 2021 LSOA has a parent LSOA in 2011 then we applied the average ethnicity % for the child LSOAs to the parent (this is when one 2011 LSOA has been split into multiple 2021 LSOAs).

-   Where the LSOAs in 2021 have a child LSOA in 2011 then we applied the ethnicity % for the 2021 parent to the 2011 children (this is when several LSOAs have been combined into one new LSOA).
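
Both rules can be expressed as a single join followed by an average over each 2011 LSOA. The sketch below uses invented LSOA codes and percentages, and the column names (`lsoa11`, `lsoa21`, `pct_group`) are assumptions rather than names from the project data.

```r
# Hypothetical lookup between 2011 and 2021 LSOAs:
# E01A was split into E01A1 and E01A2, E01B and E01C were merged into E01X,
# and E01D is unchanged.
lookup <- data.frame(
  lsoa11 = c("E01A", "E01A", "E01B", "E01C", "E01D"),
  lsoa21 = c("E01A1", "E01A2", "E01X", "E01X", "E01D")
)

# Hypothetical Census 2021 percentage for one ethnic group, by 2021 LSOA
census21 <- data.frame(
  lsoa21    = c("E01A1", "E01A2", "E01X", "E01D"),
  pct_group = c(10, 20, 30, 40)
)

joined <- merge(lookup, census21, by = "lsoa21")

# Split LSOAs take the mean of their 2021 children; merged and unchanged
# LSOAs take the value of their single 2021 counterpart
pct_2011 <- aggregate(pct_group ~ lsoa11, data = joined, FUN = mean)
pct_2011
```

An unweighted mean is used here because the text refers to an average; a population-weighted mean would be an alternative design.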

Further details about item editing and imputation processes for Census 2021, England and Wales are available on the ONS website. This details techniques used by ONS with the aim of arriving at a fully populated clean database. These techniques include manual imputation and nearest neighbour donor imputation. For the England and Wales Census 2021 the imputation rate for ethnic group was 1.3%.[^12]

[^12]: <https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/methodologies/itemeditingandimputationprocessforcensus2021englandandwales>

### K-Medoids Clustering

K-medoids is an established and commonly used unsupervised machine learning technique. This clustering approach required data on the ethnicity of a practice's registered population, which was obtained as per the methodology explained above (Ethnicity by GP practice). The data used in the clustering was the % of patients in each ethnic group per practice, such that for each practice the ethnic group percentages summed to 100%.

The K-medoids clustering was performed using the {cluster} package[^13] in R using partitioning (clustering) of the data into k clusters "around medoids" (PAM), which is a more robust version of K-means. The algorithm is based on the search for k representative objects or medoids among the observations of the dataset. After finding a set of k medoids, k clusters are constructed by assigning each observation to the nearest medoid. The goal is to find k representative objects which minimise the sum of the dissimilarities of the observations to their closest representative object.[^14]

[^13]: Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2023). *cluster: Cluster Analysis Basics and Extensions*. R package version 2.1.6, <https://CRAN.R-project.org/package=cluster>.

[^14]: <https://cran.r-project.org/web/packages/cluster/cluster.pdf>
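
A minimal sketch of PAM clustering with `cluster::pam()` is shown below. The data are simulated (the practice-level ethnicity percentages are not reproduced here), only three illustrative groups are used rather than 14, and the object names are assumptions rather than the project's own.

```r
library(cluster)

set.seed(42)

# Simulated practice-level ethnicity percentages: 300 hypothetical practices,
# three illustrative groups, each row summing to 100
raw <- matrix(rexp(300 * 3), ncol = 3,
              dimnames = list(NULL, c("group_a", "group_b", "group_c")))
ethnicity_pct <- 100 * raw / rowSums(raw)

# Partition around medoids into k = 5 clusters
fit <- pam(ethnicity_pct, k = 5)

table(fit$clustering)  # number of practices per cluster
fit$medoids            # the representative practice profile for each cluster

# Value of pam's objective function after the swap phase for a range of k,
# which is the kind of quantity an elbow chart would plot against k
k_values <- 2:8
objective <- sapply(k_values, function(k) {
  pam(ethnicity_pct, k = k)$objective[["swap"]]
})
```

Each medoid is an actual row of the input data, so every cluster can be described by a real practice profile rather than by an averaged centroid.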

Clustering experiments were conducted using a variety of different combinations of ethnic breakdowns, such as 14 ethnic groups and 5 ethnic groups, as well as variations that excluded the % of White British patients or factored in age group. Techniques such as the elbow method, gap plots and scree plots were also used to attempt to determine the optimal number of clusters. Consideration was given to combining some ethnic groups, and a correlation matrix was produced to identify ethnic groups that commonly featured together. It was also important to seek a number of clusters that could be defined easily, and whose characteristics did not overlap to the extent that the resulting clusters were too similar to each other. It was also necessary to ensure that there was a sufficient number of practices in each cluster: too few practices in a cluster could result in small amounts of metric data when calculating activity rates and consequently lead to wide confidence intervals in the index of disparity calculations.

The final version selected, and presented here, used the percentage of the GP list size in each of the 14 Census 2021 ethnic groups, and five clusters were felt to give a sufficient number of describable clusters for which the index of disparity calculations could be performed. The chart below is an elbow plot generated for the 14 ethnic groups, showing an elbow at around 4 or 5 clusters.

```{r}
tar_read(elbow_plot)
```

Table 6 shows the number of GP practices assigned to each of the 5 clusters and the characteristics of the clusters are described in the main body of the report above.

```{r}
final_data_full_cats_percent_5_clusters <- tar_read(final_data_full_cats_percent_5_clusters) |> as_tibble()

final_data_full_cats_percent_5_clusters |>
  # One row per practice is assumed, so counting rows per cluster counts practices
  count(cluster) |>
  ungroup() |>
  select(cluster,n)|>
  gt()|>
    tab_header(title = md("**Table 6** - *Number of practices per cluster*"))|>
    cols_label(cluster=md("**Cluster**"),
             n=md("**Number of practices**"))
```

### Index of Disparity calculations

#### Rate of activity

Calculations of the index of disparity begin by calculating an activity rate: a numerator of the activity (e.g. number of elective procedures) divided by a denominator, which for metrics other than risk factors has been taken to be CHD synthetic prevalence (as a determinant of need).

Where the metric is in the risk factor domain the denominator used is GP practice list size (with the exception of the percentage of patients aged 45 or over who have a record of blood pressure in the preceding 5 years, where the 45+ list size has been used).

By calculating these rates for each cluster within each metric and then comparing them to the global rate for the metric, it is possible to identify how activity rates vary across the 5 clusters.

#### Index of disparity

The relative index of disparity indicates the extent to which the rate of an activity or event varies across groups. Having calculated the rates for each cluster within each metric, the disparity is calculated by taking the difference between each cluster's rate and the global rate and then determining the amount of activity by which the cluster varies from the global rate. These absolute differences are then summed across the 5 clusters and divided by twice the sum of the numerators. This produces a relative index of disparity for the metric, expressed as a %, such that the % indicates the amount of activity that would need to be redistributed between clusters in order that event rates follow levels of need.
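
As a minimal sketch of the calculation described in the paragraph above, the function below takes cluster-level numerators and denominators and returns the index as a percentage. The function name `index_of_disparity()` is hypothetical and the code illustrates the prose description only; it is not the project's implementation. The example values are the flu vaccination (65+) figures from Table 4.

```r
# Hypothetical helper illustrating the calculation described in the text
index_of_disparity <- function(numerator, denominator) {
  cluster_rate <- numerator / denominator
  global_rate  <- sum(numerator) / sum(denominator)
  # Activity by which each cluster differs from what the global rate implies
  activity_gap <- (cluster_rate - global_rate) * denominator
  # Each unit moved is counted twice (once removed, once added), hence the 2
  100 * sum(abs(activity_gap)) / (2 * sum(numerator))
}

# Flu vaccinations (65+) and CHD synthetic prevalence by cluster, from Table 4
vaccinations <- c(3356002, 2638698, 1862430, 476616, 332728)
prevalence   <- c(1228453, 1124040, 1181796, 571049, 415950)

iod <- index_of_disparity(vaccinations, prevalence)
round(iod, 1)
```

For these figures the result is about 17%, in line with the disparity reported for this metric in the conclusions. When every cluster has the same rate the function returns 0.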

$$
\text{Index of disparity} = \frac{\sum_{i=1}^{n} \lvert r_i - R \rvert \, / \, n}{R} \times 100
$$

where $r_i$ = rate for group $i$, $n$ = number of groups and $R$ = total population rate

A report by Pearcy and Keppel (2002) presents the Index of Disparity as a summary measure of disparity across population groups and advocates its use for groups defined in terms of categories such as ethnicity.[^15]

[^15]: <https://journals.sagepub.com/doi/epdf/10.1093/phr/117.3.273>

### Confidence Intervals

95% confidence intervals are shown in the national charts in this report. Confidence intervals have been used rather than calculations of statistical significance, both because they give a clearer picture of the likely extent of the disparity and because of concerns about the suitability of statistical significance in this case.

> "Practitioners need to be aware that statistical significance differs from practical importance in that statistical significance is highly dependent upon sample size. For a large sample, a statistically significant result is likely to be obtained even when the actual magnitude of an effect is small and of little or no practical importance. On the other hand, for a small sample, it is quite likely that insufficient evidence of a statistically significant result will be obtained – even when there is, indeed, an effect of practical importance." (Hahn, G. J., Doganaksoy, N., Meeker, W. Q., 2019)[^16]

[^16]: <https://academic.oup.com/jrssig/article/16/4/20/7038025>

In order to estimate the confidence intervals for the index of disparity, a bootstrap method was adopted, assuming independence, generating 1000 random samples for each metric for which an index of disparity was calculated. The 2.5th and 97.5th percentiles were then taken for each metric to give the lower and upper limits of the 95% confidence interval for each.
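
The percentile bootstrap can be sketched as below. The text does not state which unit was resampled, so this example assumes that GP practices are resampled with replacement and treated as independent. All data are simulated, and `iod_from_practices()` is a hypothetical helper that applies the redistribution-based index described in the Index of disparity section.

```r
set.seed(2024)

# Simulated practice-level data for one metric (all values hypothetical)
n_practices <- 500
practices <- data.frame(
  cluster     = sample(1:5, n_practices, replace = TRUE),
  denominator = rpois(n_practices, lambda = 300) + 1
)
# Activity rates differ by cluster so that there is some disparity to measure
cluster_rates <- c(0.50, 0.45, 0.40, 0.30, 0.25)
practices$numerator <- rbinom(n_practices, practices$denominator,
                              cluster_rates[practices$cluster])

# Index of disparity (%) from practice-level data aggregated to clusters
iod_from_practices <- function(df) {
  num <- tapply(df$numerator, df$cluster, sum)
  den <- tapply(df$denominator, df$cluster, sum)
  gap <- (num / den - sum(num) / sum(den)) * den
  100 * sum(abs(gap)) / (2 * sum(num))
}

# Assumption: practices are the resampling unit, drawn with replacement
boot_iod <- replicate(1000, {
  resampled <- practices[sample(nrow(practices), replace = TRUE), ]
  iod_from_practices(resampled)
})

point_estimate <- iod_from_practices(practices)
ci <- quantile(boot_iod, probs = c(0.025, 0.975))
point_estimate
ci
```

The 2.5th and 97.5th percentiles of the 1000 bootstrap values form the lower and upper limits of the interval.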

An explanation of bootstrap theory and confidence intervals is available in Practical Statistics for Data Scientists (Bruce, P. and Bruce, A., 2017).[^17]

[^17]: <https://www.oreilly.com/library/view/practical-statistics-for/9781491952955/>

## CHD mortality rates

In examining the findings of this project the question arises, why are the mortality rates lower in the more diverse clusters? There are a number of factors to consider in understanding this:

::: {.callout-note appearance="minimal" icon="false"}
##### Age/Sex profile of the clusters

The age/sex profile of the clusters can be seen in table 2. This shows that the most diverse clusters have a much younger profile, 68% are under 45 in the most diverse cluster, compared to 48% of the least diverse cluster. Those aged 75+ represent only 4% of the most diverse cluster, compared to 12% of the least diverse cluster.
:::

::: {.callout-note appearance="minimal" icon="false"}
##### Methodology for calculating CHD synthetic prevalence

The measure of need used in this project to calculate the activity rates is the PHE synthetic prevalence estimate of CHD. The prevalence calculations take ethnicity, age and sex into account in their methodology. Other factors include hypertension, diabetes, dyslipidemia, obesity, smoking, CKD, physical activity, family history, and area deprivation (Townsend score). Ethnicity is a covariate in the prevalence model (although the variable only has two levels, white / non-white). It is estimated that a non-white individual is 33% more likely to develop CHD than a white person with similar risk factors (age, sex, family history, etc.). Because the prevalence estimates account for the higher likelihood of a non-white patient developing CHD, the denominator of the project calculations is larger for more diverse populations, and the larger denominator leads to a lower rate.

Examining clusters 1 and 5 in more detail, the % CHD prevalence (need) is much higher in cluster 5 (most diverse) and this gives rise to activity rates which are lower even when the % activity volume is the same as cluster 1. This is illustrated with example data in table 7 below.
:::

| Cluster       | CHD Prevalence (%) |  List size | CHD Prevalence | CHD Deaths (% of list size) | Activity rate (CHD Deaths/Prevalence) |
|------------|:----------:|-----------:|-----------:|:----------:|:-------------:|
| Least diverse |        8.7%        | 14,000,000 |      1,218,000 |        6000 (0.043%)        |                0.00492                |
| Most diverse  |       10.6%        |  4,000,000 |        424,000 |        1714 (0.043%)        |                0.00404                |

: **Table 7**- *Example data to illustrate the effect of higher prevalence on activity rates*

::: {.callout-note appearance="minimal"}
##### Mortality rates

The Age and Sex Standardised Mortality Rates (direct) for each of the clusters for CHD deaths and all cause deaths have been calculated. These calculations use the same death data as was used in the numerator of the activity rate calculations. They measure the rate at which death from CHD occurs in the population: if the populations had no difference in age and sex, the rates in the more diverse clusters are higher than would be expected. The comparison is with the total population, whether or not those in that population have the condition.

These rates can be seen in table 8 below, alongside the CHD deaths crude rate from this project and an all cause deaths crude rate, also calculated by dividing the number of deaths by the CHD prevalence and shown per 10,000 population. These crude rates are a measure of how many people already with CHD die.
:::

| Cluster | Population | ASMR - All cause | ASMR - CHD | Rate - All Cause (All deaths / Prevalence) | Rate - CHD (CHD deaths / Prevalence) |
|:---------:|----------:|----------:|----------:|---------------:|------------:|
|    1    | 16,914,882 |            913.5 |       93.9 |                                     1188.4 |                                123.1 |
|    2    | 15,788,339 |            894.3 |       87.6 |                                     1029.5 |                                100.5 |
|    3    | 16,416,864 |            935.4 |       93.3 |                                      813.7 |                                 80.6 |
|    4    |  7,883,280 |            773.7 |       82.2 |                                      431.4 |                                 45.9 |
|    5    |  4,961,545 |            868.2 |      105.8 |                                      511.1 |                                 62.7 |

: **Table 8** - *ASMR and Activity Rate*
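
The difference between a directly standardised rate and a crude rate can be sketched in R as follows. This is an illustration under stated assumptions, not the calculation used in the project: the function names, age bands, populations, standard population weights and the "per 100,000" scaling are all hypothetical, and the sketch stratifies by age only, whereas the rates in Table 8 are standardised by age and sex.

```r
# Direct standardisation: weight each stratum-specific death rate by a
# standard population, so that clusters are compared on a common age structure.
# All names and numbers below are hypothetical.
direct_asmr <- function(deaths, population, standard_pop, per = 100000) {
  stratum_rate <- deaths / population
  sum(stratum_rate * standard_pop) / sum(standard_pop) * per
}

# Crude rate: total deaths over total population, ignoring age structure
crude_rate <- function(deaths, population, per = 100000) {
  sum(deaths) / sum(population) * per
}

# Hypothetical standard population for three age bands (young, middle, old)
standard_pop <- c(60000, 30000, 10000)

# Two made-up clusters with identical age-specific death rates
# but different age structures
rates   <- c(0.0001, 0.001, 0.01)
older   <- data.frame(population = c(400000, 300000, 300000))
younger <- data.frame(population = c(700000, 250000, 50000))
older$deaths   <- older$population * rates
younger$deaths <- younger$population * rates

asmr_older    <- direct_asmr(older$deaths, older$population, standard_pop)
asmr_younger  <- direct_asmr(younger$deaths, younger$population, standard_pop)
crude_older   <- crude_rate(older$deaths, older$population)
crude_younger <- crude_rate(younger$deaths, younger$population)
```

In this made-up example the two clusters have the same standardised rate because their age-specific rates are identical, while the crude rate is much higher in the older cluster. A crude rate can therefore differ between clusters purely because of age structure.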

Having controlled for prevalence, the more diverse clusters have lower death rates. We might have expected that more diverse clusters would have similar or higher death rates. A number of factors potentially contribute to this finding. The CHD deaths crude rate is a case fatality rate, that is, deaths among those who have CHD. The highest number of deaths per person with CHD is observed in the least diverse cluster. This rate is not age standardised, so it could be argued that it is higher in the least diverse cluster because its prevalent population is older, and that fewer deaths would be expected in the more diverse clusters because their prevalent populations are younger.

In July 2021 ONS produced an experimental analysis of ethnic differences in life expectancy and cause-specific mortality rates[^18]. The analysis found that White and Mixed ethnic groups had lower life expectancy than all other ethnic groups, while the Black African ethnic group had statistically significantly higher life expectancy than most groups. However, the analysis also found that ischaemic heart disease mortality was highest in the Bangladeshi, Pakistani and Indian ethnic groups and lowest among Black ethnic groups. The analysis revealed complex patterns in life expectancy and mortality by ethnic group. It offers potential reasons for the higher life expectancy found in the Black African and Asian Other ethnic groups, including that they contain a greater proportion of more recent migrants than other ethnic groups, and it cites previous research showing that people who migrate tend to be healthier than others. The ONS report also suggests that Asian and Black ethnic groups engage less in harmful health-related behaviours. These findings show some similarities to research in the USA into what has been termed the Hispanic paradox, where most Hispanic groups are characterised by low economic status and a high prevalence of cardiovascular disease risk factors, but have better than expected health and mortality outcomes[^19]. Possible explanations for the Hispanic paradox include the healthy migrant hypothesis (those who migrate tend to be younger and healthier), the presence of nuclear families and high levels of social support, and dietary factors.

[^18]: <https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/articles/ethnicdifferencesinlifeexpectancyandmortalityfromselectedcausesinenglandandwales/2011to2014>

[^19]: <https://www.sciencedirect.com/science/article/abs/pii/S0033062014001339?via%3Dihub>

In this project the more diverse clusters, in particular clusters 4 and 5 (most diverse), comprise a mixture of different ethnic groups. It is possible that a combination of the factors described above gives rise to the pattern of mortality by cluster. The Black ethnic group in cluster 4, with the lowest mortality, may be offsetting the higher mortality of the Asian ethnic group, alongside the higher CHD prevalence in the denominator, the younger age profile, and factors such as the healthy migrant hypothesis and higher levels of social support.

## Integrated Care Board (ICB) Analysis

#### Relative Index of Disparity

The charts below are equivalent to those in the regional analysis section of the report, showing the Index of Disparity for each CHD pathway metric for each of the 42 ICBs. There is considerable variation in the levels of disparity across the ICBs, as well as across the metrics along the pathway.

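```{r}
#| output: "asis"
# Read the ICB charts (one per ICB) from the targets pipeline
icb_charts <- tar_read(icb_charts)

# For each chart, print a heading with the short ICB name (dropping the
# "NHS " prefix and " Integrated Care Board" suffix), then the chart itself
purrr::iwalk(icb_charts, \(plot, name) {
  name <- str_extract(name, "(?<=^NHS ).*(?= Integrated Care Board$)")
  cat("\n\n#####", name, "\n\n")
  print(plot)
})
```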

```{r}
# Combine the data behind every ICB chart into one table, keyed by ICB name
all_icb_data <- purrr::map(icb_charts, "data") |> dplyr::bind_rows(.id = "icb")

# Offer the combined Index of Disparity data as an Excel download
all_icb_data |>
  download_this(
    output_name = "iod_icb_data",
    output_extension = ".xlsx",
    button_label = "Download ICB IoD data",
    button_type = "default",
    has_icon = TRUE,
    icon = "fa fa-save"
  )
```

## R code on GitHub

All the code used in this project is available on the Strategy Unit GitHub[^20].

[^20]: <https://github.com/The-Strategy-Unit/bhf_chd-ethnicity>