slides.qmd

---
title: |
  <h2 style='font-size: 75px'>
  Data Analysis Workshop
  </h2>
  <div style='margin-bottom: 50px; font-size: 40px'>
  Linking health survey data with climate datasets in R
  </div>
author: 
  - |
    <p style='font-size: 25px; margin-left: 50px'>
      <img src='images/matt.png' width='100' style='vertical-align: middle'>
      Matt Gunther - Senior Data Analyst 
    </p>
  - |
    <p style='font-size: 25px; margin-left: 50px'>
      <img src='images/devon.png' width='100' style='vertical-align: middle'> 
      Devon Kristiansen - Project Manager
    </p>
    
format: 
  revealjs: 
    theme: [default, custom.scss]
    logo: images/logo-navy.png
    chalkboard: true
    smaller: true
    scrollable: false
    incremental: true
    preview-links: true
---

## Data and Research Question {.center}

::: {.columns}
:::: {.column width="50%"}
::::: {.fragment .fade-in-then-semi-out}
IPUMS PMA: Fertility outcomes for women in Burkina Faso 
:::::
::::: {.fragment .fade-in-then-semi-out}
Toy GPS data: coordinates for each PMA sample cluster
:::::
::::: {.fragment .fade-in-then-semi-out}
CHIRPS: Rainfall accumulation relative to local averages 
:::::
::::: {.fragment .fade-in}
<p style='color: #98579B'>
Do rainfall conditions influence women's short-term family planning decisions?
</p>
:::::
::::
:::: {.column width="50%"}
![](images/goals.png){.absolute height=400}
::::
::: 


## Setup {.nonincremental}

```{r}
knitr::opts_chunk$set(echo = TRUE)
```

Follow [these steps](https://github.com/matt-gunther/ined-pma-2022) to obtain today's code & publicly available data.

Open `analysis.rmd` and begin following along with these code chunks

::: {.columns}
:::: {.column width="48%"}
```{r}
# IPUMS & Spatial Data Tools
library(tidyverse)
library(ipumsr) 
library(sf)
library(terra)
library(ggspatial)
```
:::: 
:::: {.column width="4%"}
::::
:::: {.column width="48%"}
```{r}
# Analysis Tools 
library(srvyr)
library(survey)
library(lme4)
library(broom)
library(broom.mixed)
```
:::: 
:::


![](images/hex/tidyverse.png){.absolute left=0  height="200"}
![](images/hex/ipumsr.png){.absolute left=300  height="200"}
![](images/hex/sf.png){.absolute left=600  height="200"}
![](images/hex/terra.png){.absolute left=900  height="200"}


# 1 - IPUMS PMA Data 


## Downloading IPUMS PMA data {.center .nonincremental}

::: {.columns}
:::: {.column width="50%"}
![](images/pma/home.png){}
::::
:::: {.column width="50%"}
[Visit the IPUMS PMA data website](https://pma.ipums.org/pma/)

* Sample 
  * Burkina Faso
  * Longitudinal 
  * Female Respondents Only
* Variables 
  * [RESULTFQ](https://pma.ipums.org/pma-action/variables/RESULTFQ)
  * [PANELBIRTH](https://pma.ipums.org/pma-action/variables/PANELBIRTH)
  * [PANELWEIGHT](https://pma.ipums.org/pma-action/variables/PANELWEIGHT)
  * [EAID](https://pma.ipums.org/pma-action/variables/EAID)
  * [URBAN](https://pma.ipums.org/pma-action/variables/URBAN)
::::
:::


## Load IPUMS PMA data into R

You'll receive two files from IPUMS - put both in the `data` folder where `analysis.Rmd` is located.

::: {.fragment}
```{r, eval=FALSE}
pma <- read_ipums_micro(
  ddi = "data/pma_00136.xml", # Use your own file name here 
  data = "data/pma_00136.dat.gz" # Use your own file name here 
)
```

```{r, results='hide', echo = FALSE}
pma <- read_ipums_micro(
  ddi = "local/data/pma_00136.xml", # Use your own file name here 
  data = "local/data/pma_00136.dat.gz" # Use your own file name here 
)
```

:::
::: {.fragment}
```{r, echo = FALSE}
pma
```
:::


## Data format 

We've selected `wide` format longitudinal data, so Phase 1 and Phase 2 data are stored in separate columns. 

::: {.fragment}
```{r, eval=F}
pma %>% count(RESULTFQ_1, RESULTFQ_2)
```
:::
::: {.fragment}
```{r, echo=F}
pma %>% count(RESULTFQ_1, RESULTFQ_2)
```
:::
::: {.fragment}

Keep only women who completed the survey in *both* phases of the panel study.

```{r}
# Keep only panel members from both phases 
pma <- pma %>% filter(RESULTFQ_1 == 1 & RESULTFQ_2 == 1)
```
:::


# Questions about IPUMS PMA data? {.center}


# 2- PMA GPS data 


## About PMA GPS data {auto-animate="true"}

::: {.columns}
:::: {.column width="60%"}
::::: {.fragment .fade-in-then-semi-out}
PMA uses spatially referenced sample clusters - called "enumeration areas" (EAs) - sampled by probability proportional to population size.
:::::
::::: {.fragment .fade-in-then-semi-out}
At the beginning of the panel study, 35 households were randomly selected within each EA.
:::::
::::
:::: {.column width="40%"}
::::
::: 

## About PMA GPS data {auto-animate="true" visibility="uncounted"}

::: {.columns}
:::: {.column width="60%"}
::::: {style="opacity:0.5"}
PMA uses spatially referenced sample clusters - called "enumeration areas" (EAs) - sampled by probability proportional to population size.

At the beginning of the panel study, 35 households were randomly selected within each EA.
:::::
::::: {.fragment .fade-in-then-semi-out}
IPUMS PMA does not disseminate the GPS coordinates for EAs, but you may [apply here](https://www.pmadata.org/data/request-access-datasets) for access directly from PMA.
:::::
::::: {.fragment .fade-in-then-semi-out}
Today, we'll be using **falsified** GPS coordinates as an example. 
:::::
::::
:::: {.column width="40%"}
![](images/pma/gps1.png){.absolute top=100 right=50 height=300}
::::
::: 

## About PMA GPS data {auto-animate="true" visibility="uncounted"}

::: {.columns}
:::: {.column width="60%"}
::::: {style="opacity:0.5"}
PMA uses spatially referenced sample clusters - called "enumeration areas" (EAs) - sampled by probability proportional to population size.

At the beginning of the panel study, 35 households were randomly selected within each EA.

IPUMS PMA does not disseminate the GPS coordinates for EAs, but you may [apply here](https://www.pmadata.org/data/request-access-datasets) for access directly from PMA.
::::: 
Today, we'll be using **falsified** GPS coordinates as an example. 
::::

:::: {.column width="40%"}
![](images/pma/gps1.png){.absolute top=300 right=0 height=300}
::::
::: 

## About PMA GPS data {auto-animate="true" visibility="uncounted"}

::: {.columns}
:::: {.column width="60%"}
::::: {style="opacity:0.5"}
PMA uses spatially referenced sample clusters - called "enumeration areas" (EAs) - sampled by probability proportional to population size.

At the beginning of the panel study, 35 households were randomly selected within each EA.

IPUMS PMA does not disseminate the GPS coordinates for EAs, but you may [apply here](https://www.pmadata.org/data/request-access-datasets) for access directly from PMA.

Today, we'll be using **falsified** GPS coordinates as an example. 
:::::

The coordinates represent the <span style="color:red">**centroid**</span> of an enumeration area, *not* the location of any sampled household. 
::::
:::: {.column width="40%"}
![](images/pma/gps2.png){.absolute top=300 right=0 height=300}
::::
::: 


## Load PMA GPS data into R 

::: {.fragment}
A PMA GPS dataset is a simple CSV file with one row per EA, and columns containing latitude and longitude points.

```{r}
gps <- read_csv("data/pma_gps.csv")
```
::: 
::: {.fragment}
```{r, echo = FALSE}
gps 
```
:::
::: {.fragment}
The column `DATUM` describes the coordinate reference system: World Geodetic System 1984.
:::


## GPS data as a Simple Features Object

::: {.columns}
:::: {.column width="75%"}
::::: {.fragment}
The [sf](https://r-spatial.github.io/sf/index.html) ("simple features") package for R contains many of the same tools you would find in other GIS software
:::::
::::: {.fragment}

```{r, `code-line-numbers` = "1,2,5"}
gps <- gps %>% 
  st_as_sf(
    coords = c("GPSLONG", "GPSLAT"), 
    crs = 4326
  ) 
```
::::: 
::::
:::: {.column width="15%"}
![](images/hex/sf.png){.absolute right=0  top=90 height="200"}
::::
:::

## GPS data as a Simple Features Object {auto-animate="true" visibility="uncounted"}

::: {.columns}
:::: {.column width="75%"}
The [sf](https://r-spatial.github.io/sf/index.html) ("simple features") package for R contains many of the same tools you would find in other GIS software

```{r, `code-line-numbers` = "3-4"}
gps <- gps %>% 
  st_as_sf(
    coords = c("GPSLONG", "GPSLAT"), 
    crs = 4326
  ) 
```
::::: {.fragment}
`4326` is the [EPSG](https://epsg.io/4326) code for World Geodetic System 1984.
:::::
::::: {.fragment}
```{r, echo = FALSE}
gps 
```
:::::
::::
:::: {.column width="15%"}
![](images/hex/sf.png){.absolute right=0  top=90 height="200"}
::::
:::


## Adding a Shapefile

::: {.fragment}
We need a supplementary `shape` file to see where each point lies on a map of Burkina Faso.
:::
::: {.fragment}
You only need to specify the folder: R will locate the appropriate file.

```{r, results='hide'}
shape <- st_read("data/geobf") %>% select(ADMIN_NAME)
```
:::
::: {.fragment}
```{r, echo=FALSE}
shape
```
:::


## Mapping GPS coordinates {auto-animate=true}

::: {.fragment}
```{r, eval=FALSE}
ggplot() + 
  layer_spatial(gps) + 
  theme_minimal() 
```
:::
::: {.fragment}
```{r, echo=FALSE, fig.height=4}
ggplot() + 
  layer_spatial(gps) + 
  theme_minimal() 
```
:::

## Mapping GPS coordinates {auto-animate=true}

```{r, eval=FALSE, `code-line-numbers` = "3"}
ggplot() + 
  layer_spatial(gps) + 
  layer_spatial(shape, alpha = 0) + 
  theme_minimal() 
```

```{r, echo=FALSE, fig.height=4, fig.align='center'}
ggplot() + 
  layer_spatial(gps) + 
  layer_spatial(shape, alpha = 0) + 
  theme_minimal() 
```


## PMA displacement protocol

::: {.fragment}
In fact, these GPS coordinates are *not* the actual centroid of each EA.
:::
::: {.fragment}
:::: {style="text-align: center; margin-top: 1em"}
![](images/displacement.png){height="400"}
::::

In order to preserve confidentiality, PMA displaces the centroid location of each EA up to 2 km (urban areas) or 5 km (rural areas). 

Displacement does not cross admin 1 boundaries.
:::


---

::: {style="display: flex; align-items: center; height: 90%"}
:::: {.columns }
::::: {.column width="50%"}
<h2>Meters on a Round Plant</h2>
:::::: {.fragment}
Remember: the `geometry` of our `gps` data is defined by arc-degrees, not meters.
::::::
:::::: {.fragment}
<span style='color: #98579B'>If we tried to account displacement in meters without "flattening" our map, the length of one meter vary based on distance from the equator.</span>
::::::
:::::
::::: {.column width=50%}
:::::
::::
:::
![](images/projection.gif){.absolute top=0 right=50 height=600}


## Projection to Meters 

::: {.fragment}
We'll use [EPSG code 32630](https://epsg.io/32630) to focus our flat projection around Burkina Faso.
:::
::: {.fragment}
```{r}
gps <- gps %>% st_transform(crs = 32630)
shape <- shape %>% st_transform(crs = 32630)
```
:::
::: {.fragment}
```{r echo=FALSE}
gps
```
:::
::: {.fragment}
This changes the `geometry` column to meters `[m]`.
:::


## Creating Buffers 

::: {.fragment}
To keep things simple in our example, we'll give every EA a 5 km buffer.
::: 
::: {.fragment}
```{r, results='hide'}
gps_buffer <- gps %>% st_buffer(5000)
```
:::
::: {.fragment}
```{r echo=FALSE}
gps_buffer
```
:::
::: {.fragment}
Now, `geometry` contains several points on the circumference of a round polygon.
:::


## Intersecting Regional Boundaries

::: {style="display: flex; align-items: center; height: 90%"}
:::: {.columns}
::::: {.column width="40%"}
:::::: {.fragment}
In certain cases, a buffer may cross an admin 1 boundary in `shape`. 
::::::
:::::: {.fragment}
For example, we'll look at enumeration area `854131009`

```{r}
test_ea <- 854131009 
```
::::::
:::::
::::: {.column width="60%"}
:::::: {.fragment}
```{r, echo = FALSE, fig.height=4, fig.width=4, fig.align='center'}
ggplot() + 
  layer_spatial(gps_buffer %>% filter(EAID == test_ea), alpha = 0) + 
  layer_spatial(gps %>% filter(EAID == test_ea), color = "red") + 
  annotation_spatial(shape, alpha = 0) + 
  theme_minimal()
```
::::::
:::::
::::
:::


## Intersecting Regional Boundaries

::: {style="display: flex; align-items: center; height: 90%"}
:::: {.columns}
::::: {.column width="40%"}
In certain cases, a buffer may cross an admin 1 boundary in `shape`. 

For example, we'll look at enumeration area `854131009`

```{r}
test_ea <- 854131009 
```

Keep only the segment containing the original centroid.

```{r, eval = FALSE}
gps_buffer <- gps_buffer %>% 
  st_intersection(shape) %>% 
  st_filter(gps)
```

:::::
::::: {.column width="60%"}
```{r, echo = FALSE, fig.height=4, fig.width=4, fig.align='center'}
ggplot() + 
  layer_spatial(gps_buffer %>% filter(EAID == test_ea), alpha = 0) + 
  layer_spatial(gps %>% filter(EAID == test_ea), color = "red") + 
  annotation_spatial(shape, alpha = 0) + 
  theme_minimal()
```
:::::
::::
:::


## Intersecting Regional Boundaries

::: {style="display: flex; align-items: center; height: 90%"}
:::: {.columns}
::::: {.column width="40%"}
In certain cases, a buffer may cross an admin 1 boundary in `shape`. 

For example, we'll look at enumeration area `854131009`

```{r}
test_ea <- 854131009 
```

Keep only the segment containing the original centroid.

```{r, eval = TRUE}
gps_buffer <- gps_buffer %>% 
  st_intersection(shape) %>% 
  st_filter(gps)
```
:::::
::::: {.column width="60%"}
```{r, echo = FALSE, fig.height=4, fig.width=4, fig.align='center'}
ggplot() + 
  layer_spatial(gps_buffer %>% filter(EAID == test_ea), alpha = 0) + 
  layer_spatial(gps %>% filter(EAID == test_ea), color = "red") + 
  annotation_spatial(shape, alpha = 0) + 
  theme_minimal()
```
:::::
::::
:::


## Back to arc-degrees

:::: {.fragment}
Finally, we can return to our original CRS defined by arc-degrees. 
:::: 
:::: {.fragment}
```{r}
gps_buffer <- gps_buffer %>% st_transform(crs = 4326)
shape <- shape %>% st_transform(crs = 4326)
```
::::
:::: {.fragment}
```{r,  fig.height=4, fig.align='center'}

ggplot() + 
  layer_spatial(shape, alpha = 0) + 
  layer_spatial(gps_buffer, alpha = 0) + 
  theme_minimal()
```
:::: 


# Questions about PMA GPS Data?


# 3 - Mapping Birth Outcomes


## Birth outcomes for individuals

::: {.fragment .fade-in-then-semi-out}
Now, we combine `pma` together with `gps_buffer`.
:::
::: {.fragment .fade-in-then-semi-out}
The `pma` variable [PANELBIRTH_2](https://pma.ipums.org/pma-action/variables/PANELBIRTH) indicates whether each woman gave birth within the year that passed between Phase 1 and Phase 2 of the panel study.
:::

::: {.fragment .fade-in}
```{r}
pma %>% count(PANELBIRTH_2)
```
:::

::: {.fragment .fade-in-then-semi-out}
**Important:** Code `99` represents women who were "not in universe" because they had indicated elsewhere on the survey that they had never given birth. 
:::

::: {.fragment .fade-in-then-semi-out}
<p style='color: #98579B'>We can treat these cases as "No".</p>
:::

::: {.fragment .fade-in}
```{r}
pma %>% count(PANELBIRTH_2 == 1)
```
:::


## Birth outcomes by EA {auto-animate=true}

::: {.fragment .fade-in}
Births above and below the median enumeration area:
:::
::: {.fragment .fade-in}
```{r}
ea_summary <- pma %>% 
  as_survey_design(weight = PANELWEIGHT) %>% 
  group_by(ea = EAID_1, urban = URBAN == 1) %>% 
  summarise(birth_prop = survey_mean(PANELBIRTH_2 == 1, vartype = NULL)) %>% 
  ungroup()
```
:::
::: {.fragment .fade-in}
```{r, echo=FALSE}
ea_summary 
```
:::


## Birth outcomes by EA {auto-animate="true" visibility="uncounted"}

Births above and below the median enumeration area:

```{r, `code-line-numbers` = "6-8"}
ea_summary <- pma %>% 
  as_survey_design(weight = PANELWEIGHT) %>% 
  group_by(ea = EAID_1, urban = URBAN == 1) %>% 
  summarise(birth_prop = survey_mean(PANELBIRTH_2 == 1, vartype = NULL)) %>% 
  ungroup() %>% 
  mutate(
    ntile = ntile(birth_prop, 2)
  )
```

```{r, echo=FALSE}
ea_summary 
```

## Birth outcomes by EA {auto-animate="true" visibility="uncounted"}

Births above and below the median enumeration area:

```{r,  `code-line-numbers` = "8"}
ea_summary <- pma %>% 
  as_survey_design(weight = PANELWEIGHT) %>% 
  group_by(ea = EAID_1, urban = URBAN == 1) %>% 
  summarise(birth_prop = survey_mean(PANELBIRTH_2 == 1, vartype = NULL)) %>% 
  ungroup() %>% 
  mutate(
    ntile = ntile(birth_prop, 2),
    many_births = ntile == 2
  )
```

```{r, echo=FALSE}
ea_summary 
```


## Merging GPS data 

::: {.fragment}
Use `full_join` to merge all rows of `ea_summary` to `gps_buffer`. 

If you list `gps_buffer` *first*, the result will be another Simple Features object.

```{r}
ea_summary_gps <- full_join(
  gps_buffer %>% select(ea = EAID), 
  ea_summary, 
  by = "ea"
)
```
:::
::: {.fragment}
```{r, echo = FALSE}
ea_summary_gps <- ea_summary_gps %>% relocate(geometry, .after = everything())
ea_summary_gps
```
:::


## Mapping Birth Outcomes by EA

```{r, fig.height=4, fig.align='center'}
ggplot() + 
  layer_spatial(ea_summary_gps, aes(fill = many_births)) + 
  layer_spatial(shape, alpha = 0) + 
  theme_minimal()
```


## Questions about mapping?


# 4 - CHIRPS: Annual rainfall summary 


## Downloading CHIRPS data

::: {style="text-align: center; margin-top: 1em"}
![](images/chirps/home.png){height="500"}

[Download a CHRIPS data extract from ClimateSERV](https://climateserv.servirglobal.net/){
  preview-link="true" style="text-align: center"
}
:::


## What is a Raster File?

::: {.columns}
:::: {.column width="75%"}
![](images/chirps/files.png){.border .border-thick}
::::
:::: {.column width="25%"}
::::: {.fragment .fade-in-then-semi-out}
When you open your download, you'll find *one file per day* in our selected space and time period. 
:::::
::::: {.fragment .fade-in-then-semi-out}
The `.tif` file format is a high-resolution image. 
:::::
::::: {.fragment .fade-in-then-semi-out}
For CHRIPS, each pixel represents mm rainfall in an area 0.05 degrees longitude by 0.05 degrees latitude. 
:::::
::::
:::


## Load Raster Data into R

::: {.columns}
:::: {.column width="75%"}
The [terra](https://rspatial.github.io/terra/index.html) package reads Raster files. 

You *could* simply read the data from a single day, and map the result. 

```{r}
june5_2020 <- rast("data/chirps/20200605.tif")
```

```{r, echo=FALSE}
june5_2020
```
::::
:::: {.column width="15%"}
![](images/hex/terra.png){.absolute right=0 top=75 height=200}
::::
:::
::: {.fragment}
This output summaries the `.tif` file for June 5, 2020. Notice that there are:

  * 115 rows of pixels 
  * 159 columns of pixels
  * 1 *layer* named `20211028`
:::


## June 5, 2020 {auto-animate=true}

```{r, fig.align='center', eval=FALSE}
ggplot() + 
  layer_spatial(june5_2020) + 
  layer_spatial(shape, alpha = 0) + 
  theme_minimal() 
```

## June 5, 2020 {auto-animate="true" visibility="uncounted"}

```{r, fig.align='center'}
ggplot() + 
  layer_spatial(june5_2020) + 
  layer_spatial(shape, alpha = 0) + 
  theme_minimal() 
```

## June 5, 2020 {auto-animate="true" visibility="uncounted"}

```{r, `code-line-numbers`="2", fig.align='center'}
ggplot() + 
  layer_spatial(mask(june5_2020, vect(shape), touches = FALSE)) + 
  layer_spatial(shape, alpha = 0) + 
  theme_minimal() 
```

## June 5, 2020 {auto-animate="true" visibility="uncounted"}

```{r, `code-line-numbers`="5-7", fig.align='center'}
ggplot() + 
  layer_spatial(mask(june5_2020, vect(shape), touches = FALSE)) + 
  layer_spatial(shape, alpha = 0) + 
  theme_minimal() + 
  scale_fill_gradient2(low = "transparent", high = "#4375B7", na.value = "transparent") + 
  labs(fill = "Rainfall (mm)")
```


## Aggregation 

::: {.fragment}
Ultimately, you'll want to work with `.tif` images from *many* days at the same time. 
:::
::: {.fragment}
For this workshop, I've removed all files outside of the range June 1 to Oct 1.
:::
::: {.fragment}
```{r}
years <- map(2001:2020, ~{
  list.files("data/chirps/", pattern = paste0("^", .x), full.names = TRUE)
})

years <- set_names(years, 2001:2020) 
```
:::
::: {.fragment}
```{r, echo = FALSE}
years
```
:::


## Raster Layers 

::: {.fragment}
We now create one **multi-layered** raster for each year, and store all years in one large list. 

```{r}
years <- years %>% map(~.x %>% rast)
```
:::
::: {.fragment}
For example, each of the 122 days from the `2020` rainy season are now layered together in a single raster. 
:::
::: {.fragment}
```{r}
years$`2020`
```
:::


## Seasonal Rainfall Accumulation

::: {.fragment}
We could now use `sum` to add all of the daily rainfall totals from the `2020` rainy season. This reduces the number of layers from 122 to 1.
:::
::: {.fragment}
```{r}
years$`2020` %>% sum()
```
:::
::: {.fragment}
More efficiently, we'll apply the same `sum` function to *every* year in our list. Let's call this output `chirps_seasonal_sum`.
:::
::: {.fragment}
```{r}
chirps_seasonal_sum <- map(years, ~.x %>% sum)
```
```{r, echo=FALSE}
rm(years)
chirps_seasonal_sum
```
:::


## One Layer per Year 

::: {.fragment}
Now that each year in our list contains only one layer each, we can generate a *new* multi-layered raster: one layer *per year*. 
:::
::: {.fragment}
```{r}
chirps_seasonal_sum <- rast(chirps_seasonal_sum)
```
::: 
:::{.fragment}
```{r, echo = FALSE}
chirps_seasonal_sum
```
:::
:::{.fragment}
For every 0.05 degrees lat by 0.05 degree lon, we now have the **total seasonal rainfall accumulation** for every year 2001-2020.
:::


## How rainy was 2020? {auto-animate=true}
:::{.fragment}
To answer this question, we'll compare total seasonal accumulation in 2020 to the 20-year average.
:::
:::{.fragment}
Here is the *average* seasonal rainfall accumulation for each pixel (across all years):

```{r}
chirps_avg <- mean(chirps_seasonal_sum)
```
:::
:::{.fragment}
And here is the *standard deviation* from that average: 

```{r}
chirps_sd <- stdev(chirps_seasonal_sum)
```
:::
:::{.fragment}
Finally we can use both to compute a Z-score for each pixel in each year. 

```{r}
chirps_z <- (chirps_seasonal_sum - chirps_avg) / chirps_sd
```
:::
:::{.fragment}
```{r, echo=FALSE}
chirps_z
rm(chirps_sd)
rm(chirps_avg)
rm(chirps_seasonal_sum)
rm(june5_2020)
```
:::


## 2020 Seasonal Rainfall Accumulation 

Total accumulation relative to local 20-year average

```{r, fig.align='center', `code-line-numbers` = "2"}
ggplot() + 
  layer_spatial(mask(chirps_z$`2020`, vect(shape), touches = FALSE)) + 
  layer_spatial(shape, alpha = 0) +
  theme_minimal() + 
  scale_fill_gradient2(low = "#D72B22", high = "#4375B7", na.value = "transparent") + 
  labs(fill = "Z-score (2001-2020)")

```


## Questions about CHIRPS raster data? 


# 5 - Bringing it All Together 


## Spatially Weighted Averages

::: {.columns}
:::: {.column width=60%"}
::::: {.fragment}
Remember our example EA? It overlaps 8 CHIRPS pixels.

```{r, results='hide'}
test_buffer <- ea_summary_gps %>% 
  filter(ea == test_ea)
```
:::::
::::: {.fragment}
```{r, eval = FALSE}
ggplot() + 
  layer_spatial(
    chirps_z$`2020` %>% 
      crop(test_buffer, snap = "out")
  ) + 
  layer_spatial(test_buffer, alpha = 0) + 
  theme_minimal() + 
  scale_fill_gradient2(
    high = "#4375B7", 
    limits = c(0, 2)
  )
```
:::::
::::
:::: {.column width="40%"}
::::: {.fragment}
```{r, echo=FALSE, fig.height=7, fig.width=7}
ggplot() + 
  layer_spatial(chirps_z$`2020` %>% crop(test_buffer, snap = "out")) + 
  layer_spatial(test_buffer, alpha = 0) + 
  theme_minimal() + 
  scale_fill_gradient2(high = "#4375B7", limits = c(0, 2))
```
:::::
::::
:::

## Spatially Weighted Averages

::: {.columns}
:::: {.column width=60%"}
::::: {.fragment}
We want a spatial average of these 8 pixels, where the "weight" is the proportion of the pixel included in the buffer zone.
:::::
::::: {.fragment}
```{r}
test_z <- chirps_z$`2020` %>%
  extract(vect(test_buffer), weights = TRUE)
```

```{r, echo=FALSE}
test_z
```
:::::
::::: {.fragment}
```{r}
test_z %>% summarise(
  wtd_mean = weighted.mean(`2020`, weight)
)
```
:::::
::::
:::: {.column width="40%"}
```{r, echo=FALSE, fig.height=7, fig.width=7}
ggplot() + 
  layer_spatial(chirps_z$`2020` %>% crop(test_buffer, snap = "out")) + 
  layer_spatial(test_buffer, alpha = 0) + 
  theme_minimal() + 
  scale_fill_gradient2(high = "#4375B7", limits = c(0, 2))
```
::::
:::

 
## Enumeration Area Z-scores {auto-animate="true" visibility="uncounted"}

::: {.fragment}
Link by row ID

```{r}
ea_summary_gps <- ea_summary_gps %>% rowid_to_column("ID")
```
:::
::: {.fragment}
```{r, echo=FALSE}
ea_summary_gps
```
:::

## Enumeration Area Z-scores {auto-animate="true" visibility="uncounted"}

::: {.fragment}
Calculate one z-score per row ID

```{r}
ea_summary_z <- chirps_z$`2020` %>% 
  extract(vect(ea_summary_gps), weights = TRUE) %>% 
  tibble() %>% 
  group_by(ID) %>% 
  summarise(z = weighted.mean(`2020`, weight)) 
```
:::
::: {.fragment}
```{r, echo=FALSE}
ea_summary_z
```
:::

## Enumeration Area Z-scores {auto-animate="true" visibility="uncounted"}
::: {.fragment}
... and join z-scores back with the `ea` codes for each enumeration area

```{r}
ea_summary_z <- ea_summary_gps %>% 
  st_drop_geometry() %>% 
  select(ID, ea) %>% 
  full_join(ea_summary_z, by = "ID")
```
:::
::: {.fragment}
```{r, echo=FALSE}
ea_summary_z
```
:::


## Merging Z-scores with PMA data {auto-animate="true"}

::: {.fragment}
We'll use `transmute` to select a handful of variables from `pma` to use in our analysis.
:::
::: {.fragment}
```{r}
pma_chirps <- pma %>% 
  transmute(
    ea = EAID_1, 
    wt = PANELWEIGHT,
    birth = PANELBIRTH_2 == 1,
    urban = URBAN == 1
  ) 
```
:::
::: {.fragment}
```{r, echo=F}
pma_chirps
knitr::opts_chunk$set(R.options = list(width = 150))
```
:::

## Merging Z-scores with PMA data {auto-animate="true"}

We'll use `transmute` to select a handful of variables from `pma` to use in our analysis.

```{r}
pma_chirps <- pma %>% 
  transmute(
    ea = EAID_1, 
    wt = PANELWEIGHT,
    birth = PANELBIRTH_2 == 1,
    urban = URBAN == 1
  ) 
```

::: {.fragment}
Another `full_join` allows us to merge `ea_summary_z` for each woman's enumeration area.  
:::
::: {.fragment}
```{r}
pma_chirps <- pma_chirps %>% full_join(ea_summary_z, by = "ea") 
```
:::
::: {.fragment}
```{r, echo=F}
pma_chirps
```
:::


## Z-scores in hierarchical models

::: {.fragment}
The `lme4` package allows us to build a hierarchical model with EA random effects. 

For example, a simple model interacting rainfall with `urban`: 
:::
::: {.fragment}
```{r}
glmer(
  birth ~ z*urban + (1 | ea), 
  data = pma_chirps,
  family = "binomial") %>%
  tidy(exp = TRUE, conf.int = TRUE) %>%
  mutate(sig = gtools::stars.pval(p.value))
```

No controls, but we do so see some evidence that a wetter rainy season might increase odds for a `birth` that year.
:::


## Survey weights 

::: {.fragment}
`lme4`  does not provide a nice way to incorporate survey weights (inverse selection probability). 

Instead, you might consider the `survey` package for weights and in a logit model with cluster-robust SEs.
:::
::: {.fragment}
```{r}
pma_chirps %>%
  as_survey_design(weight = wt, id = ea) %>%
  svyglm(birth ~ z*urban, design = ., family = "quasibinomial") %>%
  tidy(exp = TRUE, conf.int = TRUE) %>%
  mutate(sig = gtools::stars.pval(p.value))
```

Similar results: some evidence that a wetter rainy season might increase odds for a `birth` that year.
:::


## Some other things to consider 

* When does rainfall occur relative to conception? 
* What is the relevant comparison period? (CHRIPS data are available from 1981)
* Does the rainy season vary across the study area? 
* How does rainfall interact with temperature, vegetation, and local livelihoods?

::: {.fragment}
Technical issues: 

* Should I store climate data locally, or access through a server? (Check out the [CHIRPS API](https://docs.ropensci.org/chirps/) package for R!)
* How much memory does R need to process my analysis? Can I break it up into pieces, or use parallel processing? 
:::

## Resources 

* [r-spatial](https://r-spatial.org/) - blog from the authors of [sf](https://r-spatial.github.io/sf/index.html)
* [rspatial](https://rspatial.org/terra/index.html) - blog from the authors of [terra](https://rspatial.github.io/terra/index.html)
* [PMA Data Analysis Hub](https://tech.popdata.org/pma-data-hub/) - our blog! 
* [IPUMS.org](https://www.ipums.org/) (and [IPUMS Global Health](https://globalhealth.ipums.org/) in particular)
* Find us on Twitter: [@ipumsGH](https://twitter.com/ipumsGH)

::: {.fragment}
Thank you! 
:::

```{r, echo=FALSE, eval=FALSE}
# Weighted with EA random effects ??? Does not match Stata 
pma_chirps2 <- parameters::rescale_weights(pma_chirps, "ea", "wt")
glmer(
  birth ~ Z*urban + (1 | ea), 
  weights = pweights_a,
  data = pma_chirps2,
  family = "binomial") %>%
  tidy(exp = TRUE, conf.int = TRUE) %>%
  mutate(sig = gtools::stars.pval(p.value))
```


```{r, echo = FALSE, eval=FALSE}
# Custom font 
library(showtext)
sysfonts::font_add(
  family = "cabrito", 
  regular = "local/fonts/cabritosansnormregular-webfont.ttf"
)
showtext::showtext_auto()

ggplot() + 
  layer_spatial(mask(chirps_z$`2020`, vect(shape), touches = FALSE)) + 
  layer_spatial(shape, alpha = 0) + 
  layer_spatial(
    ea_summary_gps %>% 
      st_centroid() %>% 
      mutate(many_births = if_else(
        many_births,
        "Above Median",
        "Below Median"
      )),
    aes(shape = many_births), alpha = 0.6, size = 2
  ) + 
  theme_minimal() %+replace% 
  theme(
    text = element_text(family = "cabrito", size = 13),
    plot.title = element_text(size = 22, color = "#00263A",
    hjust = 0, margin = margin(b = 5)),
    plot.subtitle = element_text(hjust = 0, margin = margin(b = 10)),
    strip.background = element_blank(),
    strip.text.y = element_text(size = 16, angle = 0),
    panel.spacing = unit(1, "lines"),
    axis.title.y = element_text(angle = 0, margin = margin(r = 10))
  ) +  
  scale_fill_gradient2(
    low = "#D72B22", high = "#4375B7", na.value = "transparent"
  ) + 
  labs(
    fill = "Z-score \n(2001-2020)",
    shape = "Enumeration Area \nBirth Rate",
    title = "2020 Births vs Seasonal Rainfall Accumulation in Burkina Faso",
    subtitle = "Total rainfall June 1 to October 1 vs mean seasonal total 2001-2020"
  ) 

```