-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathipums1_2020_2021.Rmd
448 lines (353 loc) · 18.7 KB
/
ipums1_2020_2021.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
---
title: "Labor Force Participation and Employment of Men and Women in the U.S. During the COVID-19 Pandemic"
author: "Elena Stolpovsky"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
# toc: true
# toc_depth: 5
code_folding: hide
bibliography: references.bib
link-citations: true
nocite: |
@Stolpovsky_geography, @Stolpovsky_historical_unemployment
---
<base target="_top"/>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE)
options(scipen = 999,
digits = 2,
tibble.print_max = Inf)
```
[<font size="4">EconBlog</font>](https://elenas70.github.io/econblog/)
[<font size="4">`r icons::fontawesome("github")`</font>](https://github.com/elenas70/us_labor_market_and_COVID)
[<font size="4">`r icons::fontawesome("linkedin")`</font>](https://www.linkedin.com/in/estolpovsky/)
### Changes in the U.S. Labor Market in 2020–2021
The COVID-19 pandemic led to a decline in economic activity, a fall in demand for labor and a decline in the ability and willingness of people to work in the market. These changes are reflected in high unemployment and reduced labor force participation.
Unemployment peaked in April 2020, with male unemployment rate at 13% and that of women at 16%. There was great variability in unemployment rates by state, ranging from 6% in Wyoming to 30% in Nevada ([2021](https://rpubs.com/elenas70/unemployment_by_state)). Labor force participation rates fell by 3 percentage points between February and April 2020, with female labor force participation rate dropping to 55%, and that of men, to 66%.
Unemployment rates went down to 7% by October 2020, and were at 6–7% in March 2021. Similarly, labor force participation rates recovered 2 of the 3 percentage points lost in April 2020 by October 2020, and remained relatively constant after October 2020.
<!-- For a historical analysis of labor force participation by gender based on see [Labor Force Participation in the U.S. 1976-2000. Why out of the Labor Force?]()-->
### Data and Methodology
The data for this analysis come from the [Current Population Survey](https://cps.ipums.org/cps/) (CPS). CPS is a monthly survey run by the Census Bureau with questions about demographic and economic characteristics of the U.S. population. CPS is used to calculate the monthly federal statistics on unemployment. I obtain the data from the Integrated Public Use Microdata Series database [-@ipums2020].
My sample consists of CPS monthly core employment data for January 2020–March 2021. Each sample includes individual and household weights, which allow inferences about the population from the samples. I discuss CPS weights and examine the weight distribution in the [Appendix](#weights).
```{r include=FALSE}
library(ipumsr)
ddi <- read_ipums_ddi("cps_00018.xml")
d <- read_ipums_micro(ddi)
```
```{r}
library(sjlabelled) #package to remove data labels. Labels slow down analysis
library(tidyverse)
library(kableExtra)
library(formattable)
d <- d %>%
remove_all_labels() %>% filter(EMPSTAT != 1) %>% #excluding military personnel
select(-CPSID,-CPSIDP) %>% mutate(
month = month.abb[MONTH],
year_month = YEAR * 100 + MONTH,
famid = SERIAL + YEAR * 10000 + MONTH / 10,
#family id is is unique for each family record.
id = SERIAL + YEAR * 10000 + MONTH / 10 + PERNUM / 1000,
#id is unique for each individual record
sex = ifelse(SEX == 1, "men", "women"),
married_spouse_present = (MARST == 1) * 1,
married = (MARST %in% c(1, 2)) * 1,
unemployed = (EMPSTAT %in% c(20, 21, 22)) * 1,
employed = (EMPSTAT %in% c(1, 10, 12)) * 1,
retired = (EMPSTAT == 36) * 1,
lfp = (LABFORCE == 2) * 1,
age16plus = (AGE > 15) * 1,
workingage = (AGE < 63 &
AGE > 17) * 1
) %>% select(-MARST, -EMPSTAT, -LABFORCE, -DIFFANY,-ASECFLAG)
months <- length(unique(d$year_month))
individuals <- nrow(d)
families <- length(unique(d$famid))
```
The `r months` monthly surveys contain in total about `r format(round(individuals,-3), big.mark=",")` observations of individuals, or about `r format(round(individuals/months,-3), big.mark=",")` individuals per month. The observations of individuals are part of about `r format(round(families,-3), big.mark=",")` observations of households, or about `r format(round(families/months,-3), big.mark=",")` households per month.
The Bureau of Labor Statistics defines **labor force participation rate** as a percentage of civilian noninstitutional population age 16 or older that is in the labor force. **Unemployment rate** is the percentage of those in the labor force that are unemployed.
The unemployment rate is an imperfect indicator of the effect of a social or economic shock on employment, because the denominator of the rate, the number of people in the labor force, is also affected by the shock. I calculate an alternative measure, **employment rate**, or the percentage of the adult civilian noninstitutional population that is employed.
The population labor market rates, standard errors and confidence intervals are calculated using the R package [srvyr](https://cran.r-project.org/web/packages/srvyr/srvyr.pdf) [@srvyr2020]. I use the Horvitz-Thompson estimator [@lamley2010 p. 5, 221–22)] to compute population-level statistics as weighted sample means and standard errors of the mean.
### Labor Market Impacts of COVID{#stats}
Labor force participation rate fell from 63% in February 2020 to 60% in April 2020 and recovered to 62% by March 2021. Employment rate lost 10 percentage points between February and April 2020, falling from 61% to 51% of adult population. Employment rate mostly recovered, reaching 58% in March 2021. Unemployment rate was 4% in February 2020, peaked at 15% in April 2020 and came down to 6% by March 2021.
```{r}
library(ggplot2)
library(plotly)
library(withr)
library(zoo)
library(xts)
library(lubridate)
library(stringr)
library(ggthemes)
library(srvyr)
survey <-
as_survey(d, weights = c(WTFINL)) %>% filter(age16plus == 1) # represent data as a survey with sampling weights and exclude individuals under 16
rates_monthly <-
survey %>% group_by(YEAR, MONTH) %>% summarize(lfp_rate = survey_mean(lfp),
employment_rate = survey_mean(employed))
#standard errors are the standard errors of the sample mean that account for the uncertainty of each observation (weights away from 1 mean that the observations deviation from the mean increases the uncertainty, increasing se).
#The variance attains its maximum value, when all weights except one are zero. Its minimum value is found when all weights are equal (i.e., unweighted mean), in which case it degenerates into the standard error of the mean, squared.https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
unemployment_monthly <-
survey %>% filter(lfp == 1) %>% as_survey(weights = c(WTFINL)) %>% group_by(YEAR, MONTH) %>% summarize(unemployment_rate =
survey_mean(unemployed))
survey_monthly <-
left_join(rates_monthly, unemployment_monthly, by = c("YEAR","MONTH"))
survey_monthly[, -c(1:2)] <-
lapply(survey_monthly[, -c(1:2)], percent, 1)#format all columns except the Year and Month as percentages
```
```{r}
library(DT)
tab <-
survey_monthly %>% mutate(
YEAR = as.integer(YEAR),
MONTH = MONTH,
month = month.abb[MONTH],
lfp_rate = round(as.numeric(lfp_rate), 3),
lfp_rate_se = round(as.numeric(lfp_rate_se), 3),
employment_rate = round(as.numeric(employment_rate), 3),
employment_rate_se = round(as.numeric(employment_rate_se), 3),
unemployment_rate = round(as.numeric(unemployment_rate), 3),
unemployment_rate_se = round(as.numeric(unemployment_rate_se), 3)
) %>% relocate(month, .after = MONTH)
DT::datatable(
tab,
caption = "Labor Force Participation, Employment and Unemployment Rates in 2020–2021",
colnames = c(
"Year",
"Month Number",
"Month",
"Labor Force Participation Rate",
"Std. Error",
"Employment Rate",
"Std. Error",
"Unemployment Rate",
"Std. Error"
),
rownames = FALSE,
filter = "top",
extensions = 'Buttons',
options = list(dom = 'Blrtip',
buttons = c('copy', 'csv', 'excel'))
) %>% formatPercentage(
c(
"lfp_rate",
"lfp_rate_se",
"employment_rate",
"employment_rate_se" ,
"unemployment_rate",
"unemployment_rate_se"
),
1
)
```
*Source: Current Population Survey [-@ipums2020]*\
Estimates and standard errors are calculated using estimation methods for survey data with sampling weights [@lamley2010; @srvyr2020].
### The Differences in Labor Market Statistics by Sex
In the beginning of 2020 male labor force participation rate was 11 percentage points higher than that of women. This gap did not change throughout 2020–2021, as the labor force participation rates for both sexes dropped 3 percentage points in April 2020 and recovered 2 percentage points by March 2021.
The 2020 recession is unique in recent history in that it led to a greater unemployment shock for women than for men. Past U.S. recessions since the 1970s were characterized by higher peak unemployment rates among men ([2021](https://rpubs.com/elenas70/historical_unemployment)). High unemployment and low labor force participation rates for women combined in April 2020 in a labor market where only 46% of adult women were employed.
Female employment rate was 53% in March 2021, 10 percentage points below male employment rate. The gap is explained partially by the historical gap in labor force participation rates between sexes. In addition, in a pandemic when families have to balance careers with caring for children and elderly, compromises may include exits from the labor market by lower earners and traditional caregivers, typically women.
```{r}
library(ggplot2)
library(plotly)
library(withr)
library(zoo)
library(xts)
library(lubridate)
library(stringr)
library(ggthemes)
library(srvyr)
rates_monthly <- survey %>% group_by(YEAR, MONTH, sex) %>% summarize(lfp_rate=survey_mean(lfp, vartype = "ci"),
employment_rate=survey_mean(employed, vartype = "ci")) %>%
pivot_wider(names_from=sex, values_from=c(lfp_rate,lfp_rate_low,lfp_rate_upp,employment_rate,employment_rate_low,employment_rate_upp))
unemployment_monthly <- survey %>% filter(lfp==1) %>% group_by(YEAR, MONTH, sex) %>% summarize(unemployment_rate=survey_mean(unemployed, vartype = "ci")) %>%
pivot_wider(names_from=sex, values_from=c(unemployment_rate,unemployment_rate_low,unemployment_rate_upp))
survey_monthly <- left_join(rates_monthly, unemployment_monthly,by=c("YEAR", "MONTH"))
survey_monthly[,-c(1,2)] <- lapply(survey_monthly[,-c(1,2)],percent,0)#format all columns except the Month Number as percentages
survey_monthly$Time <- seq(from=as.Date("2020/1/1"), to=as.Date("2021/3/1"), by="month") %>% as.yearmon() #Need to change the "to" date as new data becomes available
p <-
ggplot(survey_monthly, aes(Time)) + geom_line(aes(
y =lfp_rate_men,
linetype = "Labor force participation rate",
color = "men"
))+ geom_ribbon(aes(ymax=lfp_rate_upp_men,ymin=lfp_rate_low_men), alpha=0.2)+
geom_line(aes(
y =lfp_rate_women,
linetype = "Labor force participation rate",
color = "women"
)) + geom_ribbon(aes(ymax=lfp_rate_upp_women,ymin=lfp_rate_low_women), alpha=0.2)+
#employment rate
geom_line(aes(
y =employment_rate_men,
linetype = "Employment rate",
color = "men"
))+ geom_ribbon(aes(ymax=employment_rate_upp_men,ymin=employment_rate_low_men), alpha=0.2)+
geom_line(aes(
y =employment_rate_women,
linetype = "Employment rate",
color = "women"
)) + geom_ribbon(aes(ymax=employment_rate_upp_women,ymin=employment_rate_low_women), alpha=0.2)+
geom_line(aes(
y =unemployment_rate_men,
linetype = "Unemployment rate",
color = "men"
))+ geom_ribbon(aes(ymax=unemployment_rate_upp_men,ymin=unemployment_rate_low_men), alpha=0.2)+
geom_line(aes(
y =unemployment_rate_women,
linetype = "Unemployment rate",
color = "women"
)) + geom_ribbon(aes(ymax=unemployment_rate_upp_women,ymin=unemployment_rate_low_women), alpha=0.2)+
labs(title = "Labor Market Rates for Men and Women in 2020–2021") +
scale_y_continuous('Rate', labels = scales::percent_format()) +
theme(legend.title = element_blank())
ggplotly(p,
tooltip = c("y", "x","ymax","ymin"),
height = 500,
width = 800)
```
The gray band represents the 95% confidence intervals.
```{r}
rates_monthly <-
survey %>% group_by(YEAR, MONTH, sex) %>% summarize(lfp_rate = survey_mean(lfp), employment_rate = survey_mean(employed)) %>%
pivot_wider(
names_from = sex,
values_from = c(lfp_rate, lfp_rate_se, employment_rate, employment_rate_se)
)
unemployment_monthly <-
survey %>% filter(lfp == 1) %>% group_by(YEAR, MONTH, sex) %>% summarize(unemployment_rate = survey_mean(unemployed)) %>%
pivot_wider(
names_from = sex,
values_from = c(unemployment_rate, unemployment_rate_se)
)
survey_monthly <-
left_join(rates_monthly, unemployment_monthly, by = c("MONTH","YEAR")) %>% select(
lfp_rate_men,
lfp_rate_se_men,
lfp_rate_women,
lfp_rate_se_women,
employment_rate_men,
employment_rate_se_men,
employment_rate_women,
employment_rate_se_women,
unemployment_rate_men,
unemployment_rate_se_men,
unemployment_rate_women,
unemployment_rate_se_women
)
survey_monthly[, -1] <- lapply(survey_monthly[, -1], percent, 1)
```
```{r}
tab <-
survey_monthly %>% mutate(
YEAR = as.integer(YEAR),
MONTH = as.integer(MONTH),
month = month.abb[MONTH],
lfp_rate_men = round(as.numeric(lfp_rate_men), 3),
lfp_rate_se_men = round(as.numeric(lfp_rate_se_men), 3),
lfp_rate_women = round(as.numeric(lfp_rate_women), 3),
lfp_rate_se_women = round(as.numeric(lfp_rate_se_women), 3),
employment_rate_men = round(as.numeric(employment_rate_men), 3),
employment_rate_se_men = round(as.numeric(employment_rate_se_men), 3),
employment_rate_women = round(as.numeric(employment_rate_women), 3),
employment_rate_se_women = round(as.numeric(employment_rate_se_women), 3),
unemployment_rate_men = round(as.numeric(unemployment_rate_men), 3),
unemployment_rate_se_men = round(as.numeric(unemployment_rate_se_men), 3),
unemployment_rate_women = round(as.numeric(unemployment_rate_women), 3),
unemployment_rate_se_women = round(as.numeric(unemployment_rate_se_women), 3)
) %>% relocate(month, .after = MONTH)
DT::datatable(
tab,
caption = "Labor Market Statistics by Sex in 2020–2021",
colnames = c(
"Year",
"Month Number",
"Month",
"Lfp Rate Men",
"Std. Error",
"Lfp Rate Women",
"Std. Error",
"Employment Rate Men",
"Std. Error",
"Employment Rate Women",
"Std. Error",
"Unemp Rate Men",
"Std. Error",
"Unemp Rate Women",
"Std. Error"
),
rownames = FALSE,
filter = "top",
extensions = 'Buttons',
options = list(dom = 'Blrtip',
buttons = c('copy', 'csv', 'excel'))
) %>% formatPercentage(
c(
"lfp_rate_men",
"lfp_rate_se_men",
"lfp_rate_women",
"lfp_rate_se_women",
"employment_rate_men",
"employment_rate_se_men" ,
"employment_rate_women",
"employment_rate_se_women" ,
"unemployment_rate_men",
"unemployment_rate_se_men",
"unemployment_rate_women",
"unemployment_rate_se_women"
),
1
)
```
*Source: Current Population Survey [-@ipums2020]*\
Estimates and standard errors are calculated using estimation methods for survey data with sampling weights [@lamley2010; @srvyr2020].
In [the next part of the analysis of the U.S. labor market during the COVID pandemic](https://rpubs.com/elenas70/unemployment_by_state) I examine the geographic distribution of unemployment and discuss the reasons why some regions have been impacted more and have been slower to recover than others.
### Appendix: Current Population Survey Weights {#weights}
All individuals and households surveyed are assigned weights to reflect the fact that some records represent more cases in the population than others. The weights are sampling weights. They are based on the inverse probabilities of selection into the sample, and depend on the known demographic distribution of the population and other factors such as nonresponse. The weights are comparable over time.
In this section I normalize the individual weights so that in the 2020 sample the relatively underrepresented observations have a weight above 1, and overrepresented observations have a weight below 1. Note that standardization is not necessary for calculating the labor market statistics.
```{r}
d <- d %>% mutate(iweight = WTFINL * n() / sum(WTFINL))
```
The average of the wights for the full sample is 1. The normalized individual weights are calculated as
$\large \text{normalized weight}=\text{weight} \cdot \frac{\text{number of indivividual observations}}{\sum_i{\text{weight}}}$.
The resulting weights range from `r round(min(d$iweight),2)` to `r round(max(d$iweight),2)`.
The histogram of the weights shows that weights are clustered around a lower peak of 0.16 and around a higher peak of 1.22.
```{r}
p <-
ggplot(d, aes(x = iweight)) + geom_histogram(
bins = 50,
color = "grey69",
fill = "blue",
alpha = 0.5
) + xlim(0, 4) +
ggtitle("Distribution of the Record Weights") + xlab("Individual Weight") +
scale_y_continuous(
"Number of Records",
labels = function(x)
format(x, big.mark = ",")
)
ggplotly(p, height = 300, width = 600)
```
*Source: Current Population Survey [-@ipums2020]*
```{r}
tab <-
d %>% group_by(YEAR, MONTH) %>% summarize(observations=format(n(), big.mark = ","),weight= round(mean(iweight), 2)) %>%
mutate(YEAR = as.integer(YEAR),
MONTH = as.integer(MONTH), month =month.abb[MONTH]
) %>% relocate(month, .after = MONTH)
DT::datatable(
tab,
caption = "Number of Individual Records and Record Weights",
colnames = c("Year","Month Number","Month", "Number of Individuals", "Average Normalized Weight"),
rownames = FALSE,
filter = "top",
extensions = 'Buttons',
options = list(dom = 'Blrtip',
buttons = c('copy', 'csv', 'excel'))
)
```
*Source: Current Population Survey [-@ipums2020]*
In April - June 2020 CPS data was collected exclusively by phone and response rates fell. Starting July 2020 in-person interviews began in some areas, and in September 2020 they expanded to all areas. Sample sizes were lower in March–August 2020, and the responses during these months were weighted higher.
The next part of this analysis: [The Geography of Unemployment in 2020–2021](https://rpubs.com/elenas70/unemployment_by_state)
<a href="#top">Back to top</a>
### References