The standard difference-in-differences (DID) method returns an unbiased estimate under three key assumptions:
- the parallel trends (PT) assumption, also known as the common trends (CT) assumption;
- the no anticipation (NA) assumption;
- the homogeneous treatment effect assumption.
Sadly, these assumptions do not hold exactly in many empirical settings, so nowadays many econometricians are working on relaxing them. In this repository, my shooting target is the parallel trends assumption.
It is hard to find parallel trends in this non-parallel world.
Please see my other repository (DID Handbook) if you are interested in heterogeneity-robust DID estimators (which relax the homogeneity assumption).
In a two-period DID setting where some units get treated in the second period, parallel trends can be formalized as

$$E[Y_{i,2}(0) - Y_{i,1}(0) \mid D_i = 1] = E[Y_{i,2}(0) - Y_{i,1}(0) \mid D_i = 0]$$

where $Y_{i,t}(0)$ denotes unit $i$'s untreated potential outcome in period $t$ and $D_i$ indicates treatment status.
In a staggered DID setting (i.e., different units get treated in different periods), parallel trends can be formalized as

$$E[Y_{i,t}(0) - Y_{i,t-1}(0) \mid G_i = g] = E[Y_{i,t}(0) - Y_{i,t-1}(0) \mid G_i = g'] \quad \text{for all } t \text{ and all } g \neq g'$$

where $G_i$ denotes the period in which unit $i$ first gets treated.
The parallel trends (PT) assumption is testable if we observe more than two periods.
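To make the two-period setting concrete, here is a minimal numerical sketch (in Python, with made-up numbers; the variable names are my own) showing that, under parallel trends, the DID estimand recovers the ATT:

```python
# Minimal two-period DID illustration with made-up numbers.
# Under parallel trends, both groups' untreated outcomes share the same trend.

# Untreated potential-outcome means (common trend of +2.0 for both groups)
treated_t1, treated_t2_untreated = 5.0, 7.0  # treated group, periods 1 and 2
control_t1, control_t2 = 3.0, 5.0            # control group, periods 1 and 2

att = 1.5                                    # true treatment effect on the treated
treated_t2_observed = treated_t2_untreated + att

# DID estimand: (change for treated) minus (change for control)
did = (treated_t2_observed - treated_t1) - (control_t2 - control_t1)
print(did)  # 1.5, recovering the ATT
```

Subtracting the control group's change removes the common trend, which is exactly what the PT assumption licenses.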
The most popular way of assessing the PT assumption has been to visualize the averages of observed outcomes. That is, researchers plot the average outcomes by group across time periods and then check whether the lines look approximately parallel. In mathematical terms, this method amounts to using the naked eye (with or without glasses) to test whether

$$E[Y_{i,t} \mid D_i = 1] - E[Y_{i,t} \mid D_i = 0]$$

is constant across the pre-treatment periods, where $Y_{i,t}$ denotes the observed outcome of unit $i$ in period $t$.
The visualization method is tedious in a staggered DID setting. In such cases, researchers have often turned to a simple event-study specification (usually with a plot reporting the results). The specification has the following form:

$$Y_{i,t} = \alpha_i + \lambda_t + \sum_{k \neq -1} \beta_k \cdot \mathbb{1}\{t - G_i = k\} + \varepsilon_{i,t}$$

where $\alpha_i$ and $\lambda_t$ are unit and time fixed effects, $G_i$ is the period in which unit $i$ first gets treated, and the lead coefficients ($\beta_k$ for $k < 0$) are inspected for pre-trends.
Unfortunately, neither method provides a good assessment of the PT assumption. The key reason is that they focus on pre-treatment trends, whereas our interest is in the trends over all time periods (both pre- and post-treatment). An instructive example against the traditional PT test is given by Roth et al. (2023):
> [T]he average height of boys and girls evolves in parallel until about age 13 and then diverges, but we should not conclude from this that there is a causal effect of bar mitzvahs (which occur for boys at age 13) on children's height!
Other reasons include:
- The assessment based on visualizing average outcomes depends on the researcher's eyesight. In empirical data, the two lines often look imperfectly parallel; how parallel is parallel enough?
- The assessment based on an event-study regression in a staggered setting is potentially biased, because each coefficient estimate is contaminated by cohort-specific ATTs from other periods. See Sun & Abraham (2021) for details.
Therefore, we hunger for more robust methods of assessing the PT assumption. More importantly, if the PT assumption really does not hold, we also hunger for ways to do causal inference under a relaxed version of it.
Rambachan & Roth (2023) relax the PT assumption by imposing restrictions on the post-treatment differences in trends. They further provide two inference procedures (conditional & hybrid confidence intervals, and fixed-length confidence intervals) that are valid under their specified restrictions.
Their approaches rest on the assumption that the coefficient vector of interest decomposes as

$$\beta = \tau + \delta$$

where $\tau$ denotes the vector of dynamic causal effects (equal to zero in pre-treatment periods under no anticipation) and $\delta$ denotes the difference in trends between treated and comparison groups; the PT assumption corresponds to the post-treatment part of $\delta$ being zero.
Rambachan and Roth then provide several choices of the restriction set $\Delta$ that researchers can impose on $\delta$:
- If researchers believe that the magnitude of the differential shocks to treated and control groups in the post-treatment period is not too different from the magnitude in the pre-treatment period, then a reasonable restriction set is

$$\delta \in \Delta^{RM}(M) := \left\{ \delta: \forall t \geq 0, |\delta_{t+1} - \delta_t| \leq M \cdot \max_{s < 0} |\delta_{s+1} - \delta_s| \right\}$$

where $RM$ abbreviates "relative magnitude" and $M \geq 0$ is a number specified by researchers. For example, $M = 1$ bounds the largest post-treatment difference in trends by the equivalent maximum in the pre-treatment period.
- If researchers believe that the slope of the difference in trends varies smoothly across consecutive periods, then a reasonable restriction set is

$$\delta \in \Delta^{SD}(M) := \left\{ \delta: \forall t, |(\delta_{t+1} - \delta_t) - (\delta_t - \delta_{t-1})| \leq M \right\}$$

where $SD$ abbreviates "second derivative" ($M \geq 0$ restricts the amount by which the slope of $\delta$ can change across consecutive periods, i.e., its discrete second derivative). As above, $M$ is a number specified by researchers; for example, if $M = 0$, the difference in trends is restricted to be exactly linear.
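The two restriction sets amount to simple membership checks on $\delta$. The following sketch (in Python; these helper functions are my own illustration, not part of the HonestDiD package) tests whether a candidate difference-in-trends vector lies in $\Delta^{RM}(M)$ or $\Delta^{SD}(M)$:

```python
# Illustrative membership checks for Delta^RM(M) and Delta^SD(M).
# These are my own helpers, not the HonestDiD implementation.

def in_delta_rm(delta, zero_idx, M):
    """Delta^RM(M): every post-treatment change |delta_{t+1} - delta_t| is at
    most M times the largest pre-treatment change. `delta` lists the
    differences in trends by relative period; `zero_idx` is the index of
    relative period 0 (the first treated period)."""
    pre_changes = [abs(delta[i + 1] - delta[i]) for i in range(zero_idx)]
    post_changes = [abs(delta[i + 1] - delta[i])
                    for i in range(zero_idx, len(delta) - 1)]
    return all(c <= M * max(pre_changes) for c in post_changes)

def in_delta_sd(delta, M):
    """Delta^SD(M): the slope of delta changes by at most M across any two
    consecutive periods (a bound on its discrete second derivative)."""
    return all(abs((delta[i + 1] - delta[i]) - (delta[i] - delta[i - 1])) <= M
               for i in range(1, len(delta) - 1))

# delta for relative periods -2, -1, 0, 1, 2 (made-up numbers)
delta = [0.0, 0.1, 0.0, 0.3, 0.5]
print(in_delta_rm(delta, zero_idx=2, M=1.0))   # False: post changes exceed pre
print(in_delta_rm(delta, zero_idx=2, M=3.0))   # True
print(in_delta_sd([0.0, 1.0, 2.0, 3.0], M=0.0))  # True: exactly linear
```

Larger $M$ admits larger violations of PT, which is why the resulting confidence intervals widen as $M$ grows.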
Rambachan and Roth recommend that researchers should
- construct confidence intervals under reasonable restrictions on violations of the PT assumption, where the set $\Delta$ should be motivated by domain knowledge about the empirical setting;
- conduct sensitivity analyses to show how sensitive the estimated causal effect is to alternative restrictions;
- report the breakdown value of $M$ at which the estimated causal effect is no longer significant.
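The breakdown-value idea can be illustrated with a stylized grid search. In the sketch below (Python; the CI-widening rule is entirely made up for illustration and is not the actual HonestDiD inference procedure), the robust confidence interval widens as $M$ grows, and the breakdown value is the smallest $M$ at which the interval first covers zero:

```python
# Stylized breakdown-value search. The CI rule here is hypothetical: the
# half-width grows linearly in M. HonestDiD computes valid CIs differently.

def robust_ci(point_est, se, M, slope=0.5):
    """Hypothetical robust CI whose half-width widens linearly with M."""
    half = 1.96 * se + slope * M
    return point_est - half, point_est + half

def breakdown_value(point_est, se, grid):
    """Smallest M on the grid at which the robust CI covers zero."""
    for M in grid:
        lo, hi = robust_ci(point_est, se, M)
        if lo <= 0.0 <= hi:
            return M
    return None

grid = [i / 10 for i in range(51)]  # M = 0.0, 0.1, ..., 5.0
print(breakdown_value(point_est=2.0, se=0.5, grid=grid))  # 2.1
```

A larger breakdown value means the significance of the estimated effect survives larger violations of PT.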
The sensitivity analyses can be done with the `HonestDiD` package (written by Ashesh Rambachan at MIT) in R or the `honestdid` package (written by Ashesh Rambachan, Mauricio Caceres Bravo at Brown University, and Jonathan Roth at Brown University) in Stata.
To install the latest version of `HonestDiD` in R, please run the following code.

```r
install.packages("remotes")  # if you haven't installed this package
Sys.setenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS" = "true")
remotes::install_github("asheshrambachan/HonestDiD")
```
See here for an R coding example, and here for my updated version with more comments. Researchers can easily present the results of a sensitivity analysis in a plot like the one below.
Looking at the figure above, we can easily see that the breakdown value for a significant effect is about 2.0. This figure thus provides strong evidence that the significance of the estimated causal effect is robust.
Promisingly, Rambachan & Roth (2023)'s sensitivity analyses can be combined with Callaway & Sant'Anna (2021)'s heterogeneity-robust DID method. See here for an example showing how to combine them. However, note that this combination is still a work in progress, and so far no theoretical papers have discussed its validity. In addition, there might be some errors in that self-defined function (I found that it does not work well on the Medicaid expansion dataset when the argument `e` is set to 1 or larger).
Finally, here is a guideline for Stata users on the use of the `honestdid` package. Note that to plot confidence intervals with the `honestdid` command, the `coefplot` package must be installed. I tried to combine several heterogeneity-robust DID methods (implemented by `csdid`, `eventstudyinteract`, and `did_multiplegt` in Stata) with the sensitivity analyses (see here for my coding). The key point is that we need to provide two matrices (storing the coefficient and variance estimates, respectively), named `b` and `V` by default, to the `honestdid` command.
I sincerely thank my former PhD cohort-mate JaeSeok Oh for his assistance in writing this repository. Comments and corrections on my coding are always welcome; you can report them by posting in Issues or by contacting me privately via email.