From 57426b3a6c13ab051839c1add8f6f4fdfdbd7b9c Mon Sep 17 00:00:00 2001 From: Noah Greifer Date: Tue, 12 Nov 2024 18:42:26 -0500 Subject: [PATCH] Doc and vignette updates --- man/matchit.Rd | 36 +++++++++++---------------------- vignettes/assessing-balance.Rmd | 2 +- vignettes/matching-methods.Rmd | 2 +- vignettes/sampling-weights.Rmd | 4 ++-- 4 files changed, 16 insertions(+), 28 deletions(-) diff --git a/man/matchit.Rd b/man/matchit.Rd index 7ad2b399..d5976b61 100644 --- a/man/matchit.Rd +++ b/man/matchit.Rd @@ -71,8 +71,7 @@ used.} argument controlling the link function used in estimating the distance measure. Allowable options depend on the specific \code{distance} value specified. See \code{\link{distance}} for allowable options with each -option. The default is \code{"logit"}, which, along with \code{distance = "glm"}, identifies the default measure as logistic regression propensity -scores.} +option. The default is \code{"logit"}, which, along with \code{distance = "glm"}, identifies the default measure as logistic regression propensity scores.} \item{distance.options}{a named list containing additional arguments supplied to the function that estimates the distance measure as determined @@ -96,8 +95,7 @@ propensity scores. Usually used to perform Mahalanobis distance matching within propensity score calipers, where the propensity scores are computed using \code{formula} and \code{distance}. Can be specified as a string containing the names of variables in \code{data} to be used or a one-sided -formula with the desired variables on the right-hand side (e.g., \code{~ X3 + X4}). See the individual methods pages for information on whether and how -this argument is used.} +formula with the desired variables on the right-hand side (e.g., \code{~ X3 + X4}). See the individual methods pages for information on whether and how this argument is used.} \item{antiexact}{for methods that allow it, for which variables anti-exact matching should take place. 
Anti-exact matching ensures paired individuals @@ -180,7 +178,7 @@ additional arguments are allowed for each method.} When \code{method} is something other than \code{"subclass"}, a \code{matchit} object with the following components: -\item{match.matrix}{a matrix containing the matches. The rownames correspond +\item{match.matrix}{a matrix containing the matches. The row names correspond to the treated units and the values in each row are the names (or indices) of the control units matched to each treated unit. When treated units are matched to different numbers of control units (e.g., with variable ratio matching or @@ -196,13 +194,10 @@ the model used to estimate propensity scores when \code{distance} is specified as a method of estimating propensity scores. When \code{reestimate = TRUE}, this is the model estimated after discarding units.} -\item{X}{a data frame of covariates mentioned in \code{formula}, -\code{exact}, \code{mahvars}, \code{caliper}, and \code{antiexact}.} +\item{X}{a data frame of covariates mentioned in \code{formula}, \code{exact}, \code{mahvars}, \code{caliper}, and \code{antiexact}.} \item{call}{the \code{matchit()} call.} -\item{info}{information on the matching method and -distance measures used.} -\item{estimand}{the argument supplied to -\code{estimand}.} +\item{info}{information on the matching method and distance measures used.} +\item{estimand}{the argument supplied to \code{estimand}.} \item{formula}{the \code{formula} supplied.} \item{treat}{a vector of treatment status converted to zeros (0) and ones (1) if not already in that format.} @@ -210,16 +205,11 @@ distance measures used.} values (i.e., propensity scores) when \code{distance} is supplied as a method of estimating propensity scores or a numeric vector.} \item{discarded}{a logical vector denoting whether each observation was -discarded (\code{TRUE}) or not (\code{FALSE}) by the argument to -\code{discard}.} -\item{s.weights}{the vector of sampling weights supplied to -the 
\code{s.weights} argument, if any.} -\item{exact}{a one-sided formula -containing the variables, if any, supplied to \code{exact}.} -\item{mahvars}{a one-sided formula containing the variables, if any, -supplied to \code{mahvars}.} -\item{obj}{when \code{include.obj = TRUE}, an -object containing the intermediate results of the matching procedure. See +discarded (\code{TRUE}) or not (\code{FALSE}) by the argument to \code{discard}.} +\item{s.weights}{the vector of sampling weights supplied to the \code{s.weights} argument, if any.} +\item{exact}{a one-sided formula containing the variables, if any, supplied to \code{exact}.} +\item{mahvars}{a one-sided formula containing the variables, if any, supplied to \code{mahvars}.} +\item{obj}{when \code{include.obj = TRUE}, an object containing the intermediate results of the matching procedure. See the individual methods pages for what this component will contain.} When \code{method = "subclass"}, a \code{matchit.subclass} object with the same @@ -352,8 +342,7 @@ as the sum of the inverse of the number of control units matched to the same treated unit across its matches. For example, if a control unit was matched to a treated unit that had two other control units matched to it, and that same control was matched to a treated unit that had one other control unit -matched to it, the control unit in question would get a weight of 1/3 + 1/2 -= 5/6. For the ATC, the same is true with the treated and control labels +matched to it, the control unit in question would get a weight of \eqn{1/3 + 1/2 = 5/6}. For the ATC, the same is true with the treated and control labels switched. The weights are computed using the \code{match.matrix} component of the \code{matchit()} output object. } @@ -427,7 +416,6 @@ s.out1 <- matchit(treat ~ age + educ + race + nodegree + discard = "control", subclass = 10) s.out1 summary(s.out1, un = TRUE) - } \references{ Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). 
Matching diff --git a/vignettes/assessing-balance.Rmd b/vignettes/assessing-balance.Rmd index e7e74a22..dd4358d8 100644 --- a/vignettes/assessing-balance.Rmd +++ b/vignettes/assessing-balance.Rmd @@ -281,7 +281,7 @@ love.plot(m.out, stats = c("m", "ks"), poly = 2, abs = TRUE, position = "bottom") ``` -The `love.plot()` documentation explains what each of these arguments do and the several other ones available. See `vignette("cobalt_A4_love.plot", package = "cobalt")` for other advanced customization of `love.plot()`. +The `love.plot()` documentation explains what each of these arguments does and the several other ones available. See `vignette("love.plot", package = "cobalt")` for other advanced customization of `love.plot()`. ### `bal.plot()` diff --git a/vignettes/matching-methods.Rmd b/vignettes/matching-methods.Rmd index 021649b5..2d667937 100644 --- a/vignettes/matching-methods.Rmd +++ b/vignettes/matching-methods.Rmd @@ -32,7 +32,7 @@ Matching is nonparametric in the sense that the estimated weights and pruning of It is important to note that this implementation of matching differs from the methods described by Abadie and Imbens [-@abadie2006; -@abadie2016] and implemented in the `Matching` R package and `teffects` routine in Stata. That form of matching is *matching imputation*, where the missing potential outcomes for each unit are imputed using the observed outcomes of paired units. This is a critical distinction because matching imputation is a specific estimation method with its own effect and standard error estimators, in contrast to subset selection, which is a preprocessing method that does not require specific estimators and is broadly compatible with other parametric and nonparametric analyses. 
The benefits of matching imputation are that its theoretical properties (i.e., the rate of convergence and asymptotic variance of the estimator) are well understood, it can be used in a straightforward way to estimate not just the average treatment effect in the treated (ATT) but also the average treatment effect in the population (ATE), and additional effective matching methods can be used in the imputation (e.g., kernel matching). The benefits of matching as nonparametric preprocessing are that it is far more flexible with respect to the types of effects that can be estimated because it does not involve any specific estimator, its empirical and finite-sample performance has been examined in depth and is generally well understood, and it aligns well with the design of experiments, which are more familiar to non-technical audiences. -In addition to subset selection, matching often (though not always) involves a form of *stratification*, the assignment of units to pairs or strata containing multiple units. The distinction between subset selection and stratification is described by @zubizarreta2014, who separate them into two separate steps. In `MatchIt`, with almost all matching methods, subset selection is performed by stratification; for example, treated units are paired with control units, and unpaired units are then dropped from the matched sample. With some methods, subclasses are used to assign matching or stratification weights to individual units, which increase or decrease each unit's leverage in a subsequent analysis. There has been some debate about the importance of stratification after subset selection; while some authors have argued that, with some forms of matching, pair membership is incidental [@stuart2008; @schafer2008], others have argued that correctly incorporating pair membership into effect estimation can improve the quality of inferences [@austin2014a; @wan2019]. 
For methods that allow it, `MatchIt` includes stratum membership as an additional output of each matching specification. How these strata can be used is detailed in `vignette("Estimating Effects")`. +In addition to subset selection, matching often (though not always) involves a form of *stratification*, the assignment of units to pairs or strata containing multiple units. The distinction between subset selection and stratification is described by @zubizarreta2014, who separate them into two separate steps. In `MatchIt`, with almost all matching methods, subset selection is performed by stratification; for example, treated units are paired with control units, and unpaired units are then dropped from the matched sample. With some methods, subclasses are used to assign matching or stratification weights to individual units, which increase or decrease each unit's leverage in a subsequent analysis. There has been some debate about the importance of stratification after subset selection; while some authors have argued that, with some forms of matching, pair membership is incidental [@stuart2008; @schafer2008], others have argued that correctly incorporating pair membership into effect estimation can improve the quality of inferences [@austin2014a; @wan2019]. For methods that allow it, `MatchIt` includes stratum membership as an additional output of each matching specification. How these strata can be used is detailed in `vignette("estimating-effects")`. At the heart of `MatchIt` are three classes of methods: distance matching, stratum matching, and pure subset selection. *Distance matching* involves considering a focal group (usually the treated group) and selecting members of the non-focal group (i.e., the control group) to pair with each member of the focal group based on the *distance* between units, which can be computed in one of several ways. Members of either group that are not paired are dropped from the sample. 
Nearest neighbor matching (`method = "nearest"`), optimal pair matching (`method = "optimal"`), optimal full matching (`method = "full"`), generalized full matching (`method = "quick"`), and genetic matching (`method = "genetic"`) are the methods of distance matching implemented in `MatchIt`. Typically, only the average treatment in the treated (ATT) or average treatment in the control (ATC), if the control group is the focal group, can be estimated after distance matching in `MatchIt` (full matching is an exception, described later). diff --git a/vignettes/sampling-weights.Rmd b/vignettes/sampling-weights.Rmd index 6a608af9..0d5bbbd5 100644 --- a/vignettes/sampling-weights.Rmd +++ b/vignettes/sampling-weights.Rmd @@ -146,7 +146,7 @@ Note that had we not added sampling weights to `mF`, the matching specification ## Estimating the Effect -Estimating the treatment effect after matching is straightforward when using sampling weights. Effects are estimated in the same way as when sampling weights are excluded, except that the matching weights must be multiplied by the sampling weights to yield accurate, generalizable estimates. `match.data()` and `get_matches()` do this automatically, so the weights produced by these functions already are a product of the matching weights and the sampling weights. Note this will only be true if sampling weights are incorporated into the `matchit` object. +Estimating the treatment effect after matching is straightforward when using sampling weights. Effects are estimated in the same way as when sampling weights are excluded, except that the matching weights must be multiplied by the sampling weights for use in the outcome model to yield accurate, generalizable estimates. `match.data()` and `get_matches()` do this automatically, so the weights produced by these functions already are a product of the matching weights and the sampling weights. Note this will only be true if sampling weights are incorporated into the `matchit` object. 
With `avg_comparisons()`, only the sampling weights should be included when estimating the treatment effect. Below we estimate the effect of `A` on `Y_C` in the matched and sampling weighted sample, adjusting for the covariates to improve precision and decrease bias. @@ -165,7 +165,7 @@ avg_comparisons(fit, wts = "SW") ``` -Note that `match.data()` and `get_weights()` have the option `include.s.weights`, which, when set to `FALSE`, makes it so the returned weights do not incorporate the sampling weights and are simply the matching weights. Because one might to forget to multiply the two sets of weights together, it is easier to just use the default of `include.s.weights = TRUE` and ignore the sampling weights in the rest of the analysis (because they are already included in the returned weights). `avg_comparisons()` also works more smoothly when the weights supplied to `weights` is a single variable rather than the product of two. +Note that `match.data()` and `get_matches()` have the option `include.s.weights`, which, when set to `FALSE`, makes it so the returned weights do not incorporate the sampling weights and are simply the matching weights. Because one might forget to multiply the two sets of weights together, it is easier to just use the default of `include.s.weights = TRUE` and ignore the sampling weights in the rest of the analysis (because they are already included in the returned weights). ## Code to Generate Data used in Examples