[1908.04748] Optimal Estimation of Generalized Average Treatment Effects using Kernel Optimal Matching

9 weeks ago by cshalizi

"In causal inference, a variety of causal effect estimands have been studied, including the sample, uncensored, target, conditional, optimal subpopulation, and optimal weighted average treatment effects. Ad-hoc methods have been developed for each estimand based on inverse probability weighting (IPW) and on outcome regression modeling, but these may be sensitive to model misspecification, practical violations of positivity, or both. The contribution of this paper is twofold. First, we formulate the generalized average treatment effect (GATE) to unify these causal estimands as well as their IPW estimates. Second, we develop a method based on Kernel Optimal Matching (KOM) to optimally estimate GATE and to find the GATE most easily estimable by KOM, which we term the Kernel Optimal Weighted Average Treatment Effect. KOM provides uniform control on the conditional mean squared error of a weighted estimator over a class of models while simultaneously controlling for precision. We study its theoretical properties and evaluate its comparative performance in a simulation study. We illustrate the use of KOM for GATE estimation in two case studies: comparing spine surgical interventions and studying the effect of peer support on people living with HIV."

to:NB
causal_inference
estimation
matching
statistics
kernel_estimators

[1907.12709] Propensity score analysis with latent covariates: Measurement error bias correction using the covariate's posterior mean, aka the inclusive factor score

11 weeks ago by cshalizi

"We address measurement error bias in propensity score (PS) analysis due to covariates that are latent variables. In the setting where latent covariate X is measured via multiple error-prone items W, PS analysis using several proxies for X -- the W items themselves, a summary score (mean/sum of the items), or the conventional factor score (cFS , i.e., predicted value of X based on the measurement model) -- often results in biased estimation of the causal effect, because balancing the proxy (between exposure conditions) does not balance X. We propose an improved proxy: the conditional mean of X given the combination of W, the observed covariates Z, and exposure A, denoted XWZA. The theoretical support, which applies whether X is latent or not (but is unobserved), is that balancing XWZA (e.g., via weighting or matching) implies balancing the mean of X. For a latent X, we estimate XWZA by the inclusive factor score (iFS) -- predicted value of X from a structural equation model that captures the joint distribution of (X,W,A) given Z. Simulation shows that PS analysis using the iFS substantially improves balance on the first five moments of X and reduces bias in the estimated causal effect. Hence, within the proxy variables approach, we recommend this proxy over existing ones. We connect this proxy method to known results about weighting/matching functions (Lockwood & McCaffrey, 2016; McCaffrey, Lockwood, & Setodji, 2013). We illustrate the method in handling latent covariates when estimating the effect of out-of-school suspension on risk of later police arrests using Add Health data."

in_NB
matching
factor_analysis
causal_inference
statistics
stuart.elizabeth

[1806.06802] Almost-Exact Matching with Replacement for Causal Inference

june 2019 by cshalizi

"We aim to create the highest possible quality of treatment-control matches for categorical data in the potential outcomes framework. Matching methods are heavily used in the social sciences due to their interpretability, but most matching methods do not pass basic sanity checks: they fail when irrelevant variables are introduced, and tend to be either computationally slow or produce low-quality matches. The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates; the algorithm aims to match units on as many relevant variables as possible. To do this, the algorithm creates a hierarchy of covariate combinations on which to match (similar to downward closure), in the process solving an optimization problem for each unit in order to construct the optimal matches. The algorithm uses a single dynamic program to solve all of the optimization problems simultaneously. Notable advantages of our method over existing matching procedures are its high-quality matches, versatility in handling different data distributions that may have irrelevant variables, and ability to handle missing data by matching on as many available covariates as possible."

to:NB
causal_inference
matching
statistics
computational_statistics

[1905.12020] Matching on What Matters: A Pseudo-Metric Learning Approach to Matching Estimation in High Dimensions

may 2019 by cshalizi

"When pre-processing observational data via matching, we seek to approximate each unit with maximally similar peers that had an alternative treatment status--essentially replicating a randomized block design. However, as one considers a growing number of continuous features, a curse of dimensionality applies making asymptotically valid inference impossible (Abadie and Imbens, 2006). The alternative of ignoring plausibly relevant features is certainly no better, and the resulting trade-off substantially limits the application of matching methods to "wide" datasets. Instead, Li and Fu (2017) recasts the problem of matching in a metric learning framework that maps features to a low-dimensional space that facilitates "closer matches" while still capturing important aspects of unit-level heterogeneity. However, that method lacks key theoretical guarantees and can produce inconsistent estimates in cases of heterogeneous treatment effects. Motivated by straightforward extension of existing results in the matching literature, we present alternative techniques that learn latent matching features through either MLPs or through siamese neural networks trained on a carefully selected loss function. We benchmark the resulting alternative methods in simulations as well as against two experimental data sets--including the canonical NSW worker training program data set--and find superior performance of the neural-net-based methods."

to:NB
matching
causal_inference
statistics
metric_learning

Large Sample Properties of Matching Estimators for Average Treatment Effects - Abadie - 2005 - Econometrica - Wiley Online Library

february 2016 by cshalizi

"Matching estimators for average treatment effects are widely used in evaluation research despite the fact that their large sample properties have not been established in many cases. The absence of formal results in this area may be partly due to the fact that standard asymptotic expansions do not apply to matching estimators with a fixed number of matches because such estimators are highly nonsmooth functionals of the data. In this article we develop new methods for analyzing the large sample properties of matching estimators and establish a number of new results. We focus on matching with replacement with a fixed number of matches. First, we show that matching estimators are not N1/2-consistent in general and describe conditions under which matching estimators do attain N1/2-consistency. Second, we show that even in settings where matching estimators are N1/2-consistent, simple matching estimators with a fixed number of matches do not attain the semiparametric efficiency bound. Third, we provide a consistent estimator for the large sample variance that does not require consistent nonparametric estimation of unknown functions. Software for implementing these methods is available in Matlab, Stata, and R."

--- An unkind version of this would be "matching is what happens when you do nearest-neighbor regression, and you forget that the bias-variance tradeoff is a _tradeoff_."

(Ungated version: http://www.ksg.harvard.edu/fs/aabadie/smep.pdf)

(ADA note: reference in the causal-estimation chapter, re connection between matching and nearest neighbors)
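
A minimal sketch of the nearest-neighbor reading of matching from the note above, assuming a single scalar covariate and a binary treatment indicator. The function name `nn_match_att` and the toy data are hypothetical, and this is plain k-nearest-neighbor matching with replacement, not necessarily the exact estimator the paper analyzes.

```python
def nn_match_att(x, treated, y, k=1):
    """Estimate the average treatment effect on the treated (ATT).

    Each treated unit's missing control outcome is imputed by the mean
    outcome of its k nearest control units in the covariate x, i.e. a
    k-nearest-neighbor regression fitted on the controls only.
    """
    controls = [(xi, yi) for xi, ti, yi in zip(x, treated, y) if not ti]
    effects = []
    for xi, ti, yi in zip(x, treated, y):
        if not ti:
            continue
        # k nearest controls by absolute covariate distance (with replacement)
        nearest = sorted(controls, key=lambda c: abs(c[0] - xi))[:k]
        y0_hat = sum(c[1] for c in nearest) / len(nearest)
        effects.append(yi - y0_hat)
    return sum(effects) / len(effects)


# Toy data (hypothetical): control outcome equals x, treatment adds 2.
x = [0.0, 1.0, 2.0, 0.1, 1.9]
treated = [0, 0, 0, 1, 1]
y = [0.0, 1.0, 2.0, 2.1, 3.9]
att = nn_match_att(x, treated, y, k=1)
```

Here `k` is the smoothing parameter: holding it fixed at a small value keeps the matching bias down but leaves the variance of the imputed control outcomes unreduced as the sample grows, which is the tradeoff the note is pointing at.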

to:NB
statistics
estimation
causal_inference
regression
to_teach:undergrad-ADA
have_read
matching

Why Propensity Scores Should Not Be Used for Matching | Gary King

july 2015 by cshalizi

"Researchers use propensity score matching (PSM) as a data preprocessing step to selectively prune units prior to applying a model to estimate a causal effect. The goal of PSM is to reduce imbalance in the chosen pre-treatment covariates between the treated and control groups, thereby reducing the degree of model dependence and potential for bias. We show here that PSM often accomplishes the opposite of what is intended -- increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM is that it attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more powerful fully blocked randomized experiment. PSM, unlike other matching methods, is thus blind to the often large portion of imbalance that could have been eliminated by approximating full blocking. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which turns out to increase imbalance. For other matching methods, the point where additional pruning increases imbalance occurs much later in the pruning process, when full blocking is approximated and there is no reason to prune, and so the danger is considerably less. We show that these problems with PSM occur even in data designed for PSM, with as few as two covariates, and in many real applications. Although these results suggest that researchers replace PSM with one of the other available methods when performing matching, propensity scores have many other productive uses."

--- The point about the simple randomized experiment vs. blocking is a good one.

--- An alternative to matching I've never seen explored (which probably means it's obviously flawed) would be to just do a two-variable non-parametric regression with the treatment variable and the propensity score...
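
A minimal sketch of the idea in the note above, under loud assumptions: the propensity scores are taken as already estimated, and the "non-parametric regression" is a Nadaraya-Watson kernel smoother in the propensity score, fitted separately within each treatment arm (with a binary treatment, that is one way to regress on the pair of variables). The names `kernel_regress` and `ate_by_regression`, the Gaussian kernel, and the bandwidth are all illustrative choices, not something the note specifies.

```python
import math


def kernel_regress(e0, t0, e, t, y, bandwidth=0.1):
    """Nadaraya-Watson estimate of E[Y | T = t0, e(X) = e0].

    Uses a Gaussian kernel in the propensity score and only the units
    whose treatment equals t0. Assumes arm t0 has units near e0; if all
    kernel weights underflow to zero, the division below fails.
    """
    num = 0.0
    den = 0.0
    for ei, ti, yi in zip(e, t, y):
        if ti != t0:
            continue
        w = math.exp(-0.5 * ((ei - e0) / bandwidth) ** 2)
        num += w * yi
        den += w
    return num / den


def ate_by_regression(e, t, y, bandwidth=0.1):
    """Average the fitted treated-minus-control contrast over all units."""
    diffs = [
        kernel_regress(ei, 1, e, t, y, bandwidth)
        - kernel_regress(ei, 0, e, t, y, bandwidth)
        for ei in e
    ]
    return sum(diffs) / len(diffs)


# Toy data (hypothetical): outcome is 1 under control and 4 under treatment.
scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # assumed propensity scores
treat = [0, 1, 0, 1, 0, 1]
outcome = [1.0 + 3.0 * ti for ti in treat]
ate = ate_by_regression(scores, treat, outcome)
```

The bandwidth plays the same role here that the number of matches plays in matching, so it would need to be chosen by something like cross-validation rather than fixed in advance.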

in_NB
have_read
causal_inference
matching
statistics
king.gary
experimental_design

Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies

july 2013 by cshalizi

"This paper presents genetic matching, a method of multivariate matching that uses an evolutionary search algorithm to determine the weight each covariate is given. Both propensity score matching and matching based on Mahalanobis distance are limiting cases of this method. The algorithm makes transparent certain issues that all matching methods must confront. We present simulation studies that show that the algorithm improves covariate balance and that it may reduce bias if the selection on observables assumption holds. We then present a reanalysis of a number of data sets in the LaLonde (1986) controversy."

to:NB
causal_inference
genetic_algorithms
statistics
matching

Estimating the Effect of Premarital Cohabitation on Timing of Marital Disruption

august 2012 by cshalizi

"In this article, we extend the propensity score method by matching on multiple groups. Using data from first wave (1987–1988) and third wave (2001–2003) of National Survey of Families and Households (NSFH), we match married individuals with no premarital cohabitation, single premarital cohabitation with the spouse, and serial premarital cohabitations, and apply Cox proportional hazards models to explore how premarital cohabitation history affects marital disruption. Our results indicate that both selection and causation help explain the relationship. The selection effect played a large role in 1987–88 when cohabitation was uncommon but disappeared in 2001–03 when cohabitation became prevalent. Postmatching results demonstrate that the causal effect of cohabitation on marital disruption was strong among serial cohabitors and weak among one-time cohabitors with the spouse. The imputation-based sensitivity analysis shows that our conclusion is robust even with the presence of unobserved characteristics that have a moderate association with cohabitation and marital disruption."

- Case study/exam problem for uADA?

causal_inference
statistics
survival_analysis
living_in_sin
to_teach:undergrad-ADA
matching

CRAN - Package MatchIt

february 2011 by cshalizi

"MatchIt preprocesses data by selecting approximate matched samples of the treated and control groups with similar covariate distributions, drawing on a large variety of matching methods. After preprocessing data with MatchIt, whatever standard parametric technique one might have used without preprocessing can be used, but the results will be far less model dependent."

I want to teach _some_ matching methods in 402, but I definitely don't want the kids to program them. This might work...

matching
causal_inference
statistics
to_teach:undergrad-ADA

Social Science Statistics Blog: Can matching solve endogeneity?

october 2010 by cshalizi

" people who like matching methods ... tend to believe that most confounders can be measured ... and that there aren't a lot of lurking unobservables. ... [P]eople ... who are skeptical of matching ... argue that there will always be problematic unobservables lurking ... [and they] prefer instrumental variables approaches .... [T]he same people who tell me that lurking unobservables are everywhere tend to be fairly comfortable making the ... exclusion restrictions that make IV approaches work. The crazy thing is that just like matching, these assumptions [are] about unobservable causal pathways. The claim that an instrumental variable is valid is the claim that there are no unobserved (or observed) variables linking the instrument to the outcome except through the path of the instrumented variable. ... [P]eople who think that lurking unobservables are everywhere in matching somehow think that all these lurking uobservables go away as soon as you call something an instrument..."

causal_inference
instrumental_variables
matching
to:blog

A Cautionary Note on the Use of Matching to Estimate Causal Effects: An Empirical Example Comparing Matching Estimates to an Experimental Benchmark — Sociological Methods Research

october 2010 by cshalizi

"...social scientists have increasingly turned to matching [to draw] causal inferences from observational data. Matching compares those who receive a treatment to those with similar background attributes who do not receive a treatment. ... Drawing on a randomized voter mobilization experiment ... compare matching [estimates] to an experimental benchmark. ... enormous sample size .... exactly match each treated subject to 40 untreated subjects. Matching greatly exaggerates the effectiveness of pre-election phone calls encouraging voter participation. ... Matching suggests that another pre-election phone call that encouraged people to wear their seat belts also generated huge increases in voter turnout. ... caution is warranted when applying matching estimators to observational data, particularly when one is uncertain about the potential for biased inference." Ouch!

have_read
to_teach:data-mining
causal_inference
matching
experimental_political_science
evisceration
to:blog
to_teach:undergrad-ADA
