Causal Inference with Multilevel Data: A Comparison of Different Propensity Score Weighting Approaches

Journal

MULTIVARIATE BEHAVIORAL RESEARCH
Volume 57, Issue 6, Pages 916-939

Publisher

ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD
DOI: 10.1080/00273171.2021.1925521

Keywords

Causal inference; propensity scores; multilevel data; weighting; calibration weights

Propensity score methods are widely recommended to adjust for confounding and recover treatment effects. This article reviews propensity score weighting estimators for multilevel data and shows that estimates based on calibration weights should be preferred under many scenarios. Large cluster sizes are needed for accurate estimates of the treatment effect when covariate effects vary strongly across clusters.
Propensity score methods are a widely recommended approach to adjust for confounding and to recover treatment effects with non-experimental, single-level data. This article reviews propensity score weighting estimators for multilevel data in which individuals (level 1) are nested in clusters (level 2) and nonrandomly assigned to either a treatment or control condition at level 1. We address the choice of a weighting strategy (inverse probability weights, trimming, overlap weights, calibration weights) and discuss key issues related to the specification of the propensity score model (fixed-effects model, multilevel random-effects model) in the context of multilevel data. In three simulation studies, we show that estimates based on calibration weights, which prioritize balancing the sample distribution of level-1 and (unmeasured) level-2 covariates, should be preferred under many scenarios (i.e., treatment effect heterogeneity, presence of strong level-2 confounding) and can accommodate covariate-by-cluster interactions. However, when level-1 covariate effects vary strongly across clusters (i.e., under random slopes), and this variation is present in both the treatment and outcome data-generating mechanisms, large cluster sizes are needed to obtain accurate estimates of the treatment effect. We also discuss the implementation of survey weights and present a real-data example that illustrates the different methods.
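
To make three of the weighting strategies named above concrete, the following is a minimal sketch using their common textbook definitions; it is not the authors' implementation. It assumes the propensity scores have already been estimated (for example, from a fixed-effects or random-effects model) and that the target is the average treatment effect. The function names and the trimming thresholds of 0.05 and 0.95 are illustrative assumptions, not values taken from the article. Calibration weights are omitted because they require solving a balancing optimization problem rather than applying a closed-form formula.

```python
import numpy as np


def ipw_weights(ps, treat):
    """Inverse probability weights: 1/e for treated units, 1/(1 - e) for controls."""
    ps = np.asarray(ps, dtype=float)
    treat = np.asarray(treat, dtype=int)
    return np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))


def trimmed_ipw_weights(ps, treat, lower=0.05, upper=0.95):
    """IPW with zero weight for units whose score falls outside [lower, upper].

    The thresholds are illustrative assumptions, not values from the article.
    """
    ps = np.asarray(ps, dtype=float)
    keep = (ps >= lower) & (ps <= upper)
    return np.where(keep, ipw_weights(ps, treat), 0.0)


def overlap_weights(ps, treat):
    """Overlap weights: 1 - e for treated units, e for controls."""
    ps = np.asarray(ps, dtype=float)
    treat = np.asarray(treat, dtype=int)
    return np.where(treat == 1, 1.0 - ps, ps)


def weighted_mean_difference(y, treat, w):
    """Difference between the weighted outcome means of treated and control units."""
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=int)
    w = np.asarray(w, dtype=float)
    treated = treat == 1
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[~treated], weights=w[~treated]))


# Toy inputs (made up for illustration only).
ps = np.array([0.2, 0.5, 0.98, 0.6])
treat = np.array([1, 0, 1, 0])
y = np.array([3.0, 1.0, 4.0, 2.0])

for name, fn in [("ipw", ipw_weights),
                 ("trimmed", trimmed_ipw_weights),
                 ("overlap", overlap_weights)]:
    w = fn(ps, treat)
    print(name, w, weighted_mean_difference(y, treat, w))
```

In this toy input the unit with a score of 0.98 shows how the strategies differ: trimming discards it outright, whereas overlap weights keep it but shrink its weight toward zero. With multilevel data, the same formulas apply once the scores come from a model that accounts for clustering.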
