Abstract
We compared four propensity score (PS) methods using simulations: maximum likelihood (ML), generalized boosting models (GBM), covariate balancing propensity scores (CBPS), and generalized additive models (GAM). Although GBM, CBPS, and GAM have been shown to perform better than ML in estimating causal treatment effects, they have not been compared in terms of type I error and power, and the impact of treatment exposure prevalence on PS methods has not been studied. To fill these gaps, we considered four simulation scenarios that differ in the complexity of the propensity score model, each evaluated over a range of exposure prevalence. Propensity score weights were estimated using logistic regression fitted by ML, CBPS, and GAM, and using GBM. We used these propensity weights to estimate the average treatment effect among the treated on a binary outcome. Simulations showed that (1) CBPS was generally superior across the four scenarios studied in terms of type I error, power, and mean squared error; (2) GBM and GAM were less biased than CBPS and ML under complex models; and (3) ML performed well when treatment exposure was rare.
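The weighting step described above can be illustrated with a minimal sketch. It covers only the ML variant: a logistic regression propensity model fitted by Newton-Raphson, followed by weighting for the average treatment effect among the treated, where treated subjects get weight 1 and controls get weight e/(1 - e), with e the estimated propensity score. Everything specific here is an assumption made for illustration and is not taken from the article: the single confounder, the coefficients, the sample size, and the function names `simulate`, `fit_logistic`, and `att_estimates`. The article's own scenarios are not reproduced.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed, effect=0.0):
    """Toy data: one confounder x, binary treatment t, binary outcome y.
    All coefficients are illustrative assumptions, not the article's design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = 1 if rng.random() < expit(-1.0 + 0.8 * x) else 0
        y = 1 if rng.random() < expit(-0.5 + 0.7 * x + effect * t) else 0
        rows.append((x, t, y))
    return rows


def fit_logistic(rows, iters=25):
    """ML fit of logit P(t = 1 | x) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t, _ in rows:
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += t - p            # score for the intercept
            g1 += (t - p) * x      # score for the slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def att_estimates(rows):
    """Return (weighted ATT risk difference, unweighted risk difference)."""
    b0, b1 = fit_logistic(rows)
    treated_y = [y for _, t, y in rows if t == 1]
    control = [(expit(b0 + b1 * x), y) for x, t, y in rows if t == 0]
    # ATT weights: 1 for treated subjects, e / (1 - e) for controls
    weights = [e / (1.0 - e) for e, _ in control]
    weighted_control = (
        sum(w * y for w, (_, y) in zip(weights, control)) / sum(weights)
    )
    treated_mean = sum(treated_y) / len(treated_y)
    naive = treated_mean - sum(y for _, y in control) / len(control)
    return treated_mean - weighted_control, naive


if __name__ == "__main__":
    att, naive = att_estimates(simulate(20000, seed=1))
    print(f"weighted ATT estimate: {att:.3f}, unweighted difference: {naive:.3f}")
```

With `effect=0.0` the data are generated under the null, so the weighted estimate should sit near zero while the unweighted difference remains confounded by `x`. Repeating the simulation over many seeds and recording how often a test rejects would give an empirical type I error rate, and setting a nonzero `effect` would give empirical power.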
Acknowledgements
This work was supported in part by the National Cancer Institute for the Cancer Therapy and Research Center (P30CA054174) at the UT Health Science Center at San Antonio.
Cite this article
Choi, B.Y., Wang, CP., Michalek, J. et al. Power comparison for propensity score methods. Comput Stat 34, 743–761 (2019). https://doi.org/10.1007/s00180-018-0852-5
DOI: https://doi.org/10.1007/s00180-018-0852-5