ABSTRACT
In causal inference, it is common to select a subset of the observed covariates, called the adjustment features, to adjust for when estimating the treatment effect. In real-world applications, abundant covariates are usually observed. Besides the confounders (variables that affect both the treatment and the outcome), they contain extra variables that are correlated with only the treatment (treatment-only variables, e.g., instrumental variables) or only the outcome (outcome-only variables, e.g., precision variables). In principle, unbiased treatment effect estimation is achieved once the adjustment features contain all the confounders. However, the performance of empirical estimates varies considerably depending on which extra variables are included. To address this issue, variable separation/selection for treatment effect estimation has received growing attention in settings where the extra variables are instrumental variables and precision variables.
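A small Monte Carlo simulation makes the variance argument concrete. The sketch below is not from the paper. It assumes a hypothetical linear data-generating process with one confounder `C`, one instrument `I` (treatment-only), one precision variable `P`, and a binary treatment whose true effect is 2, and it uses ordinary least squares regression adjustment as the estimator. Every adjustment set that contains `C` is unbiased here. Adding `I` should inflate the spread of the estimates, and adding `P` should shrink it.

```python
import numpy as np

def simulate(n, rng):
    """Draw one hypothetical dataset with a confounder C, an instrument I and a precision variable P."""
    c = rng.normal(size=n)  # confounder: affects both treatment and outcome
    i = rng.normal(size=n)  # treatment-only (instrumental) variable
    p = rng.normal(size=n)  # outcome-only (precision) variable
    propensity = 1.0 / (1.0 + np.exp(-(c + 2.0 * i)))
    t = rng.binomial(1, propensity).astype(float)
    y = 2.0 * t + c + 2.0 * p + rng.normal(size=n)  # true treatment effect is 2
    return {"C": c, "I": i, "P": p}, t, y

def regression_adjusted_effect(covs, t, y, adjustment_set):
    """OLS coefficient of T in the regression y ~ 1 + T + (adjustment features)."""
    design = np.column_stack([np.ones_like(t), t] + [covs[name] for name in adjustment_set])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

def monte_carlo(adjustment_sets, n=500, reps=400, seed=0):
    """Mean and standard deviation of the estimate across repeated datasets."""
    rng = np.random.default_rng(seed)
    estimates = {key: [] for key in adjustment_sets}
    for _ in range(reps):
        covs, t, y = simulate(n, rng)
        for key in adjustment_sets:
            estimates[key].append(regression_adjusted_effect(covs, t, y, key))
    return {key: (float(np.mean(v)), float(np.std(v))) for key, v in estimates.items()}

if __name__ == "__main__":
    results = monte_carlo([("C",), ("C", "I"), ("C", "P")])
    for key, (mean, sd) in results.items():
        print(f"{'+'.join(key):4s} mean={mean:.3f} sd={sd:.3f}")
```

Running the script prints the mean and standard deviation of the estimate for each adjustment set. The means should all sit near 2, with standard deviations ordered C+P < C < C+I. The exact numbers depend on the assumed coefficients and sample size.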
In this paper, assuming that no mediator variables exist, we consider a more general setting in which the observed covariates may include post-treatment and post-outcome variables, not only instrumental and precision variables. Our goal is to separate the treatment-only variables from the adjustment features. To this end, we propose a metric named Optimal Adjustment Features (OAF), which empirically measures the asymptotic variance of the estimator. Theoretically, we show that the OAF metric is minimized if and only if the adjustment features consist of the confounders and the outcome-only variables, i.e., when the treatment-only variables are perfectly separated out. Because optimizing the OAF metric is a combinatorial optimization problem, we introduce reinforcement learning (RL) and adopt the policy gradient method to search for the optimal adjustment set. Empirical results on both synthetic and real-world datasets demonstrate that (a) our method successfully finds the optimal adjustment features and (b) the selected adjustment features yield a more precise estimate of the treatment effect.
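The policy-gradient search over adjustment sets can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation. The policy is assumed to be a product of independent Bernoulli distributions, one per covariate, and the gradient is the plain REINFORCE score-function estimator with a batch-mean baseline. The objective `toy_score` is a hypothetical stand-in for the OAF metric, whose exact form is not given here. It hard-codes each covariate's role (a heavy penalty for dropping a confounder, a unit penalty for including a treatment-only variable or dropping an outcome-only variable), so the known optimum is the confounders plus the outcome-only variables.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reinforce_subset_search(score, n_features, iters=500, batch=32, lr=0.5, seed=0):
    """Search for a binary mask minimizing score(mask) with REINFORCE.

    Policy: mask_j ~ Bernoulli(sigmoid(theta_j)), independently for each feature j.
    Returns the final inclusion probabilities sigmoid(theta).
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)
    for _ in range(iters):
        probs = sigmoid(theta)
        masks = (rng.random((batch, n_features)) < probs).astype(float)
        rewards = np.array([-score(m) for m in masks])  # lower score -> higher reward
        advantages = rewards - rewards.mean()           # batch-mean baseline reduces variance
        # Gradient of log Bernoulli(mask | sigmoid(theta)) w.r.t. theta is (mask - probs).
        grad = (advantages[:, None] * (masks - probs)).mean(axis=0)
        theta += lr * grad                              # ascend the expected reward
    return sigmoid(theta)

# Hypothetical roles for five covariates (a stand-in for an unknown causal structure).
ROLES = ["confounder", "confounder", "outcome_only", "treatment_only", "treatment_only"]

def toy_score(mask):
    """Hypothetical stand-in for the OAF metric: smaller is better."""
    s = 0.0
    for included, role in zip(mask, ROLES):
        if role == "confounder" and not included:
            s += 10.0  # dropping a confounder would bias the estimate
        elif role == "treatment_only" and included:
            s += 1.0   # treatment-only variables inflate the variance
        elif role == "outcome_only" and not included:
            s += 1.0   # outcome-only variables reduce the variance
    return s

if __name__ == "__main__":
    probs = reinforce_subset_search(toy_score, n_features=len(ROLES))
    print(np.round(probs, 3))
```

With the default settings, the learned inclusion probabilities should approach 1 for the first three covariates and 0 for the last two. In practice, `score` would be replaced by an empirical, data-driven estimate of the asymptotic variance for each sampled mask, which is far more expensive to evaluate than this toy function.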
Index Terms
- Treatment Effect Estimation with Adjustment Feature Selection