
A Causal Dirichlet Mixture Model for Causal Inference from Observational Data

Published: 29 April 2020

Abstract

Estimating causal effects by making causal inferences from observational data is common practice in scientific studies, business decision-making, and daily life. In today’s data-driven world, causal inference has become a key part of the evaluation process for many purposes, such as examining the effects of a medicine or the impact of an economic policy on society. However, although the literature contains some excellent models, there is room to improve their representational power and their ability to capture complex relationships. For these reasons, we propose a novel prior called Causal DP and a model called CDP. The prior captures the complex relationships between covariates, treatments, and outcomes in observational data using a rational probabilistic dependency structure. The model is Bayesian, nonparametric, and generative, and makes no parametric distributional assumptions. CDP is designed to estimate various kinds of causal effects, including the average, conditional average, average effect on the treated, and quantile effects. It handles missing covariates well and does not suffer from overfitting. Comparative experiments on synthetic datasets against several state-of-the-art methods demonstrate that CDP has a superior ability to capture complex relationships. Further, a simple evaluation inferring the effect of a job training program on trainee earnings from real-world data shows that CDP is both effective and useful for causal inference.
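The estimands named in the abstract can be made concrete on synthetic data, where, unlike in observational studies, both potential outcomes are known for every unit. The sketch below is purely illustrative and is not the CDP model; the data-generating process, the confounded treatment assignment, and the variable names (`ate`, `att`, `cate`) are our own assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# A single covariate that confounds treatment and outcome.
x = rng.normal(size=n)

# Treatment assignment depends on x (confounding): P(t=1|x) = sigmoid(x).
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(int)

# Potential outcomes: y0 under control, y1 under treatment.
# The true individual treatment effect is 2 + x (heterogeneous in x).
y0 = x + rng.normal(scale=0.1, size=n)
y1 = y0 + 2.0 + x

effect = y1 - y0
ate = effect.mean()            # average treatment effect, approx. 2
att = effect[t == 1].mean()    # average effect on the treated (> ATE here,
                               # since treated units tend to have larger x)
cate = effect[x > 0].mean()    # conditional average effect, given x > 0
```

Because treatment assignment depends on x, a naive difference of observed group means would be biased here; methods such as CDP aim to recover these estimands from observed data alone, without access to both potential outcomes.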




    Published In

ACM Transactions on Intelligent Systems and Technology, Volume 11, Issue 3
Survey Paper and Regular Papers
June 2020, 286 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3392081

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 April 2020
    Accepted: 01 January 2020
    Revised: 01 November 2019
    Received: 01 August 2019
    Published in TIST Volume 11, Issue 3


    Author Tags

    1. Bayesian nonparametric
    2. Causal inference
    3. Dirichlet process

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Australian Research Council under Discovery
