Abstract
Finding causal relationships that generalize across scenarios is a fundamental problem in science. Real-world data, however, commonly exhibit distribution shift: the underlying generating process changes across domains. Such shift challenges causal discovery from observational data, because most current models assume that the causal mechanism stays fixed even when the data are heterogeneous, and consequently they fail to identify the causal direction. Fortunately, in a general causal system, the distributions in the causal direction (but not in the anti-causal direction) change independently across domains, which suggests a way to perform causal discovery in multi-domain data by measuring this independent change. By investigating the modularity of the causal mechanism in discretized multi-domain data, we establish theoretical results on the identification of the causal direction under a mild technical condition. Going one step further, we use the discretization technique to propose a general framework for causal direction identification in multi-domain data that assumes neither a specific causal mechanism nor specific data types. We verify the effectiveness of the proposed methods on synthetic data and successfully identify the causal direction in two real-world datasets.
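The idea of measuring independent change can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a bivariate setting, quantile-based discretization with shared bin edges, Laplace smoothing, and a biased HSIC estimate with Gaussian kernels as the dependence measure, and every function name (`domain_features`, `hsic`, `change_dependence`, `infer_direction`) is hypothetical. For each candidate direction, it estimates the marginal of the candidate cause and the conditional of the candidate effect in every domain, then scores how strongly the two vary together across domains.

```python
import numpy as np

def domain_features(cause, effect, edges_c, edges_e, alpha=1.0):
    """Estimate P(cause) and P(effect | cause) in one domain after discretization."""
    kc, ke = len(edges_c) + 1, len(edges_e) + 1
    ci = np.digitize(cause, edges_c)     # bin index in 0..kc-1
    ei = np.digitize(effect, edges_e)    # bin index in 0..ke-1
    counts = np.full((kc, ke), alpha)    # Laplace smoothing (assumption)
    np.add.at(counts, (ci, ei), 1.0)     # accumulate the joint histogram
    joint = counts / counts.sum()
    marginal = joint.sum(axis=1)
    conditional = joint / marginal[:, None]
    return marginal, conditional.ravel()

def gaussian_gram(feats):
    """Gaussian kernel matrix over domains, median-heuristic bandwidth."""
    sq = np.sum((feats[:, None, :] - feats[None, :, :]) ** 2, axis=-1)
    positive = sq[sq > 0]
    bandwidth = np.median(positive) if positive.size else 1.0
    return np.exp(-sq / bandwidth)

def hsic(a, b):
    """Biased HSIC estimate between two sets of per-domain feature vectors."""
    n = a.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(gaussian_gram(a) @ h @ gaussian_gram(b) @ h)) / n ** 2

def change_dependence(domains, n_bins=4):
    """Dependence between changes of P(cause) and P(effect | cause) across domains."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    edges_c = np.quantile(np.concatenate([c for c, _ in domains]), qs)
    edges_e = np.quantile(np.concatenate([e for _, e in domains]), qs)
    feats = [domain_features(c, e, edges_c, edges_e) for c, e in domains]
    marginals = np.stack([m for m, _ in feats])
    conditionals = np.stack([c for _, c in feats])
    return hsic(marginals, conditionals)

def infer_direction(domains):
    """Prefer the direction whose modules change more independently."""
    forward = change_dependence(domains)
    backward = change_dependence([(y, x) for x, y in domains])
    return ("X->Y" if forward < backward else "Y->X"), forward, backward

# Synthetic multi-domain data (illustrative only): the distribution of X and
# the mechanism X -> Y are perturbed independently in every domain.
rng = np.random.default_rng(0)
domains = []
for _ in range(40):
    mu = rng.normal(0.0, 1.0)        # domain-specific P(X)
    slope = rng.uniform(0.5, 2.0)    # domain-specific mechanism
    x = rng.normal(mu, 1.0, size=500)
    y = np.tanh(slope * x) + 0.3 * rng.normal(size=500)
    domains.append((x, y))

direction, fwd, bwd = infer_direction(domains)
print(direction, fwd, bwd)
```

Comparing two raw HSIC values is the simplest possible decision rule; a more careful version would calibrate the scores, for example with a permutation test over domains, before declaring a direction.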
Acknowledgements
This research was supported in part by the Natural Science Foundation of China (61876043, 61976052), the Science and Technology Planning Project of Guangzhou (201902010058), and the Guangdong Provincial Science and Technology Innovation Strategy Fund (2019B121203012).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Qiao, J., Bai, Y., Cai, R. et al. Causal discovery from multi-domain data using the independence of modularities. Neural Comput & Applic 34, 1939–1949 (2022). https://doi.org/10.1007/s00521-021-06507-4
DOI: https://doi.org/10.1007/s00521-021-06507-4