An efficient algorithm for large-scale causal discovery

Hong, Yinghan; Liu, Zhusong; Mai, Guizhen

doi:10.1007/s00500-016-2281-0

Methodologies and Application
Published: 03 August 2016

Soft Computing, Volume 21, pages 7381–7391 (2017)

Yinghan Hong¹,
Zhusong Liu² &
Guizhen Mai³

Abstract

Causal discovery is a fundamental problem in scientific research. Although many researchers have worked on finding causal relationships from observational data, large-scale causal discovery remains a major challenge. In this paper, a new approach for large-scale causal discovery is proposed, based on a split-and-merge strategy. The method first splits a given dataset into small subdatasets using a graph-partitioning method and then applies an effective algorithm to infer the causal structure of each subdataset. The causal structure of the entire dataset is obtained by combining the causal structures inferred for the subdatasets. The experimental results show that the proposed approach is effective and scalable for large-scale causal discovery problems.
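The split-and-merge strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm; the abstract does not specify the partitioning method or the local learner, so every name and choice below is an assumption made for illustration. The split step groups variables into connected components of a thresholded correlation graph. The local learner `toy_local_learner` is a hypothetical placeholder that only pairs strongly correlated variables and directs each pair by name order, so it does not infer real causal directions. The merge step is a plain union of the per-block edge sets.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def split(data, threshold=0.5):
    """Split step (assumed stand-in for graph partitioning): connected
    components of the graph linking variables with |correlation| > threshold."""
    adj = {v: set() for v in data}
    for u, v in combinations(data, 2):
        if abs(pearson(data[u], data[v])) > threshold:
            adj[u].add(v)
            adj[v].add(u)
    blocks, seen = [], set()
    for start in data:
        if start in seen:
            continue
        block, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            block.append(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        blocks.append(sorted(block))
    return blocks


def toy_local_learner(sub, threshold=0.5):
    """Hypothetical placeholder for the per-subdataset learner: returns
    strongly correlated pairs, directed by name order only. A real causal
    discovery method would be plugged in here instead."""
    return {(u, v) for u, v in combinations(sorted(sub), 2)
            if abs(pearson(sub[u], sub[v])) > threshold}


def split_and_merge(data, infer_local=toy_local_learner, threshold=0.5):
    """Run the local learner on each block and merge by set union."""
    edges = set()
    for block in split(data, threshold):
        sub = {v: data[v] for v in block}
        edges |= infer_local(sub)
    return edges


# Toy data: b tracks a, d tracks c, and the two pairs are weakly related.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
    "d": [4.9, 1.2, 4.1, 1.8, 3.1],
}
print(split(data))
print(sorted(split_and_merge(data)))
```

Because the blocks are disjoint here, the union in `split_and_merge` cannot produce conflicting edges. A partitioning scheme with overlapping blocks would need an explicit rule for reconciling edges that different blocks orient differently.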

Acknowledgments

This paper has been supported by the Science and Technology Planning Project of Guangdong Province, China (2015A030401101, 2015B090922014), and by the National Natural Science Foundation of China (61572144).

Author information

Authors and Affiliations

Physics and Electronic Engineering Department, Hanshan Normal University, Chaozhou, China
Yinghan Hong
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
Zhusong Liu
School of Automation, Guangdong University of Technology, Guangzhou, China
Guizhen Mai

Corresponding author

Correspondence to Yinghan Hong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Hong, Y., Liu, Z. & Mai, G. An efficient algorithm for large-scale causal discovery. Soft Comput 21, 7381–7391 (2017). https://doi.org/10.1007/s00500-016-2281-0

Published: 03 August 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s00500-016-2281-0
