Causality-based Feature Selection: Methods and Evaluations

Published: 28 September 2020

Abstract

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture the causal relationships between them. Knowledge of the causal relationships between features and the class variable has been shown to help build interpretable and robust prediction models, since causal relationships reflect the underlying mechanism of a system. Consequently, causality-based feature selection has attracted growing attention, and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to make it easy to compare new methods with existing ones, we develop the first open-source package, called CausalFS, which includes most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms on both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research.
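
As a rough illustration of the distinction the abstract draws, the sketch below reads off the Markov boundary of a class variable from a known toy DAG: its parents, children, and the other parents of its children (spouses), which is the standard characterization under the faithfulness assumption. The graph, the variable names, and the `markov_blanket` helper are hypothetical and chosen only for illustration. This is not the CausalFS API, and real algorithms must estimate this set from data rather than from a given graph.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and spouses of `target`.

    `parents` maps every node of a DAG to the set of its direct causes.
    """
    direct_causes = set(parents.get(target, set()))
    # Children are the nodes that list `target` among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    spouses.discard(target)
    return direct_causes | children | spouses


# Hypothetical toy DAG (child -> parents), assumed for illustration only.
toy_dag = {
    "Smoking": set(),
    "Genetics": set(),
    "Allergy": set(),
    "YellowFingers": {"Smoking"},
    "LungCancer": {"Smoking", "Genetics"},
    "Coughing": {"LungCancer", "Allergy"},
    "Fatigue": {"Coughing"},
}

mb = markov_blanket(toy_dag, "LungCancer")
print(sorted(mb))
```

In this toy graph, YellowFingers is correlated with LungCancer through their common cause Smoking, so a purely correlation-based filter might select it, whereas it falls outside the Markov boundary. Allergy, by contrast, is marginally independent of LungCancer but enters the boundary as a spouse via the shared child Coughing.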

Supplementary Material

a111-yu-suppl.pdf (yu.zip)
Supplemental movie, appendix, image, and software files for Causality-based Feature Selection: Methods and Evaluations

Published In

ACM Computing Surveys, Volume 53, Issue 5
September 2021
782 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3426973
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 September 2020
Accepted: 01 June 2020
Revised: 01 June 2020
Received: 01 November 2019
Published in CSUR Volume 53, Issue 5

Author Tags

  1. Bayesian network
  2. Feature selection
  3. Markov boundary

Qualifiers

  • Survey
  • Research
  • Refereed

Funding Sources

  • National Science Foundation of China
  • National Key Research and Development Program of China
  • Australian Research Council Discovery Projects
