survey

Causality-based Feature Selection: Methods and Evaluations

Authors:

Xindong Wu

ACM Computing Surveys (CSUR), Volume 53, Issue 5

Article No.: 111, Pages 1 - 36

https://doi.org/10.1145/3409382

Published: 28 September 2020

Abstract

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. Knowledge of the causal relationships between features and the class variable has been shown to offer potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted greater attention, and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to ease comparisons between new methods and existing ones, we develop the first open-source package, called CausalFS, which consists of most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms on both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research.

Supplementary Material

a111-yu-suppl.pdf (yu.zip)

Supplemental movie, appendix, image, and software files for Causality-based Feature Selection: Methods and Evaluations




Information & Contributors

Information

Published In


ACM Computing Surveys Volume 53, Issue 5

September 2021

782 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3426973

Editor:
Albert Zomaya
University of Sydney, Australia

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 September 2020

Accepted: 01 June 2020

Revised: 01 June 2020

Received: 01 November 2019

Published in CSUR Volume 53, Issue 5


Qualifiers

Survey
Research
Refereed

Funding Sources

National Science Foundation of China
National Key Research and Development Program of China
Australian Research Council Discovery Projects

Bibliometrics

Total Citations: 122

Total Downloads: 2,636

Downloads (Last 12 months): 497

Downloads (Last 6 weeks): 35

Reflects downloads up to 03 Mar 2025

Citations

Cited By

Ling Z, Wu J, Zhang Y, Zhou P, Wu X, Yu K, Wu X. (2025). Label-Aware Causal Feature Selection. IEEE Transactions on Knowledge and Data Engineering 37:3 (1268-1281). Online publication date: 1-Mar-2025.
https://dl.acm.org/doi/10.1109/TKDE.2024.3522580
Zhang Y, Xiong Y, Sun Y, Jin Y, Shan C, Lu T, Song H, Sun S. (2025). CauseRuDi: Explaining Behavior Sequence Models by Causal Statistics Generation and Rule Distillation. IEEE Transactions on Knowledge and Data Engineering 37:1 (116-129). Online publication date: 1-Jan-2025.
https://dl.acm.org/doi/10.1109/TKDE.2024.3487625
Wu S, Yin C, Wang Y, Sun H. (2025). Identifying cancer prognosis genes through causal learning. Briefings in Bioinformatics 26:1. Online publication date: 14-Jan-2025.
https://doi.org/10.1093/bib/bbae721
Liu W, Qi Y, Liu F. (2025). Reliable prediction for TBM energy consumption during tunnel excavation: A novel technique balancing explainability and performance. Underground Space 22 (77-95). Online publication date: Jun-2025.
https://doi.org/10.1016/j.undsp.2024.09.004
Preyanka Lakshme R, Ganesh Kumar S. (2025). Feature selection using binary horse herd optimization algorithm with lightGBA ensemble classification in microarray data. Knowledge-Based Systems 312 (113168). Online publication date: Mar-2025.
https://doi.org/10.1016/j.knosys.2025.113168
Yu K, Rong C, Wang H, Cao F, Liang J. (2025). Federated local causal structure learning. Science China Information Sciences 68:3. Online publication date: 16-Jan-2025.
https://doi.org/10.1007/s11432-023-4203-6
Feng C, Baptista M, Liu X, Ni C, Grebogi C. (2025). Causal feature selection for health state identification of complex experimental systems. Nonlinear Dynamics. Online publication date: 21-Feb-2025.
https://doi.org/10.1007/s11071-025-10914-w
Zhang X, Cai Y, Xiong H. (2025). Knoop: practical enhancement of knockoff with over-parameterization for variable selection. Machine Learning 114:1. Online publication date: 17-Jan-2025.
https://dl.acm.org/doi/10.1007/s10994-024-06692-y
Bernardini G, Liu C, Loukides G, Marchetti-Spaccamela A, Pissis S, Stougie L, Sweering M. (2025). Missing value replacement in strings and applications. Data Mining and Knowledge Discovery 39:2. Online publication date: 22-Jan-2025.
https://doi.org/10.1007/s10618-024-01074-3
Pros R, Vitrià J. (2025). Exploiting Causal Knowledge During CATE Estimation Using Tree Based Metalearners. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (261-276). Online publication date: 1-Jan-2025.
https://doi.org/10.1007/978-3-031-74640-6_19
