
BAMB: A Balanced Markov Blanket Discovery Approach to Feature Selection

Authors:

Xindong Wu

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 10, Issue 5

Article No.: 52, Pages 1 - 25

https://doi.org/10.1145/3335676

Published: 16 October 2019

Abstract

Markov blanket (MB) discovery for feature selection has attracted much attention in recent years, because the MB of the class attribute is the optimal feature subset for feature selection. However, almost all existing MB discovery algorithms focus on either improving computational efficiency or boosting learning accuracy, rather than both. In this article, we propose a novel MB discovery algorithm that balances efficiency and accuracy, called BAlanced Markov Blanket (BAMB) discovery. To achieve this goal, given a class attribute of interest, BAMB finds the candidate PC (parents and children) and spouses and removes false positives from the candidate MB set in one go. Specifically, once a feature is successfully added to the current PC set, BAMB finds the spouses with regard to this feature, then uses the updated PC and spouse sets to remove false positives from the current MB set. This keeps the PC and spouse sets of the target as small as possible and thus achieves a trade-off between computational efficiency and learning accuracy. In the experiments, we first compare BAMB with 8 state-of-the-art MB discovery algorithms on 7 benchmark Bayesian networks; we then use 10 real-world datasets to compare BAMB with 12 feature selection algorithms, including 8 state-of-the-art MB discovery algorithms and 4 other well-established feature selection methods. On prediction accuracy, BAMB outperforms all 12 of the feature selection algorithms it is compared with. On computational efficiency, BAMB is close to the IAMB algorithm and much faster than the remaining 7 MB discovery algorithms.
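For readers unfamiliar with the terms PC and spouses, the following minimal sketch reads these sets off a DAG whose structure is already known. It is not the BAMB algorithm: BAMB has to infer these sets from data, and that step is not shown here. The function name `markov_blanket` and the toy graph are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def markov_blanket(edges: Iterable[Tuple[str, str]], target: str) -> Dict[str, Set[str]]:
    """Read PC, spouses, and MB of `target` off a known DAG.

    `edges` is a collection of (parent, child) pairs. This assumes the graph
    is given; it performs no learning from data.
    """
    edge_list = list(edges)
    parents = {u for u, v in edge_list if v == target}
    children = {v for u, v in edge_list if u == target}
    pc = parents | children
    # Spouses: the other parents of the target's children.
    spouses = {u for u, v in edge_list if v in children and u != target}
    return {"pc": pc, "spouses": spouses, "mb": pc | spouses}


# Hypothetical toy DAG (not taken from the article): X -> A -> T -> C <- S, C -> D
toy_edges = [("X", "A"), ("A", "T"), ("T", "C"), ("S", "C"), ("C", "D")]
result = markov_blanket(toy_edges, "T")
print(sorted(result["pc"]))       # ['A', 'C']
print(sorted(result["spouses"]))  # ['S']
print(sorted(result["mb"]))       # ['A', 'C', 'S']
```

In this toy graph, X and D fall outside the MB of T: X is a parent of a parent, and D is a child of a child, so neither is a parent, a child, or a spouse of T.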




Published In

ACM Transactions on Intelligent Systems and Technology Volume 10, Issue 5

Special Section on Advances in Causal Discovery and Inference and Regular Papers

September 2019

314 pages

ISSN: 2157-6904

EISSN: 2157-6912

DOI: 10.1145/3360733

Editor:
Yu Zheng
JD Finance, China

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2019

Accepted: 01 May 2019

Revised: 01 April 2019

Received: 01 August 2018

Published in TIST Volume 10, Issue 5


Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation of China
Anhui Province Key Research and Development Plan
National Key Research and Development Program of China


