Abstract
Constraint-based feature selection via Markov blanket (MB) discovery in Bayesian networks (BNs) has attracted widespread attention in diverse data mining applications. However, most existing MB discovery methods handle low- or high-dimensional data by focusing on either computational efficiency or learning accuracy, rather than both. This paper presents a new constraint-based feature selection algorithm, Feature Selection via Mining Markov Blanket (FSMB), that improves and balances both computational efficiency and prediction accuracy. FSMB mines the MB, comprising parents-children (PC) and spouses (SP), using a forward approach to induce the true-positive PC of a given target T. FSMB removes false-positive PC from the PC set and never reconsiders them. Concurrently, FSMB finds the SP of the target T through an exhaustive search of the non-parents-children set, using the V-structure strategy to distinguish the true-positive PC and SP within the MB set, and then uses them to remove false-positive SP. FSMB also removes non-MB descendants using the updated PC and SP sets. Extensive experiments on benchmark datasets validate the performance of FSMB. The results are compared with existing algorithms, including the Incremental Association Markov Blanket (IAMB), the Max-Min Markov Blanket (MMMB), the HITON-MB, the Simultaneous Markov Blanket (STMB), the Iterative Parents-and-Children-Based Markov Blanket (IPCMB), the Balanced Markov Blanket (BAMB), the Efficient and Effective Markov Blanket (EEMB), and the Markov Blanket discovery by Feature Selection (MBFS). Experimental results show that FSMB outperforms these algorithms, achieving higher accuracy with shorter running time.
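The pipeline the abstract describes (forward PC induction, false-positive pruning, and spouse recovery via V-structures) can be sketched in outline. The sketch below is illustrative rather than the authors' implementation: the function names, the z-threshold, the conditioning-set bound, and the synthetic network are our own assumptions, and the conditional-independence (CI) test is a simple Fisher's-z partial-correlation test suited to Gaussian data.

```python
import itertools

import numpy as np

Z_CRIT = 3.9  # |z| cutoff, roughly a two-sided alpha of 1e-4 (assumed value)

def ci_indep(x, y, cond, data):
    """Rough Fisher's-z CI test for Gaussian data: regress out the
    conditioning columns, then z-transform the residual correlation.
    Returns True when columns x and y look independent given cond."""
    n = data.shape[0]
    if cond:
        Z = np.column_stack([np.ones(n), data[:, cond]])
        rx = data[:, x] - Z @ np.linalg.lstsq(Z, data[:, x], rcond=None)[0]
        ry = data[:, y] - Z @ np.linalg.lstsq(Z, data[:, y], rcond=None)[0]
    else:
        rx, ry = data[:, x], data[:, y]
    r = float(np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999))
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    return abs(z) < Z_CRIT

def discover_pc(target, variables, data, max_cond=2):
    """Forward phase: admit every variable marginally dependent on the
    target; backward phase: prune false positives that some small subset
    of the remaining candidates renders independent of the target."""
    pc = [v for v in variables if not ci_indep(v, target, [], data)]
    for v in list(pc):
        rest = [u for u in pc if u != v]
        for k in range(1, min(max_cond, len(rest)) + 1):
            if any(ci_indep(v, target, list(s), data)
                   for s in itertools.combinations(rest, k)):
                pc.remove(v)
                break
    return pc

def discover_spouses(target, pc, variables, data):
    """Spouse phase: a non-PC variable s is kept as a spouse when it is
    marginally independent of the target but becomes dependent once a
    common child c in PC is conditioned on (V-structure target -> c <- s)."""
    spouses = []
    for s in variables:
        if s in pc or not ci_indep(s, target, [], data):
            continue
        if any(not ci_indep(s, target, [c], data) for c in pc):
            spouses.append(s)
    return spouses
```

On synthetic data from a toy network P -> T -> C <- S plus an irrelevant variable X, this sketch recovers the parent and child of T in the PC phase and the spouse S through the opened collider in the spouse phase, while the irrelevant variable is excluded.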
Notes
MB learning and MB discovery are interchangeable in this article.
iff means if and only if.
The null set is equal to {} or \(\varnothing \)
Fisher’s z-test is used for the real-world datasets, which are in continuous form.
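For reference, such a test can be computed on continuous data as Fisher's z-transform of the partial correlation between the two variables after regressing out the conditioning set, compared against a standard-normal null. The sketch below is our own illustrative implementation, not the one used in the article; the function name and the residual-regression approach are assumptions.

```python
import math

import numpy as np

def fisher_z_indep(x, y, cond, data, alpha=0.05):
    """Return True when columns x and y of `data` appear conditionally
    independent given the columns in `cond`, by Fisher's z-test on the
    partial correlation (illustrative sketch, names are our own)."""
    n = data.shape[0]
    if cond:
        # Partial correlation via residuals after an OLS fit on cond.
        Z = np.column_stack([np.ones(n), data[:, cond]])
        rx = data[:, x] - Z @ np.linalg.lstsq(Z, data[:, x], rcond=None)[0]
        ry = data[:, y] - Z @ np.linalg.lstsq(Z, data[:, y], rcond=None)[0]
    else:
        rx, ry = data[:, x], data[:, y]
    r = float(np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999))
    # Fisher's z-transform; z is approximately N(0,1) under the null.
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return p > alpha
```

For example, if X and Y share a common cause Z, the test reports dependence marginally but independence once Z is conditioned on.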
Acknowledgements
We would like to thank Yanshan University for supporting this work.
Author information
Contributions
Conceptualization: W.K., K.L., and Brekhna; methodology, software, formal analysis, validation, data curation, writing (original draft preparation): W.K., K.L.; investigation, resources, supervision, project administration: K.L.; writing (review and editing), visualization: W.K., S.N., Brekhna. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest.
Additional information
Availability of data and materials
The datasets analyzed in this article are available in the benchmark Bayesian network repository and the UCI repository [https://archive.ics.uci.edu/ml/index.php, https://pages.mtu.edu/~lebrown/supplements/mmhc_paper/mmhc_index.html]. The datasets are also available from the first and corresponding authors on request.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Khan, W., Kong, L., Noman, S.M. et al. A novel feature selection method via mining Markov blanket. Appl Intell 53, 8232–8255 (2023). https://doi.org/10.1007/s10489-022-03863-z