
A novel feature selection method via mining Markov blanket

Published in: Applied Intelligence

Abstract

Constraint-based feature selection using Markov blanket (MB) discovery in Bayesian networks (BNs) has attracted widespread attention in diverse data mining applications. However, existing MB discovery methods for low- or high-dimensional data focus on either improving computational efficiency or boosting learning accuracy, rather than both. This paper presents a new constraint-based feature selection algorithm, called Feature Selection via Mining Markov Blanket (FSMB), that improves and balances both computational efficiency and prediction accuracy. FSMB mines the MB, comprising parents-children (PC) and spouses (SP), using a forward approach to induce the true-positive PC of a given target T. It removes false-positive PC from the PC set and never reconsiders them. Concurrently, FSMB finds the SP of the target T through an exhaustive search of the non-parents-children set, using the V-structure strategy to distinguish the true-positive PC and SP in the MB set, and then uses them to remove false-positive SP. FSMB also removes non-MB descendants using the updated PC and SP sets. Extensive experiments are conducted on benchmark datasets for performance evaluation. The results are compared with existing algorithms, including the Incremental Association Markov Blanket (IAMB), the Max-Min Markov Blanket (MMMB), HITON-MB, the Simultaneous Markov Blanket (STMB), the Iterative Parents-and-Children-Based Markov Blanket (IPCMB), the Balanced Markov Blanket (BAMB), the Efficient and Effective Markov Blanket (EEMB), and Markov Blanket discovery by Feature Selection (MBFS). Experimental results show that FSMB outperforms the existing algorithms with higher accuracy and shorter running time.
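The abstract's pipeline (forward growth of a parents-children set, backward removal of false positives, then spouse recovery via V-structures) can be illustrated with a minimal constraint-based sketch. This is not the authors' FSMB implementation; it is a simplified illustration of the same three phases, assuming linear-Gaussian data and using Fisher's z-test on partial correlations (the test the paper uses for continuous data) as the conditional-independence oracle. All variable names and the toy network are invented for the example.

```python
import numpy as np

def _partial_corr(x, y, Z):
    """Pearson correlation of x and y after regressing out columns of Z."""
    if Z.shape[1] > 0:
        Z1 = np.column_stack([Z, np.ones(len(x))])
        x = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
        y = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    return np.corrcoef(x, y)[0, 1]

def indep(data, i, j, cond, z_crit=4.0):
    """Fisher's z-test: True if X_i and X_j look independent given X_cond."""
    r = _partial_corr(data[:, i], data[:, j], data[:, list(cond)])
    z = 0.5 * np.log((1 + r) / (1 - r))
    return np.sqrt(len(data) - len(cond) - 3) * abs(z) < z_crit

def find_mb(data, t):
    """Estimate the Markov blanket of column t (sorted column indices)."""
    others = [v for v in range(data.shape[1]) if v != t]
    # Forward phase: grow a candidate parents-children (PC) set from
    # variables marginally dependent on T.
    pc = [v for v in others if not indep(data, v, t, [])]
    # Backward phase: discard false-positive PC members that become
    # independent of T once the rest of the PC set is conditioned on.
    for v in list(pc):
        if indep(data, v, t, [u for u in pc if u != v]):
            pc.remove(v)
    # Spouse search: Y is a spouse if it is independent of T marginally
    # but becomes dependent given a shared child (V-structure Y -> C <- T).
    sp = set()
    for y in others:
        if y not in pc and indep(data, y, t, []):
            if any(not indep(data, y, t, [c]) for c in pc):
                sp.add(y)
    return sorted(set(pc) | sp)

# Toy linear-Gaussian network: X0 -> T -> X2 <- X3, with X4 isolated,
# so MB(T) = {X0 (parent), X2 (child), X3 (spouse)}.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
t = 0.9 * x0 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)
x2 = 0.8 * t + 0.8 * x3 + 0.3 * rng.normal(size=n)
x4 = rng.normal(size=n)
data = np.column_stack([x0, t, x2, x3, x4])
print(find_mb(data, 1))  # [0, 2, 3] for this seed: parent, child, spouse
```

Note the role of the V-structure test: the spouse X3 is marginally independent of T, so no purely pairwise filter can recover it; conditioning on the common child X2 induces the dependence that identifies it. FSMB's contribution, per the abstract, lies in how it schedules and reuses these tests to balance accuracy against the number of independence tests performed.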


Notes

  1. MB learning and MB discovery are interchangeable in this article.

  2. "iff" means "if and only if".

  3. The empty (null) set is written {} or \(\varnothing \).

  4. Fisher's z-test is used for the real-world datasets, which are in continuous form.

  5. https://pages.mtu.edu/~lebrown/supplements/mmhc_paper/mmhc_index.html

  6. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/


Acknowledgements

We would like to thank Yanshan University for supporting us in this work.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization: W.K., K.L., and Brekhna; methodology, software, formal analysis, validation, data curation, writing (original draft preparation): W.K., K.L.; investigation, resources, supervision, project administration: K.L.; writing (review and editing), visualization: W.K., S.N., Brekhna. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Lingfu Kong.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest.

Additional information

Availability of data and materials

The datasets analyzed in this article are available in the benchmark Bayesian network and UCI repositories [https://archive.ics.uci.edu/ml/index.php, https://pages.mtu.edu/~lebrown/supplements/mmhc_paper/mmhc_index.html]. The datasets are also available from the first and corresponding authors on request.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Khan, W., Kong, L., Noman, S.M. et al. A novel feature selection method via mining Markov blanket. Appl Intell 53, 8232–8255 (2023). https://doi.org/10.1007/s10489-022-03863-z
