
A novel feature selection using Markov blanket representative set and Particle Swarm Optimization algorithm

Published in: Computational and Applied Mathematics

Abstract

Feature selection based on Markov blankets and evolutionary algorithms is a key preprocessing technique in machine learning and data processing. In many practical applications, however, a data set that does not satisfy the faithfulness condition may contain multiple Markov blankets of a class attribute. This paper proposes a hybrid feature selection algorithm that combines a Markov blanket representative set with Particle Swarm Optimization (PSO) to classify data that do not meet the faithfulness condition. The algorithm uses the maximum information coefficient (MIC) to measure the relevance between features and the class attribute, and the redundancy among features. It redefines the approximate Markov blanket representative set of the class attribute C without assuming that the data set satisfies the faithfulness condition, and from it obtains a suboptimal subset of the original feature set. A fitness function that combines the classification ability of a feature subset with the number of selected features is then introduced, and PSO searches the reduced feature set for a better subset. Experiments on benchmark data sets show that the proposed hybrid algorithm outperforms Markov blanket-based feature selectors and other well-established feature selection methods.
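The relevance/redundancy screening stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: plain discrete mutual information stands in for MIC (an assumption made so the example is self-contained), and the covering rule is an FCBF-style approximate-Markov-blanket test.

```python
# Hedged sketch of approximate Markov-blanket redundancy filtering.
# Assumption: discrete mutual information is used as a stand-in for MIC.
from collections import Counter
from math import log2

def mutual_info(x, y):
    """Empirical mutual information (bits) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def approx_mb_filter(features, c):
    """Keep one representative per group of mutually redundant features.

    Feature j is dropped when a stronger feature i "covers" it:
    relevance(i, C) >= relevance(j, C) and the inter-feature dependence
    I(i, j) >= j's own relevance to C.  Note the >= also removes
    zero-relevance (irrelevant) features once any feature is kept.
    """
    # Process features in order of decreasing relevance to the class.
    names = sorted(features, key=lambda f: mutual_info(features[f], c),
                   reverse=True)
    kept = []
    for j in names:
        rel_j = mutual_info(features[j], c)
        covered = any(mutual_info(features[i], features[j]) >= rel_j
                      for i in kept)
        if not covered:
            kept.append(j)
    return kept
```

For example, if `f2` is an exact copy of `f1` and `f3` is noise, only `f1` survives the filter.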
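The wrapper stage can likewise be sketched as a binary PSO whose fitness trades classification accuracy against subset size. This is a hedged sketch under stated assumptions, not the paper's exact algorithm: the sigmoid transfer function, the leave-one-out 1-NN accuracy estimate, the weight `alpha`, and all PSO constants are illustrative choices.

```python
# Hedged sketch: binary PSO feature selection with a fitness that combines
# predictive accuracy and the fraction of features selected.
import numpy as np

rng = np.random.default_rng(0)

def loo_1nn_accuracy(X, y):
    """Leave-one-out 1-NN accuracy on the selected columns."""
    if X.shape[1] == 0:
        return 0.0
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude each point itself
    return float(np.mean(y[d.argmin(axis=1)] == y))

def fitness(mask, X, y, alpha=0.9):
    """alpha * accuracy + (1 - alpha) * reward for small subsets."""
    acc = loo_1nn_accuracy(X[:, mask.astype(bool)], y)
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)

def bpso_select(X, y, n_particles=10, iters=30, alpha=0.9):
    d = X.shape[1]
    pos = rng.integers(0, 2, size=(n_particles, d)).astype(float)
    vel = rng.uniform(-1, 1, size=(n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y, alpha) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = float(pbest_fit.max())
    w, c1, c2 = 0.7, 1.5, 1.5            # illustrative PSO constants
    for _ in range(iters):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))        # sigmoid transfer
        pos = (rng.random((n_particles, d)) < prob).astype(float)
        fit = np.array([fitness(p, X, y, alpha) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if fit.max() > gbest_fit:
            gbest_fit = float(fit.max())
            gbest = pos[fit.argmax()].copy()
    return gbest.astype(bool), gbest_fit
```

Running this on the reduced feature set returned by the filter stage (rather than the full feature set) is what keeps the wrapper search tractable.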



Acknowledgements

The authors are grateful to the editor and reviewers for their valuable comments. This work is financially supported by the National Natural Science Foundation of China (Grant no. 61573266) and the Natural Science Basic Research Program of Shaanxi (Program no. 2021JM-133).

Author information


Corresponding authors

Correspondence to Liqin Sun or Youlong Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by Hector Cancela.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, L., Yang, Y. & Ning, T. A novel feature selection using Markov blanket representative set and Particle Swarm Optimization algorithm. Comp. Appl. Math. 42, 81 (2023). https://doi.org/10.1007/s40314-023-02221-0

