Abstract
Feature selection based on Markov blankets and evolutionary algorithms is a key preprocessing technique in machine learning and data processing. In many practical applications, however, a data set that does not satisfy the faithfulness condition may contain multiple Markov blankets of the class attribute. This paper proposes a hybrid feature selection algorithm that combines a Markov blanket representative set with Particle Swarm Optimization (PSO) to classify data that do not satisfy the faithfulness condition. The algorithm uses the maximal information coefficient to measure the relevance between each feature and the class attribute and the redundancy among features. It redefines the approximate Markov blanket representative set of the class attribute C so that the definition does not depend on whether the data set satisfies the faithfulness condition, and uses it to obtain a suboptimal feature subset of the original feature set. A fitness function is then introduced that combines the classification performance of a feature subset with the number of selected features, and PSO searches the reduced feature set for a better feature subset. Experiments on benchmark data sets show that the proposed hybrid algorithm outperforms Markov blanket-based feature selectors and other well-established feature selection methods.
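The wrapper stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the fitness function or the PSO variant, so everything below is an assumption: the weighted-sum fitness with weight `alpha`, the sigmoid transfer rule of a standard binary PSO, the parameter values, and the `toy_error` function that stands in for a real classifier's error rate on the reduced feature set.

```python
import math
import random


def fitness(mask, error_rate, alpha=0.9):
    """Assumed fitness: weighted sum of error and fraction of features kept (lower is better)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty subset cannot be used for classification
    return alpha * error_rate(mask) + (1.0 - alpha) * n_selected / len(mask)


def binary_pso(n_features, error_rate, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard binary PSO over 0/1 feature masks (illustrative parameter values)."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p, error_rate) for p in positions]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (pbest[i][d] - positions[i][d])
                     + c2 * r2 * (gbest[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                velocities[i][d] = v
                # Sigmoid transfer: velocity becomes the probability of selecting feature d.
                prob = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob else 0
            f = fitness(positions[i], error_rate)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = positions[i][:], f
    return gbest, gbest_fit


def toy_error(mask):
    """Hypothetical stand-in for a classifier: features 0 and 2 are the informative ones."""
    informative = (0, 2)
    return 0.05 + 0.25 * sum(1 - mask[i] for i in informative)


if __name__ == "__main__":
    best_mask, best_fit = binary_pso(6, toy_error)
    print(best_mask, best_fit)
```

In practice `error_rate` would be replaced by a cross-validated error estimate of a classifier trained on the selected columns, and `n_features` would be the size of the reduced feature set produced by the Markov blanket filter stage rather than the original feature count.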
Acknowledgements
The authors are grateful to the editor and reviewers for their valuable comments. This work was financially supported by the National Natural Science Foundation of China (61573266) and the Natural Science Basic Research Program of Shaanxi (Program no. 2021JM–133).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by Hector Cancela.
About this article
Cite this article
Sun, L., Yang, Y. & Ning, T. A novel feature selection using Markov blanket representative set and Particle Swarm Optimization algorithm. Comp. Appl. Math. 42, 81 (2023). https://doi.org/10.1007/s40314-023-02221-0
DOI: https://doi.org/10.1007/s40314-023-02221-0
Keywords
- Feature selection
- Maximal information coefficient
- Approximate Markov blanket representative set
- Suboptimal feature subset
- Particle Swarm Optimization