Abstract
Meta-heuristic algorithms have been used extensively in feature selection because they can search for a globally optimal solution. However, they become too time-consuming when the number of samples is large. Most studies avoid this cost by settling for approximately optimal solutions, which in turn degrades classification performance, especially classification stability. To address these problems, this paper proposes a new feature selection framework. First, the framework exploits a voting ensemble strategy to improve classification stability by reducing the impact of misclassified labels on the overall classification results. Second, it uses a data perturbation strategy to enhance classification accuracy. In particular, data perturbation generates more neighborhood relationships in the dataset, which can reveal how the various features of the samples are distributed. A voting ensemble over these different feature distributions extracts more information from the dataset, so initially misclassified samples are more likely to be assigned to the correct class. Third, the framework incorporates a random sampling accelerator, which addresses the excessive time consumption by reducing the size of the sample space to be searched. Finally, to verify the effectiveness of the proposed framework, four meta-heuristic feature selection methods based on neighborhood rough sets are compared on 20 datasets. The experimental results indicate that the framework improves classification performance and accelerates feature selection, particularly for large sample sizes.
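The three ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the form of the perturbation, the voting rule, or the sampling rate, so the Gaussian noise, the simple majority vote, and the uniform sampling without replacement below are all assumptions, and the function names are hypothetical.

```python
import random
from collections import Counter


def random_sample_indices(n_samples, fraction, rng):
    """Assumed accelerator step: pick a uniform random subset of sample
    indices (without replacement) so the search runs on fewer samples."""
    k = max(1, int(n_samples * fraction))
    return sorted(rng.sample(range(n_samples), k))


def perturb(data, scale, rng):
    """Assumed perturbation: add zero-mean Gaussian noise to every feature
    value, producing a slightly different view of the same samples."""
    return [[x + rng.gauss(0.0, scale) for x in row] for row in data]


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.
    Ties go to the label that appears first among the models."""
    n = len(predictions[0])
    voted = []
    for i in range(n):
        counts = Counter(model[i] for model in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted
```

In such a sketch, each perturbed view of a sampled subset would be passed to a feature selector and a classifier, and `majority_vote` would merge the resulting label predictions, so that a single model's misclassification can be outvoted by the others.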
Data availability
The authors do not have permission to share data.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos. 62006099, 62076111), and the Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province (No. OBDMA202104).
Author information
Contributions
Shuaishuai Zhang: Conceptualization, Methodology, Formal analysis, Investigation, Writing - Original Draft, Writing - Review & Editing, Project administration. Keyu Liu: Resources, Data Curation. Taihua Xu: Investigation, Resources, Data Curation, Writing - Review & Editing. Xibei Yang: Investigation, Resources. Ao Zhang: Data Curation.
Ethics declarations
Competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhang, S., Liu, K., Xu, T. et al. A meta-heuristic feature selection algorithm combining random sampling accelerator and ensemble using data perturbation. Appl Intell 53, 29781–29798 (2023). https://doi.org/10.1007/s10489-023-05123-0
DOI: https://doi.org/10.1007/s10489-023-05123-0