A meta-heuristic feature selection algorithm combining random sampling accelerator and ensemble using data perturbation

Applied Intelligence

Abstract

Meta-heuristic algorithms are widely used in feature selection because they are able to search for a globally optimal solution. However, they become too time-consuming when the number of samples is large. Most studies avoid this cost by settling for approximately optimal solutions, but this introduces a new problem: reduced classification performance, especially classification stability. To address these problems, this paper proposes a new feature selection framework. First, the framework uses a voting ensemble strategy to improve classification stability by reducing the impact of misclassified labels on the overall classification results. Second, it uses a data perturbation strategy to enhance classification accuracy. In particular, data perturbation generates more neighborhood relationships in the dataset, which can reveal how the samples are distributed over various features. A voting ensemble over these different feature distributions can extract more information from the dataset, so initially misclassified samples are more likely to be assigned to the correct class. Third, the framework incorporates a random sampling accelerator, which reduces time consumption by shrinking the sample space that is searched. Finally, to verify the effectiveness of the proposed framework, four meta-heuristic feature selection methods based on the neighborhood rough set are compared on 20 datasets. The experimental results indicate that the framework can improve classification performance and accelerate feature selection, particularly on datasets with large sample sizes.
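The abstract names three ingredients (random sampling, data perturbation, and a voting ensemble) without giving algorithmic details. The following is only a minimal sketch of those general ideas, not the authors' method: the function names `random_subsample`, `perturb`, and `majority_vote`, the Gaussian noise model, and the `fraction` and `scale` parameters are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def random_subsample(n_samples, fraction, rng):
    """Return a sorted random subset of sample indices.

    Assumption: the accelerator shrinks the search space by keeping
    only a fraction of the samples; the paper's rule may differ.
    """
    k = max(1, int(n_samples * fraction))
    return sorted(rng.sample(range(n_samples), k))


def perturb(rows, scale, rng):
    """Return a copy of the data with small Gaussian noise on every value.

    Assumption: additive noise is one simple form of data perturbation.
    """
    return [[v + rng.gauss(0.0, scale) for v in row] for row in rows]


def majority_vote(predictions):
    """Combine per-member label lists into one label per sample."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        counts = Counter(member[i] for member in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.5], [0.7, 0.1]]
    subset = random_subsample(len(data), 0.5, rng)   # two of four indices
    noisy = perturb(data, 0.01, rng)                 # same shape as data
    # Labels predicted by three hypothetical ensemble members.
    members = [
        ["a", "b", "b", "a"],
        ["a", "b", "a", "a"],
        ["b", "b", "b", "a"],
    ]
    print(subset, len(noisy), majority_vote(members))
```

In this sketch each ensemble member would be trained on features selected from a differently perturbed, subsampled view of the data, and the vote smooths out labels that only one member gets wrong. Using an odd number of members avoids ties for two-class problems.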

Data availability

The authors do not have permission to share data.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. 62006099, 62076111), and the Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province (No. OBDMA202104).

Author information

Contributions

Shuaishuai Zhang: Conceptualization, Methodology, Formal analysis, Investigation, Writing - Original Draft, Writing - Review & Editing, Project administration. Keyu Liu: Resources, Data Curation. Taihua Xu: Investigation, Resources, Data Curation, Writing - Review & Editing. Xibei Yang: Investigation, Resources. Ao Zhang: Data Curation.

Corresponding author

Correspondence to Taihua Xu.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, S., Liu, K., Xu, T. et al. A meta-heuristic feature selection algorithm combining random sampling accelerator and ensemble using data perturbation. Appl Intell 53, 29781–29798 (2023). https://doi.org/10.1007/s10489-023-05123-0

  • DOI: https://doi.org/10.1007/s10489-023-05123-0
