
Wide-ranging approach-based feature selection for classification

Multimedia Tools and Applications

Abstract

Feature selection has long been studied in the context of data classification because redundant and irrelevant features degrade overall system performance and make wrong decisions more likely on large data sets. Many methods have been proposed to solve the feature selection problem for classification, but most are tailored to a particular data set. This paper therefore proposes a wide-ranging approach that addresses feature selection across a broad range of data sets. The proposed algorithm analytically chooses optimal features for classification by combining mutual information (MI) and linear correlation coefficients (LCC), so that both linearly and nonlinearly dependent features are taken into account. The algorithm assembles a substantial feature subset for classification while effectively discarding irrelevant features. Three data sets are used to evaluate the proposed algorithm with classifiers that require a rich feature set to achieve high accuracy at low computational cost. Features were selected at a significance threshold of p < 0.05, yielding 7, 5, and 6 features from the mobile, heart, and diabetes data sets, respectively. Accuracy varies with the classifier; for example, the Nearest Neighbors classifier achieved accuracies of 0.92225, 0.88333, and 0.86250 on the mobile, heart, and diabetes data sets, respectively. Evaluation on several real-world data sets shows that the proposed model performs well.
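
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea, not the authors' exact algorithm: score each feature with mutual information (nonlinear dependence) and a Pearson correlation p-value (linear dependence), keep features passing either criterion, and evaluate the reduced subset with a nearest-neighbours classifier. The file name heart.csv, the target column name, and the thresholds are illustrative assumptions.

```python
# Hypothetical sketch of MI + linear-correlation feature selection with a
# p-value filter, followed by a nearest-neighbours accuracy check.
# Dataset path, label column, and thresholds are assumptions.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("heart.csv")                     # assumed dataset layout
X, y = df.drop(columns="target"), df["target"]    # assumed label column

# Mutual information captures (possibly nonlinear) dependence on the label.
mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)

# Pearson correlation (with its p-value) captures linear dependence.
corr = {c: pearsonr(X[c], y) for c in X.columns}

# Keep features that are significantly linearly correlated (p < 0.05)
# or carry above-median mutual information with the label.
selected = [c for c in X.columns
            if corr[c][1] < 0.05 or mi[c] > mi.median()]

# Evaluate the reduced feature subset with a nearest-neighbours classifier.
knn = KNeighborsClassifier(n_neighbors=5)
acc = cross_val_score(knn, X[selected], y, cv=5).mean()
print(f"{len(selected)} features selected, CV accuracy = {acc:.4f}")
```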



Data availability

The data that support the findings of this study are available from the first author upon reasonable request.


Code availability

The code is available from the first author upon reasonable request.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Hemanta Kumar Bhuyan.

Ethics declarations

Conflicts of interest/competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 14 Evaluation results of the correlation coefficient on the diabetes data set
Table 15 Evaluation results of the correlation coefficient on the heart data set
Table 16 Evaluation results of the correlation coefficient on the mobile price data set
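
The appendix tables report per-feature correlation coefficients for each data set. As a rough illustration only, the sketch below shows one way such a table could be produced; the file name diabetes.csv and the Outcome label column are assumptions, and the actual tables may have been generated differently.

```python
# Hypothetical sketch: tabulating Pearson r and its p-value for every
# feature against the class label, in the spirit of Tables 14-16.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("diabetes.csv")                   # assumed dataset
X, y = df.drop(columns="Outcome"), df["Outcome"]   # assumed label column

rows = []
for col in X.columns:
    r, p = pearsonr(X[col], y)
    rows.append({"feature": col, "correlation": round(r, 4), "p value": round(p, 6)})

print(pd.DataFrame(rows).sort_values("p value").to_string(index=False))
```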

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bhuyan, H.K., Saikiran, M., Tripathy, M. et al. Wide-ranging approach-based feature selection for classification. Multimed Tools Appl 82, 23277–23304 (2023). https://doi.org/10.1007/s11042-022-14132-z

