
Wide-ranging approach-based feature selection for classification

Multimedia Tools and Applications

Abstract

Feature selection has long been studied in the context of data classification because redundant and irrelevant features degrade overall system performance and make wrong decisions more likely on large data sets. Many methods have been proposed to solve the feature selection problem for classification, but most are tailored to a particular data set. This paper therefore proposes a wide-ranging approach that addresses feature selection across a broad range of data sets. The proposed algorithm analytically chooses optimal features for classification by combining mutual information (MI) and linear correlation coefficients (LCC), so that both linearly and nonlinearly dependent features are taken into account. The algorithm assembles a substantial feature subset for classification while effectively discarding irrelevant features. Three data sets are used to evaluate the proposed algorithm with classifiers that require a rich feature set to achieve high accuracy at low computational cost. Features were selected at a significance threshold of p < 0.05, yielding 7, 5, and 6 features from the mobile, heart, and diabetes data sets, respectively. Accuracy varies with the classifier; for example, the Nearest Neighbors classifier achieved accuracies of 0.92225, 0.88333, and 0.86250 on the mobile, heart, and diabetes data sets, respectively. Evaluation on several real-world data sets shows that the proposed model performs well.
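
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea, not the authors' exact algorithm: score each feature with mutual information (nonlinear dependence) and a Pearson correlation p-value (linear dependence), keep features passing either criterion, and evaluate the reduced subset with a nearest-neighbours classifier. The file name heart.csv, the target column name, and the thresholds are illustrative assumptions.

```python
# Hypothetical sketch of MI + linear-correlation feature selection with a
# p-value filter, followed by a nearest-neighbours accuracy check.
# Dataset path, label column, and thresholds are assumptions.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("heart.csv")                     # assumed dataset layout
X, y = df.drop(columns="target"), df["target"]    # assumed label column

# Mutual information captures (possibly nonlinear) dependence on the label.
mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)

# Pearson correlation (with its p-value) captures linear dependence.
corr = {c: pearsonr(X[c], y) for c in X.columns}

# Keep features that are significantly linearly correlated (p < 0.05)
# or carry above-median mutual information with the label.
selected = [c for c in X.columns
            if corr[c][1] < 0.05 or mi[c] > mi.median()]

# Evaluate the reduced feature subset with a nearest-neighbours classifier.
knn = KNeighborsClassifier(n_neighbors=5)
acc = cross_val_score(knn, X[selected], y, cv=5).mean()
print(f"{len(selected)} features selected, CV accuracy = {acc:.4f}")
```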



Data availability

The data that support the findings of this study are available from the first author upon reasonable request.


Code availability

The code is available from the first author upon reasonable request.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Hemanta Kumar Bhuyan.

Ethics declarations

Conflicts of interest/competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 14 Evaluation results of the correlation coefficient on the diabetes data set
Table 15 Evaluation results of the correlation coefficient on the heart data set
Table 16 Evaluation results of the correlation coefficient on the mobile price data set
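
The appendix tables report per-feature correlation coefficients for each data set. As a rough illustration only, the sketch below shows one way such a table could be produced; the file name diabetes.csv and the Outcome label column are assumptions, and the actual tables may have been generated differently.

```python
# Hypothetical sketch: tabulating Pearson r and its p-value for every
# feature against the class label, in the spirit of Tables 14-16.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("diabetes.csv")                   # assumed dataset
X, y = df.drop(columns="Outcome"), df["Outcome"]   # assumed label column

rows = []
for col in X.columns:
    r, p = pearsonr(X[col], y)
    rows.append({"feature": col, "correlation": round(r, 4), "p value": round(p, 6)})

print(pd.DataFrame(rows).sort_values("p value").to_string(index=False))
```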

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bhuyan, H.K., Saikiran, M., Tripathy, M. et al. Wide-ranging approach-based feature selection for classification. Multimed Tools Appl 82, 23277–23304 (2023). https://doi.org/10.1007/s11042-022-14132-z

