
An efficient feature subset selection approach for machine learning

Published in: Multimedia Tools and Applications

A Correction to this article was published on 13 December 2021


Abstract

This article introduces CAPPER, an efficient hybrid feature subset selection approach that combines the feature subsets produced by Correlation-based Feature Selection (CFS) and the Wrapper approach. CFS is a filter method that appraises the merit of a feature subset by weighing each feature's predictive ability against the degree of redundancy among the features, whereas the Wrapper approach evaluates candidate subsets by running the induction of various machine learning algorithms on them. CAPPER is evaluated on datasets from several domains. The reduced, high-merit and accurate feature subsets obtained by CAPPER are then used to train a machine learning algorithm, which is assessed by cross-validation on the selected attributes. A statistical test is also applied to verify the significance of the results. CAPPER was observed to surpass both the CFS and Wrapper approaches across the different domains of datasets.
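
To make the filter-plus-wrapper idea concrete, the sketch below shows one way such a hybrid selection could be assembled. It is not the authors' CAPPER implementation: the greedy forward search, the use of absolute Pearson correlation as a stand-in for CFS's symmetrical-uncertainty measure, the Naïve Bayes learner, the union rule for combining the two subsets, and the scikit-learn breast-cancer dataset are all illustrative assumptions.

```python
# Minimal sketch of a hybrid filter (CFS-style) + wrapper feature selection.
# NOT the authors' CAPPER code; learner, dataset, search strategy, and the
# combination rule are assumptions chosen for brevity.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def cfs_merit(X, y, subset):
    """CFS-style merit: k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the
    mean feature-class correlation and r_ff the mean feature-feature correlation.
    Absolute Pearson correlation is used here in place of symmetrical uncertainty."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def wrapper_score(X, y, subset):
    """Wrapper evaluation: cross-validated accuracy of an induction algorithm."""
    return cross_val_score(GaussianNB(), X[:, subset], y, cv=5).mean()


def forward_search(X, y, score_fn):
    """Greedy forward selection driven by an arbitrary subset score."""
    remaining, selected, best = list(range(X.shape[1])), [], 0.0
    while remaining:
        gain, f = max((score_fn(X, y, selected + [f]), f) for f in remaining)
        if gain <= best:          # stop when adding a feature no longer helps
            break
        best, selected = gain, selected + [f]
        remaining.remove(f)
    return selected


X, y = load_breast_cancer(return_X_y=True)
filter_subset = forward_search(X, y, cfs_merit)        # CFS (filter) stage
wrapper_subset = forward_search(X, y, wrapper_score)   # Wrapper stage
hybrid_subset = sorted(set(filter_subset) | set(wrapper_subset))  # combined subset
print("CFS subset:", filter_subset)
print("Wrapper subset:", wrapper_subset)
print("Hybrid subset 10-fold CV accuracy:",
      cross_val_score(GaussianNB(), X[:, hybrid_subset], y, cv=10).mean())
```

As the abstract describes, the combined subset would then be used to train a machine learning algorithm, scored by cross-validation, and compared against the CFS-only and Wrapper-only subsets with a statistical significance test.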




Change history

References

  1. Aha DW, Bankert RL (1994) Feature selection for case-based classification of cloud types. In: Working Notes of the AAAI Workshop on Case-Based Reasoning, 106–112

  2. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66

  3. Allen D (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16:125–127


  4. Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In Proceedings of the Ninth National Conference on Artificial Intelligence, 542–547

  5. Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In: Ninth National conference on Artificial intelligence, MIT Press, 547–552

  6. Alpaydin E (2014) Introduction to machine learning. MIT Press, Cambridge


  7. Malik AJ, Khan FA (2017) A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection. Cluster Computing, Springer, 667–680

  8. Blum AL, Rivest RL (1992) Training a 3 node neural network is NP-complete. Neural Netw 5:117–127


  9. Breiman L (1996a) Bagging predictors. Mach Learn 24(2):123–140

  10. Breiman L(1996b) Out-of-bag estimation. ftp.stat.berkeley.edu/pub/users/breiman/OOBestimation.ps

  11. Breiman L (2001) Random forests. Mach Learn 45(1):5–32


  12. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks/Cole Advanced books & Software, Monterey, CA


  13. Cannady J (1998) Artificial neural networks for misuse detection. In: National information systems security conference, vol 26, 368–381

  14. Cardie C (1995) Using decision trees to improve case-based learning. In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining. AAAI Press, 25–32

  15. Catlett J (1991) On changing continuous attributes into ordered discrete attributes. In: Proceedings of the European Working Session on Learning, Springer-Verlag, 164–178

  16. Cherkauer KJ, Shavlik JW (1996) Growing simpler decision trees to facilitate knowledge discovery. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, 315–318

  17. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297


  18. Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(1–4):131–156


  19. Dash M, Liu H (1998) Hybrid search of feature subsets, In: Proceedings of Pacific Rim International Conference on Artificial Intelligence (PRICAI-98), Singapore, 238–249

  20. Denoeux T (1995) A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans Syst Man Cybern 25(5):804–813


  21. Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice-Hall international

  22. Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1924


  23. Domingos P (1997) Context-sensitive feature selection for lazy learners. Artif Intell Rev 11:227–253


  24. Domingos P, Pazzani M (1996) Beyond independence: conditions for the optimality of the simple Bayesian classifier. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, 105–112

  25. Dua D, Graff C (2019) UCI machine learning repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science

  26. Fan G-F, Peng L-L, Hong W-C, Sun F (2016) Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression. Neurocomputing 173:958–970. https://doi.org/10.1016/j.neucom.2015.08.051


  27. Fournier-Viger P, Lin CW, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF Open-Source Data Mining Library Version 2. Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853. 36–40. https://www.philippe-fournier-viger.com/spmf/

  28. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4:1–48


  29. Hall MA, Smith LA (1997) Feature subset selection: a correlation-based filter approach. In: International Conference on Neural Information Processing and Intelligent Information Systems, 855–858

  30. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning, Springer

  31. Hindy H, Brosset D (2018) A Taxonomy and Survey of Intrusion Detection System Design Techniques, Network Threats and Datasets, arXiv: 1806.03517v1 [cs.CR] 9, 1–35

  32. Hong W-C (2011) Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm. Energy 36:5568–5578. https://doi.org/10.1016/j.energy.2011.07.015


  33. Hong W-C, Dong Y, Zhang WY, Li-Yueh C, Panigrahi BK (2013) Cyclic electric load forecasting by seasonal SVR with chaotic genetic algorithm. Int J Electr Power Energy Syst 44:604–614. https://doi.org/10.1016/j.ijepes.2012.08.010


  34. Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1):15–17


  35. John GH (1997) Enhancements to the data mining process. PhD thesis, Stanford University, Computer Science Department

  36. John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 121–129

  37. Kohavi R, John G, Long R, Manley D, Pfleger K (1994) MLC++: A machine learning library in C++, in Tools with Artificial Intelligence, IEEE Computer Society Press, pp 740–743

  38. Kar P, Banerjee S, Mondal KC, Mahapatra G, Chattopadhyay S (2019) A hybrid intrusion detection system for hierarchical filtration of anomalies. Inf Commun Tech Intell Syst Smart Innovation Syst Tech 106:417–426

  39. Kavitha P, Usha M (2014) Anomaly based intrusion detection in WLAN using discrimination algorithm combined with Naïve Bayesian classifier. J Theor Appl Inf Technol 62(1):77–84


  40. Kira K, Rendell LA (1992) A practical approach to feature selection. In: Machine Learning: Proceedings of the Ninth International Conference, pp 249–256

  41. Kittler J (1986) Feature selection and extraction, Academic press, Inc, Chapter 3, 59–83

  42. Kohavi R (1995) The Power of Decision Tables. In: 8th European Conference on Machine Learning, 174–189

  43. Kohavi R, Frasca B (1994) Useful feature subsets and rough sets reducts. In: Proceedings of the Third International Workshop on Rough Sets and Soft Computing, 310–317

  44. Kohavi R, John G (1997) Wrappers for feature subset selection. Artificial Intelligence, special issue on relevance, 273–324

  45. Koller D, Sahami M (1996) Towards optimal feature selection. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, 284–292

  46. Kononenko I (1994) Estimating attributes: Analysis and extensions of relief. In: Proceedings of the European Conference on Machine Learning, 171–182

  47. Langley P (1994) Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall Symposium on Relevance. AAAI Press, 127–131

  48. Langley P, Sage S (1994) Scaling to domains with irrelevant features. In: Computational Learning Theory and Natural Learning Systems, volume 4. MIT Press, 17–29

  49. Langley P, Sage S (1994) Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA. Morgan Kaufmann, pp 399–406


  50. Li M-W, Geng J, Hong W-C, Zhang L-D (2019) Periodogram estimation based on LSSVR-CCPSO compensation for forecasting ship motion. Nonlinear Dyn 97(4):2579–2594. https://doi.org/10.1007/s11071-019-05149-5


  51. Liao X, Li K, Zhu X, Liu KJR (2020) Robust detection of image operator chain with two-stream convolutional neural network. IEEE J Sel Top Sign Proces 14(5):955–968. https://doi.org/10.1109/JSTSP.2020.3002391


  52. Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining, Vol. 454, Springer

  53. Liu H, Setiono R (1996) A probabilistic approach to feature selection: a filter solution. In Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann, 319–327

  54. Liu H, Yu L (2005) Towards integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502


  55. Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive Bayes. In: ICML, vol 99, 258–267

  56. Moore AW, Hill DJ, Johnson MP (1992) An empirical investigation of brute force to choose features, smoothers and function approximators. In: Computational Learning Theory and Natural Learning Systems, volume 3. MIT Press

  57. Moore AW, Lee MS (1994) Efficient algorithms for minimizing cross validation error. In: Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 190–198

  58. Acharya N, Singh S (2017) An IWD-based feature selection method for intrusion detection system. Soft Computing, Springer, 4407–4416

  59. Pazzani M (1996) Searching for dependencies in Bayesian classifiers. In: Proceedings of the Fifth International Workshop on AI and Statistics, 239–248

  60. Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Mateo, California

  61. Platt J (1998) Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 185–208

  62. Provan GM, Singh M (1996) Learning Bayesian networks using feature selection. In: Learning from Data, Lecture Notes in Statistics, Springer-Verlag, New York, 1996, 291–300

  63. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106


  64. Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco

  65. Revathi S, Malathi A (2013) A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection. Int J Eng Res Tech 2(12):1848–1853


  66. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324

  67. Russell SJ, Norvig P (1995) Artificial intelligence: a modern approach. Prentice Hall, Englewood Cliffs


  68. Scherf M, Brauer W (1997) Feature selection by means of a feature weighting approach. Technical Report, Technische Universität München

  69. Setiono R, Liu H (1995) Chi2: Feature selection and discretization of numeric attributes. In: Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, 388–391

  70. Singh M, Provan GM (1996) Efficient learning of selective Bayesian network classifiers. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann, 453–461

  71. Skalak DB (1994) Prototype and feature selection by sampling and random mutation hill climbing algorithms. In Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 293–301

  72. Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP '99 data set. In: Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications, Ottawa, 1–6

  73. Tay FEH, Shen L (2002) A modified Chi2 algorithm for discretization. IEEE Trans Knowl Data Eng 14(3):666–670


  74. Thrun S (1991) The MONK's problems: a performance comparison of different learning algorithms. Technical Report CMU-CS-91-197, Carnegie Mellon University

  75. Vafaie H, De Jong K (1995) Genetic algorithms as a tool for restructuring feature space representations. In: Proceedings of the International Conference on Tools with A.I. IEEE Computer Society Press, 57–65

  76. Weka Data Mining Tool (1999) https://www.cs.waikato.ac.nz/ml/weka/

  77. Zhang Z, Ding S, Jia W (2019) A hybrid optimization algorithm based on cuckoo search and different evolution for solving constrained engineering problems. Eng Appl Artif Intell 85:254–268. https://doi.org/10.1016/j.engappai.2019.06.017


  78. Zhang Z, Ding S, Sun Y (2020) A support vector regression model hybridized with chaotic krill herd algorithm and empirical mode decomposition for regression task. Neurocomputing 410:185–201. https://doi.org/10.1016/j.neucom.2020.05.075


  79. Zhang Z, Hong W-C (2019) Electric load forecasting by complete ensemble empirical model decomposition adaptive noise and support vector regression with quantum-based dragonfly algorithm. Nonlinear Dyn 98:1107–1136. https://doi.org/10.1007/s11071-019-05252-7



Acknowledgements

Thanks to Dr. Philippe Fournier-Viger, the founder of the SPMF [27] data mining library, for providing suggestions on our work.

Author information


Corresponding author

Correspondence to Thomas Rincy N.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

In the fifth line of the first paragraph of section 2.1.1 The problem, the word “relevant” was misspelled as “irrelevant.”


About this article


Cite this article

N, T.R., Gupta, R. An efficient feature subset selection approach for machine learning. Multimed Tools Appl 80, 12737–12830 (2021). https://doi.org/10.1007/s11042-020-10011-7


Keywords