
An efficient feature subset selection approach for machine learning

Published in: Multimedia Tools and Applications

A Correction to this article was published on 13 December 2021


Abstract

This article introduces CAPPER, an efficient hybrid feature subset selection approach that combines the feature subsets produced by Correlation-based Feature Selection (CFS) and the Wrapper approach. CFS is a filter method that appraises the merit of a feature subset by weighing each feature's predictive ability against the degree of redundancy among the features, whereas the Wrapper approach evaluates candidate subsets by running the induction of various machine learning algorithms on them. CAPPER is evaluated on datasets from several domains. The reduced, high-merit and accurate feature subsets obtained by CAPPER are then used to train a machine learning algorithm, which is assessed by cross-validation on the selected attributes. A statistical test is also applied to verify the significance of the results. CAPPER was observed to surpass both the CFS and Wrapper approaches across the different domains of datasets.
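
To make the filter-plus-wrapper idea concrete, the sketch below shows one way such a hybrid selection could be assembled. It is not the authors' CAPPER implementation: the greedy forward search, the use of absolute Pearson correlation as a stand-in for CFS's symmetrical-uncertainty measure, the Naïve Bayes learner, the union rule for combining the two subsets, and the scikit-learn breast-cancer dataset are all illustrative assumptions.

```python
# Minimal sketch of a hybrid filter (CFS-style) + wrapper feature selection.
# NOT the authors' CAPPER code; learner, dataset, search strategy, and the
# combination rule are assumptions chosen for brevity.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def cfs_merit(X, y, subset):
    """CFS-style merit: k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the
    mean feature-class correlation and r_ff the mean feature-feature correlation.
    Absolute Pearson correlation is used here in place of symmetrical uncertainty."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def wrapper_score(X, y, subset):
    """Wrapper evaluation: cross-validated accuracy of an induction algorithm."""
    return cross_val_score(GaussianNB(), X[:, subset], y, cv=5).mean()


def forward_search(X, y, score_fn):
    """Greedy forward selection driven by an arbitrary subset score."""
    remaining, selected, best = list(range(X.shape[1])), [], 0.0
    while remaining:
        gain, f = max((score_fn(X, y, selected + [f]), f) for f in remaining)
        if gain <= best:          # stop when adding a feature no longer helps
            break
        best, selected = gain, selected + [f]
        remaining.remove(f)
    return selected


X, y = load_breast_cancer(return_X_y=True)
filter_subset = forward_search(X, y, cfs_merit)        # CFS (filter) stage
wrapper_subset = forward_search(X, y, wrapper_score)   # Wrapper stage
hybrid_subset = sorted(set(filter_subset) | set(wrapper_subset))  # combined subset
print("CFS subset:", filter_subset)
print("Wrapper subset:", wrapper_subset)
print("Hybrid subset 10-fold CV accuracy:",
      cross_val_score(GaussianNB(), X[:, hybrid_subset], y, cv=10).mean())
```

As the abstract describes, the combined subset would then be used to train a machine learning algorithm, scored by cross-validation, and compared against the CFS-only and Wrapper-only subsets with a statistical significance test.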




Change history

References

  1. Aha DW, Bankert RL (1994) Feature selection for case-based classification of cloud types. In: Working Notes of the AAAI Workshop on Case-Based Reasoning, 106–112

  2. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66

  3. Allen D (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16:125–127


  4. Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In Proceedings of the Ninth National Conference on Artificial Intelligence, 542–547

  5. Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In: Ninth National conference on Artificial intelligence, MIT Press, 547–552

  6. Alpaydin E (2014) Introduction to machine learning. MIT Press, Cambridge


  7. Malik AJ, Khan FA (2017) A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection. Cluster Computing, Springer, 667–680

  8. Blum AL, Rivest RL (1992) Training a 3 node neural network is NP-complete. Neural Netw 5:117–127


  9. Breiman L (1996a) Bagging predictors. Mach Learn 24(2):123–140

  10. Breiman L(1996b) Out-of-bag estimation. ftp.stat.berkeley.edu/pub/users/breiman/OOBestimation.ps

  11. Breiman L (2001) Random forests. Mach Learn 45(1):5–32


  12. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks/Cole Advanced books & Software, Monterey, CA


  13. Cannady J (1998) Artificial neural networks for misuse detection. In: National information systems security conference, vol 26, 368–381

  14. Cardie C (1995) Using decision trees to improve case-based learning. In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining. AAAI Press, 25–32

  15. Catlett J (1991) On changing continuous attributes into ordered discrete attributes. In: Proceedings of the European Working Session on Learning, Springer-Verlag, 164–178

  16. Cherkauer KJ, Shavlik JW (1996) Growing simpler decision trees to facilitate knowledge discovery. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, 315–318

  17. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297


  18. Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(1–4):131–156


  19. Dash M, Liu H (1998) Hybrid search of feature subsets, In: Proceedings of Pacific Rim International Conference on Artificial Intelligence (PRICAI-98), Singapore, 238–249

  20. Denoeux T (1995) A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans Syst Man Cybern 25(5):804–813


  21. Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice-Hall international

  22. Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1924


  23. Domingos P (1997) Context-sensitive feature selection for lazy learners. Artif Intell Rev 11:227–253


  24. Domingos P, Pazzani M (1996) Beyond independence: conditions for the optimality of the simple Bayesian classifier. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, 105–112

  25. Dua D, Graff C (2019) UCI machine learning repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science

  26. Fan G-F, Peng L-L, Hong W-C, Sun F (2016) Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression. Neurocomputing 173:958–970. https://doi.org/10.1016/j.neucom.2015.08.051


  27. Fournier-Viger P, Lin CW, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF Open-Source Data Mining Library Version 2. Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853. 36–40. https://www.philippe-fournier-viger.com/spmf/

  28. Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4:1–48


  29. Hall MA, Smith LA (1997) Feature subset selection: a correlation-based filter approach. In: International Conference on Neural Information Processing and Intelligent Information Systems, 855–858

  30. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning, Springer

  31. Hindy H, Brosset D (2018) A Taxonomy and Survey of Intrusion Detection System Design Techniques, Network Threats and Datasets, arXiv: 1806.03517v1 [cs.CR] 9, 1–35

  32. Hong W-C (2011) Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm. Energy 36:5568–5578. https://doi.org/10.1016/j.energy.2011.07.015


  33. Hong W-C, Dong Y, Zhang WY, Li-Yueh C, Panigrahi BK (2013) Cyclic electric load forecasting by seasonal SVR with chaotic genetic algorithm. Int J Electr Power Energy Syst 44:604–614. https://doi.org/10.1016/j.ijepes.2012.08.010


  34. Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1):15–17


  35. John GH (1997) Enhancements to the data mining process. PhD thesis, Stanford University, Computer Science Department

  36. John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 121–129

  37. Kohavi R, John G, Long R, Manley D, Pfleger K (1994) MLC++: A machine learning library in C++, in Tools with Artificial Intelligence, IEEE Computer Society Press, pp 740–743

  38. Kar P, Banerjee S, Mondal KC, Mahapatra G, Chattopadhyay S (2019) A hybrid intrusion detection system for hierarchical filtration of anomalies. Inf Commun Tech Intell Syst Smart Innovation Syst Tech 106:417–426

  39. Kavitha P, Usha M (2014) Anomaly based intrusion detection in WLAN using discrimination algorithm combined with Naïve Bayesian classifier. J Theor Appl Inf Technol 62(1):77–84


  40. Kira K, Rendell LA (1992) A practical approach to feature selection. In: Machine Learning: Proceedings of the Ninth International Conference, pp 249–256

  41. Kittler J (1986) Feature selection and extraction, Academic press, Inc, Chapter 3, 59–83

  42. Kohavi R (1995) The Power of Decision Tables. In: 8th European Conference on Machine Learning, 174–189

  43. Kohavi R, Frasca B (1994) Useful feature subsets and rough sets reducts. In: Proceedings of the Third International Workshop on Rough Sets and Soft Computing, 310–317

  44. Kohavi R, John G (1997) Wrappers for feature subset selection. Artificial Intelligence, special issue on relevance, 273–324

  45. Koller D, Sahami M (1996) Towards optimal feature selection. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, 284–292

  46. Kononenko I (1994) Estimating attributes: Analysis and extensions of relief. In: Proceedings of the European Conference on Machine Learning, 171–182

  47. Langley P (1994) Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall Symposium on Relevance. AAAI Press, 127–131

  48. Langley P, Sage S (1994) Scaling to domains with irrelevant features. In: Computational Learning Theory and Natural Learning Systems, volume 4. MIT Press, 17–29

  49. Langley P, Sage S (1994) Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA. Morgan Kaufmann, pp 399–406


  50. Li M-W, Geng J, Hong W-C, Zhang L-D (2019) Periodogram estimation based on LSSVR-CCPSO compensation for forecasting ship motion. Nonlinear Dyn 97(4):2579–2594. https://doi.org/10.1007/s11071-019-05149-5


  51. Liao X, Li K, Zhu X, Liu KJR (2020) Robust detection of image operator chain with two-stream convolutional neural network. IEEE J Sel Top Sign Proces 14(5):955–968. https://doi.org/10.1109/JSTSP.2020.3002391


  52. Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining, Vol. 454, Springer

  53. Liu H, Setiono R (1996) A probabilistic approach to feature selection: a filter solution. In Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann, 319–327

  54. Liu H, Yu L (2005) Towards integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502


  55. Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive Bayes. In: ICML, vol 99, 258–267

  56. Moore AW, Hill DJ, Johnson MP (1992) An empirical investigation of brute force to choose features, smoothers and function approximators. In: Computational Learning Theory and Natural Learning Systems, volume 3. MIT Press

  57. Moore AW, Lee MS (1994) Efficient algorithms for minimizing cross validation error. In: Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 190–198

  58. Acharya N, Singh S (2017) An IWD-based feature selection method for intrusion detection system. Soft Computing, Springer, 4407–4416

  59. Pazzani M (1996) Searching for dependencies in Bayesian classifiers. In: Proceedings of the Fifth International Workshop on AI and Statistics, 239–248

  60. Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Mateo, California

  61. Platt J (1998) Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 185–208

  62. Provan GM, Singh M (1996) Learning Bayesian networks using feature selection. In: Learning from Data, Lecture Notes in Statistics, Springer-Verlag, New York, 1996, 291–300

  63. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106


  64. Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco

  65. Revathi S, Malathi A (2013) A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection. Int J Eng Res Tech 2(12):1848–1853


  66. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324

  67. Russell SJ, Norvig P (1995) Artificial intelligence: a modern approach. Prentice Hall, Englewood Cliffs


  68. Scherf M, Brauer W (1997) Feature selection by means of a feature weighting approach. Technical Report, Technische Universität München

  69. Setiono R, Liu H (1995) Chi2: Feature selection and discretization of numeric attributes. In: Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, 388–391

  70. Singh M, Provan GM (1996) Efficient learning of selective Bayesian network classifiers. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann, 453–461

  71. Skalak DB (1994) Prototype and feature selection by sampling and random mutation hill climbing algorithms. In Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 293–301

  72. Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP '99 data set. In: Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications, Ottawa, 1–6

  73. Tay FEH, Shen L (2002) A modified Chi2 algorithm for discretization. IEEE Trans Knowl Data Eng 14(3):666–670


  74. Thrun S (1991) The MONK's problems: a performance comparison of different learning algorithms. Technical Report CMU-CS-91-197, Carnegie Mellon University

  75. Vafaie H, De Jong K (1995) Genetic algorithms as a tool for restructuring feature space representations. In: Proceedings of the International Conference on Tools with A.I. IEEE Computer Society Press, 57–65

  76. Weka Data Mining Tool (1999) https://www.cs.waikato.ac.nz/ml/weka/

  77. Zhang Z, Ding S, Jia W (2019) A hybrid optimization algorithm based on cuckoo search and different evolution for solving constrained engineering problems. Eng Appl Artif Intell 85:254–268. https://doi.org/10.1016/j.engappai.2019.06.017


  78. Zhang Z, Ding S, Sun Y (2020) A support vector regression model hybridized with chaotic krill herd algorithm and empirical mode decomposition for regression task. Neurocomputing 410:185–201. https://doi.org/10.1016/j.neucom.2020.05.075


  79. Zhang Z, Hong W-C (2019) Electric load forecasting by complete ensemble empirical model decomposition adaptive noise and support vector regression with quantum-based dragonfly algorithm. Nonlinear Dyn 98:1107–1136. https://doi.org/10.1007/s11071-019-05252-7



Acknowledgements

Thanks to Dr. Philippe Fournier-Viger, the founder of the SPMF [27] data mining library, for providing suggestions on our work.

Author information


Corresponding author

Correspondence to Thomas Rincy N.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

In the fifth line of the first paragraph of section 2.1.1 The problem, the word “relevant” was misspelled as “irrelevant.”


About this article


Cite this article

N, T.R., Gupta, R. An efficient feature subset selection approach for machine learning. Multimed Tools Appl 80, 12737–12830 (2021). https://doi.org/10.1007/s11042-020-10011-7


Keywords