Abstract
This paper introduces CAPPER, an efficient hybrid feature subset selection approach that combines the feature subsets produced by Correlation-based Feature Selection (CFS) and the Wrapper approach. CFS is a filter approach that assesses the merit of a feature subset from the predictive ability of the individual features and the degree of redundancy among them, whereas the Wrapper approach evaluates attribute subsets by running an induction (machine learning) algorithm on them. CAPPER is evaluated on datasets from several domains: the reduced, high-merit feature subsets it produces are used to train a machine learning algorithm, and the resulting models are assessed by cross-validation over the selected attributes. A statistical test is then applied to establish the significance of the results. CAPPER was observed to surpass both the CFS and Wrapper approaches across the different domains of datasets.
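The abstract sketches a two-stage design: a filter pass that scores candidate subsets by correlation-based merit, a wrapper pass that scores them by the cross-validated accuracy of an induction algorithm, and a combination of the resulting subsets. The Python sketch below illustrates that idea only and is not the authors' implementation: the merit formula follows Hall and Smith's CFS, while the greedy forward search, the Gaussian naive Bayes learner, the breast-cancer dataset, and the union rule for combining the two subsets are all assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def cfs_merit(X, y, subset):
    """CFS subset merit: k*r_cf / sqrt(k + k*(k-1)*r_ff) (Hall & Smith)."""
    k = len(subset)
    # Mean absolute feature-class correlation (predictive ability).
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute feature-feature inter-correlation (redundancy).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_forward(X, y, score_fn):
    """Greedy forward search: keep adding the feature that raises the score."""
    remaining = set(range(X.shape[1]))
    subset, best = [], -np.inf
    while remaining:
        score, j = max((score_fn(subset + [j]), j) for j in remaining)
        if score <= best:        # no remaining feature improves the score
            break
        best = score
        subset.append(j)
        remaining.remove(j)
    return subset

X, y = load_breast_cancer(return_X_y=True)

# Filter stage: subsets scored by correlation-based merit (no learner needed).
cfs_subset = greedy_forward(X, y, lambda s: cfs_merit(X, y, s))

# Wrapper stage: subsets scored by the cross-validated accuracy of a learner.
wrapper_subset = greedy_forward(
    X, y, lambda s: cross_val_score(GaussianNB(), X[:, s], y, cv=5).mean())

# Hypothetical combination rule: the union of the two subsets.
capper_subset = sorted(set(cfs_subset) | set(wrapper_subset))
print(cfs_subset, wrapper_subset, capper_subset)
```

Because the filter stage never trains a model, its merit scores are cheap to compute; the wrapper stage dominates the cost, which is the usual motivation for hybrid schemes of this kind.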
Change history
13 December 2021
A Correction to this paper has been published: https://doi.org/10.1007/s11042-021-11785-0. The correction notes that in the fifth line of the first paragraph of Section 2.1.1 (The problem), the word "relevant" was misspelled as "irrelevant."
References
Aha DW, Blankert RL (1994) Feature selection for case-based classification of cloud types. In: Working Notes of the AAAI Workshop on Case-Based Reasoning, 106–112
Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6:37–66
Allen D (1974) The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16:125–127
Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In: Proceedings of the Ninth National Conference on Artificial Intelligence. AAAI Press, 547–552
Alpaydin E (2014) Introduction to machine learning. MIT Press, Cambridge
Arif J, Malik F, Aslam K (2017) A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection. Cluster Computing, Springer, 667–680
Blum AL, Rivest RL (1992) Training a 3-node neural network is NP-complete. Neural Netw 5:117–127
Breiman L (1996a) Bagging predictors. Mach Learn 24(2):123–140
Breiman L(1996b) Out-of-bag estimation. ftp.stat.berkeley.edu/pub/users/breiman/OOBestimation.ps
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks/Cole Advanced books & Software, Monterey, CA
Cannady J (1998) Artificial neural networks for misuse detection. In: National information systems security conference, vol 26, 368–381
Cardie C (1995) Using decision trees to improve case-based learning. In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining. AAAI Press, 25–32
Catlett J (1991) On changing continuous attributes into ordered discrete attributes. In: Proceedings of the European Working Session on Learning, Springer-Verlag, 164–178
Cherkauer KJ, Shavlik JW (1996) Growing simpler decision trees to facilitate knowledge discovery. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, 315–318
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(1–4):131–156
Dash M, Liu H (1998) Hybrid search of feature subsets, In: Proceedings of Pacific Rim International Conference on Artificial Intelligence (PRICAI-98), Singapore, 238–249
Denoeux T (1995) A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Trans Syst Man Cybern 25(5):804–813
Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice-Hall International
Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1923
Domingos P (1997) Context-sensitive feature selection for lazy learners. Artif Intell Rev 11:227–253
Domingos P, Pazzani M (1996) Beyond independence: conditions for the optimality of the simple Bayesian classifier. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, 105–112
Dua D, Graff C (2019) UCI machine learning repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science
Fan G-F, Peng L-L, Hong W-C, Sun F (2016) Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression. Neurocomputing 173:958–970. https://doi.org/10.1016/j.neucom.2015.08.051
Fournier-Viger P, Lin CW, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF Open-Source Data Mining Library Version 2. Proc. 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III, Springer LNCS 9853. 36–40. https://www.philippe-fournier-viger.com/spmf/
Geman S, Bienenstock E, Doursat R (1992) Neural networks and the bias/variance dilemma. Neural Comput 4:1–48
Hall MA, Smith LA (1997) Feature subset selection: a correlation based filter approach. In: International Conference on Neural Information Processing and Intelligent Information Systems, 855–858
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning, Springer
Hindy H, Brosset D (2018) A taxonomy and survey of intrusion detection system design techniques, network threats and datasets. arXiv:1806.03517, 1–35
Hong W-C (2011) Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm. Energy 36:5568–5578. https://doi.org/10.1016/j.energy.2011.07.015
Hong W-C, Dong Y, Zhang WY, Li-Yueh C, Panigrahi BK (2013) Cyclic electric load forecasting by seasonal SVR with chaotic genetic algorithm. Int J Electr Power Energy Syst 44:604–614. https://doi.org/10.1016/j.ijepes.2012.08.010
Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1):15–17
John GH (1997) Enhancements to the data mining process. PhD thesis, Stanford University, Computer Science Department
John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 121–129
Kar P, Banerjee S, Mondal KC, Mahapatra G, Chattopadhyay S (2019) A hybrid intrusion detection system for hierarchical filtration of anomalies. Inf Commun Tech Intell Syst Smart Innovation Syst Tech 106:417–426
Kavitha P, Usha M (2014) Anomaly based intrusion detection in WLAN using discrimination algorithm combined with Naïve Bayesian classifier. J Theor Appl Inf Technol 62(1):77–84
Kira K, Rendell LA (1992) A practical approach to feature selection. In: Machine Learning: Proceedings of the Ninth International Conference, 249–256
Kittler J (1986) Feature selection and extraction. Academic Press, Chapter 3, 59–83
Kohavi R (1995) The power of decision tables. In: 8th European Conference on Machine Learning, 174–189
Kohavi R, Frasca B (1994) Useful feature subsets and rough set reducts. In: Proceedings of the Third International Workshop on Rough Sets and Soft Computing, 310–317
Kohavi R, John G (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Kohavi R, John G, Long R, Manley D, Pfleger K (1994) MLC++: a machine learning library in C++. In: Tools with Artificial Intelligence, IEEE Computer Society Press, 740–743
Koller D, Sahami M (1996) Towards optimal feature selection. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, 284–292
Kononenko I (1994) Estimating attributes: Analysis and extensions of relief. In: Proceedings of the European Conference on Machine Learning, 171–182
Langley P (1994) Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall Symposium on Relevance. AAAI Press, 127–131
Langley P, Sage S (1994) Scaling to domains with irrelevant features. In: Computational Learning Theory and Natural Learning Systems, volume 4. MIT Press, 17–29
Langley P, Sage S (1994) Induction of selective Bayesian classifiers. In: Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA. Morgan Kaufmann, 399–406
Li M-W, Geng J, Hong W-C, Zhang L-D (2019) Periodogram estimation based on LSSVR-CCPSO compensation for forecasting ship motion. Nonlinear Dyn 97(4):2579–2594. https://doi.org/10.1007/s11071-019-05149-5
Liao X, Li K, Zhu X, Liu KJR (2020) Robust detection of image operator chain with two-stream convolutional neural network. IEEE J Sel Top Sign Proces 14(5):955–968. https://doi.org/10.1109/JSTSP.2020.3002391
Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining, Vol. 454, Springer
Liu H, Setiono R (1996) A probabilistic approach to feature selection: a filter solution. In Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann, 319–327
Liu H, Yu L (2005) Towards integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502
Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive Bayes. In: ICML '99, 258–267
Moore AW, Hill DJ, Johnson MP (1992) An empirical investigation of brute force to choose features, smoothers and function approximators. In: Computational Learning Theory and Natural Learning Systems, volume 3. MIT Press
Moore W, Lee MS (1994) Efficient algorithms for minimizing cross validation error. In: Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 190–198
Neha A, Shailendra S (2017) An IWD-based feature selection method for intrusion detection system. Soft Computing, Springer, 4407–4416
Pazzani M (1996) Searching for dependencies in Bayesian classifiers. In: Proceedings of the Fifth International Workshop on AI and Statistics, 239–248
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Mateo, California
Platt J (1998) Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 185–208
Provan GM, Singh M (1996) Learning Bayesian networks using feature selection. In: Learning from Data, Lecture Notes in Statistics, Springer-Verlag, New York, 1996, 291–300
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco
Revathi S, Malathi A (2013) A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection. Int J Eng Res Tech 2(12):1848–1853
Russell SJ, Norvig P (1995) Artificial intelligence: a modern approach. Prentice Hall, Englewood Cliffs
Scherf M, Brauer W (1997) Feature selection by means of a feature weighting approach. Technical Report, Technische Universität München
Setiono R, Liu H (1995) Chi2: Feature selection and discretization of numeric attributes. In: Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, 388–391
Singh M, Provan GM (1996) Efficient learning of selective Bayesian network classifiers. In: Machine Learning: Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann, 453–461
Skalak DB (1994) Prototype and feature selection by sampling and random mutation hill climbing algorithms. In Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, 293–301
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. In: Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, 1–6
Tay FEH, Shen L (2002) A modified Chi2 algorithm for discretization. IEEE Trans Knowl Data Eng 14(3):666–670
Thrun S (1991) The MONK's problems: a performance comparison of different learning algorithms. Technical Report CMU-CS-91-197, Carnegie Mellon University
Vafaie H, De Jong K (1995) Genetic algorithms as a tool for restructuring feature space representations. In: Proceedings of the International Conference on Tools with Artificial Intelligence. IEEE Computer Society Press, 57–65
Weka Data Mining Tool (1999) https://www.cs.waikato.ac.nz/ml/weka/
Zhang Z, Ding S, Jia W (2019) A hybrid optimization algorithm based on cuckoo search and differential evolution for solving constrained engineering problems. Eng Appl Artif Intell 85:254–268. https://doi.org/10.1016/j.engappai.2019.06.017
Zhang Z, Ding S, Sun Y (2020) A support vector regression model hybridized with chaotic krill herd algorithm and empirical mode decomposition for regression task. Neurocomputing 410:185–201. https://doi.org/10.1016/j.neucom.2020.05.075
Zhang Z, Hong W-C (2019) Electric load forecasting by complete ensemble empirical mode decomposition adaptive noise and support vector regression with quantum-based dragonfly algorithm. Nonlinear Dyn 98:1107–1136. https://doi.org/10.1007/s11071-019-05252-7
Acknowledgements
Thanks to Dr. Philippe Fournier-Viger, the founder of the SPMF [27] data mining library, for providing suggestions on our work.
Cite this article
N, T.R., Gupta, R. An efficient feature subset selection approach for machine learning. Multimed Tools Appl 80, 12737–12830 (2021). https://doi.org/10.1007/s11042-020-10011-7