Abstract
Feature subset selection is a technique for extracting a highly relevant subset of the original features from a dataset. In this paper, we propose a new algorithm that filters features from a dataset using a greedy stepwise forward selection technique. The proposed algorithm uses gain ratio as the greedy evaluation measure, and it applies a multiple-feature correlation technique to remove redundant features from the dataset. The experiments carried out to evaluate the proposed algorithm are based on the number of selected features, the runtime, and the classification accuracy of three classifiers: Naïve Bayes, the tree-based C4.5, and the instance-based IB1. The results are compared with two other feature selection algorithms, the Fast Correlation-Based Filter (FCBF) and the Fast clustering-based feature selection algorithm (FAST), over datasets of different dimensions and domains. A unified metric that combines all three parameters (number of features, runtime, and classification accuracy) is also used to compare the algorithms. The results show that the proposed algorithm achieves a significant improvement over the other feature selection algorithms on high-dimensional data when applied to a dataset from the image domain.
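The selection loop described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes discrete (already discretized) features, scores a candidate subset S by the gain ratio of the composite attribute formed by S, GR(C, S) = IG(C; S) / H(S), and stops when no remaining feature raises that score. The paper's multiple-feature correlation step is replaced here by a hypothetical symmetric-uncertainty cutoff (`su_threshold`), the correlation measure used by FCBF-style filters. All function names are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from math import log2


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(y: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """H(Y | X) for two aligned sequences of discrete values."""
    n = len(y)
    groups: dict[Hashable, list] = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(y, x) -> float:
    """GR(Y, X) = IG(Y; X) / H(X); 0 when X is constant (no split information)."""
    split_info = entropy(x)
    if split_info == 0:
        return 0.0
    return (entropy(y) - conditional_entropy(y, x)) / split_info


def symmetric_uncertainty(a, b) -> float:
    """SU(A, B) = 2 * IG(A; B) / (H(A) + H(B)), a normalized correlation in [0, 1]."""
    denom = entropy(a) + entropy(b)
    if denom == 0:
        return 1.0  # both constant: treat as fully redundant
    return 2 * (entropy(a) - conditional_entropy(a, b)) / denom


def forward_select(columns: dict[str, list], y: list, su_threshold: float = 0.9) -> list[str]:
    """Greedy stepwise forward selection maximizing the gain ratio of the joint subset.

    su_threshold is a hypothetical redundancy cutoff, not a value from the paper.
    """
    selected: list[str] = []
    best_score = 0.0
    while True:
        best_name = None
        best_candidate = best_score
        for name, col in columns.items():
            if name in selected:
                continue
            # Assumed redundancy filter: skip candidates strongly correlated
            # with any feature already in the subset.
            if any(symmetric_uncertainty(col, columns[s]) >= su_threshold for s in selected):
                continue
            # Treat the candidate subset as one composite attribute (tuples of values).
            joint = list(zip(*(columns[s] for s in selected + [name])))
            score = gain_ratio(y, joint)
            if score > best_candidate:
                best_name, best_candidate = name, score
        if best_name is None:  # no candidate improves the score: stop
            return selected
        selected.append(best_name)
        best_score = best_candidate


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 0, 1, 1]
    data = {
        "a": [0, 0, 1, 1, 0, 0, 1, 1],  # perfect predictor of y
        "b": [0, 0, 1, 1, 0, 0, 1, 1],  # exact duplicate of a (redundant)
        "c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of y
    }
    print(forward_select(data, y))  # ['a']
```

On this toy data, `a` is chosen first with a gain ratio of 1, the duplicate `b` is rejected as redundant, and adding the noise feature `c` would double the split information H(S) without raising the information gain, so the joint gain ratio falls to 0.5 and the search stops. Because the denominator H(S) grows with every added feature, maximizing the joint gain ratio tends to favor compact subsets.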
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nagpal, A., Gaur, D. (2015). A New Proposed Feature Subset Selection Algorithm Based on Maximization of Gain Ratio. In: Kumar, N., Bhatnagar, V. (eds) Big Data Analytics. BDA 2015. Lecture Notes in Computer Science, vol 9498. Springer, Cham. https://doi.org/10.1007/978-3-319-27057-9_13
DOI: https://doi.org/10.1007/978-3-319-27057-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27056-2
Online ISBN: 978-3-319-27057-9
eBook Packages: Computer Science, Computer Science (R0)