Abstract
This paper addresses supervised wrapper-based feature subset selection in datasets with a very large number of attributes. In such datasets, sophisticated search algorithms such as beam search, branch and bound, best-first search, and genetic algorithms become intractable in the wrapper approach because of the large number of wrapper evaluations they require. Hybrid selection algorithms have therefore appeared in the recent literature: starting from a filter ranking, they perform an incremental wrapper selection over that ranking. Although these methods work well, they still have two problems: (1) depending on the complexity of the wrapper search method, the number of wrapper evaluations can still be too large; and (2) they rely on a univariate ranking that does not take into account interactions between the variables already included in the selected subset and the remaining ones. In this paper we propose to work incrementally at two levels (block level and attribute level) so that a filter re-ranking method based on conditional mutual information can be used. The results show that this drastically reduces the number of wrapper evaluations without degrading the quality of the obtained subset; in fact, we obtain the same accuracy with fewer selected attributes.
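The two-level scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: the re-ranking score (here the minimum over already selected attributes s of I(X;C|s), a pairwise approximation of conditional mutual information), the stopping rule (stop when a whole block adds no attribute), the acceptance rule (strict improvement of the wrapper score), and the toy wrapper `make_evaluator` (holdout accuracy of a lookup-table classifier). All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def entropy(*cols):
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    n = len(cols[0])
    return -sum(c / n * math.log2(c / n) for c in Counter(zip(*cols)).values())


def cmi(x, y, z=None):
    """I(X;Y|Z) for discrete columns; z=None gives plain I(X;Y)."""
    if z is None:
        return entropy(x) + entropy(y) - entropy(x, y)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def rerank(columns, labels, candidates, selected):
    """Sort candidates by relevance to the class given the selected subset.

    Assumed criterion: min over selected s of I(X;C|s).
    """
    def score(f):
        if not selected:
            return cmi(columns[f], labels)  # first pass: univariate ranking
        return min(cmi(columns[f], labels, columns[s]) for s in selected)
    return sorted(candidates, key=score, reverse=True)


def select(columns, labels, evaluate, block_size):
    """Block-wise incremental wrapper selection with filter re-ranking."""
    selected, best, n_evals = [], evaluate([]), 1
    remaining = list(range(len(columns)))
    while remaining:
        # Block level: re-rank what is left, then take the top block.
        remaining = rerank(columns, labels, remaining, selected)
        block, remaining = remaining[:block_size], remaining[block_size:]
        added = False
        # Attribute level: keep an attribute only if the wrapper score improves.
        for f in block:
            score = evaluate(selected + [f])
            n_evals += 1
            if score > best:
                selected, best, added = selected + [f], score, True
        if not added:  # assumed stopping rule: the block contributed nothing
            break
    return selected, best, n_evals


def make_evaluator(columns, labels, n_train):
    """Hypothetical wrapper: holdout accuracy of a majority-vote lookup table."""
    def evaluate(subset):
        def key(i):
            return tuple(columns[f][i] for f in subset)
        votes = {}
        for i in range(n_train):
            votes.setdefault(key(i), Counter())[labels[i]] += 1
        default = Counter(labels[:n_train]).most_common(1)[0][0]
        test = range(n_train, len(labels))
        hits = sum(
            (votes[key(i)].most_common(1)[0][0] if key(i) in votes else default)
            == labels[i]
            for i in test
        )
        return hits / len(test)
    return evaluate


# Toy data: attribute 1 duplicates attribute 0, attribute 2 is informative,
# attributes 3..7 are noise; the class is 2*f0 + f2. The first half of the
# rows is used for training and the identical second half for testing.
# For brevity the filter scores use all rows; in practice they should be
# computed on training data only.
rows = [(a, a, b, *noise) for a, b, *noise in product([0, 1], repeat=7)] * 2
columns = list(zip(*rows))
labels = [2 * r[0] + r[2] for r in rows]

evaluate = make_evaluator(columns, labels, n_train=len(rows) // 2)
selected, best, n_evals = select(columns, labels, evaluate, block_size=2)
print(selected, best, n_evals)
```

On this toy data the univariate ranking places the duplicated attribute 1 next to attribute 0, whereas after attribute 0 is selected the conditional score of its duplicate drops to zero and attribute 2 moves to the front of the next block. The loop also stops once a block of noise attributes adds nothing, so the last attributes are never passed to the wrapper.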
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bermejo, P., Gámez, J.A., Puerta, J.M. (2010). Improving Incremental Wrapper-Based Feature Subset Selection by Using Re-ranking. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_58
DOI: https://doi.org/10.1007/978-3-642-13022-9_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13021-2
Online ISBN: 978-3-642-13022-9
eBook Packages: Computer Science, Computer Science (R0)