Abstract
Wrapper and filter are two commonly used feature selection schemes. Because of its computational efficiency, the filter method is often the first choice when dealing with large datasets. However, most filter methods reported in the literature are developed for continuous feature selection. In this paper, we propose a filter method for mixed data with both continuous and nominal features. The new algorithm includes a novel criterion for mixed feature evaluation and a novel search algorithm for mixed feature subset generation. The proposed method is tested on a few real-world benchmark problems.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tang, W., Mao, K. (2005). Feature Selection Algorithm for Data with Both Nominal and Continuous Features. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_78
DOI: https://doi.org/10.1007/11430919_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science (R0)