Abstract
This paper proposes a kNN model-based feature selection method that aims to improve the efficiency and effectiveness of the ReliefF method by: (1) using a kNN model as the initial selection step, choosing a set of more meaningful representatives to replace the original data for feature selection; (2) integrating the Heterogeneous Value Difference Metric (HVDM) to handle heterogeneous applications, i.e. those with both ordinal and nominal features; and (3) presenting a simple method of calculating the difference function based on the inductive information held in each representative obtained by the kNN model. We have evaluated the performance of the proposed method on the Phenols toxicity dataset with two different endpoints. Experimental results indicate that the proposed feature selection method significantly improves classification accuracy on the trial dataset.
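The abstract builds on ReliefF and HVDM without spelling either out. As a rough illustration only, the sketch below shows standard ReliefF (k nearest hits and misses, with miss terms weighted by class priors) using an HVDM-style distance (range-free: continuous attributes scaled by 4 standard deviations, nominal attributes compared by their class-conditional value distributions) to find neighbours over mixed features. It is not the paper's kNN model-based variant: it runs on the raw instances rather than on kNN-model representatives. The function names `hvdm_factory` and `relieff`, the deterministic pass over every instance instead of random sampling, and the range-normalised / 0-1 difference function are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def hvdm_factory(X, y, nominal):
    """Return an HVDM-style distance over rows of X (hypothetical helper).

    `nominal` is a set of column indices treated as nominal; every other
    column is treated as continuous. Assumes no missing values and that
    every compared nominal value occurs somewhere in X.
    """
    n_feat = len(X[0])
    classes = sorted(set(y))
    # Population standard deviation of each continuous column.
    sigma = {}
    for a in range(n_feat):
        if a not in nominal:
            col = [row[a] for row in X]
            mean = sum(col) / len(col)
            sigma[a] = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    # Class counts N[a][value][class] for each nominal column.
    counts = {a: defaultdict(Counter) for a in nominal}
    for row, label in zip(X, y):
        for a in nominal:
            counts[a][row[a]][label] += 1

    def attr_dist(a, u, v):
        if a in nominal:
            # Value difference: compare P(class | value) for the two values.
            cu, cv = counts[a][u], counts[a][v]
            nu, nv = sum(cu.values()), sum(cv.values())
            return math.sqrt(sum((cu[c] / nu - cv[c] / nv) ** 2 for c in classes))
        if sigma[a] == 0:
            return 0.0
        return abs(u - v) / (4 * sigma[a])

    def dist(p, q):
        return math.sqrt(sum(attr_dist(a, p[a], q[a]) ** 2 for a in range(n_feat)))

    return dist


def relieff(X, y, nominal, k=1):
    """Plain ReliefF weights; neighbours found with the HVDM-style distance."""
    n, n_feat = len(X), len(X[0])
    dist = hvdm_factory(X, y, nominal)
    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    # Range of each continuous column, used by the difference function.
    span = {}
    for a in range(n_feat):
        if a not in nominal:
            col = [row[a] for row in X]
            span[a] = (max(col) - min(col)) or 1.0

    def diff(a, p, q):
        if a in nominal:
            return 0.0 if p[a] == q[a] else 1.0
        return abs(p[a] - q[a]) / span[a]

    w = [0.0] * n_feat
    for i in range(n):  # every instance once, in place of random sampling
        r, cls = X[i], y[i]
        others = sorted((dist(r, X[j]), j) for j in range(n) if j != i)
        for c in prior:
            nearest = [j for _, j in others if y[j] == c][:k]
            if not nearest:
                continue
            if c == cls:
                scale = -1.0  # near hits that differ lower the weight
            else:
                scale = prior[c] / (1.0 - prior[cls])  # near misses raise it
            for a in range(n_feat):
                total = sum(diff(a, r, X[j]) for j in nearest)
                w[a] += scale * total / (n * len(nearest))
    return w
```

On a toy set where continuous feature 0 separates the classes and nominal feature 1 is only loosely related to them, feature 0 should end up with a positive weight and feature 1 with a negative one. Replacing `X` with a smaller set of representatives is the point at which the kNN model step described above would plug in.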
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, G., Neagu, D., Cronin, M.T.D. (2005). Using kNN Model for Automatic Feature Selection. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_44
DOI: https://doi.org/10.1007/11551188_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2
eBook Packages: Computer Science, Computer Science (R0)