Abstract
Assessing the similarity between objects is a prerequisite for many data mining techniques. This paper introduces a novel approach to learning distance functions that maximize the clustering of objects belonging to the same class. Objects in a data set are clustered with respect to a given distance function, and the local class density information of each cluster is then used by a weight adjustment heuristic to modify the distance function so that the class density is increased in the attribute space. This process of interleaving clustering with distance function modification is repeated until a “good” distance function has been found. We implemented our approach using the k-means clustering algorithm. We evaluated our approach on 7 UCI data sets for a traditional 1-nearest-neighbor (1-NN) classifier and for a compressed 1-NN classifier, called NCC, that uses the learnt distance function and cluster centroids instead of all the points of a training set. The experimental results show that attribute weighting leads to statistically significant improvements in prediction accuracy over a traditional 1-NN classifier for 2 of the 7 data sets tested, whereas using NCC significantly improves the accuracy of the 1-NN classifier for 4 of the 7 data sets.
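The loop described in the abstract (cluster, inspect class density per cluster, adjust attribute weights, repeat) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the weight adjustment heuristic, the stopping rule, or how NCC labels its centroids, so the update rule in `adjust_weights`, the use of cluster purity as the measure of a "good" distance function, the majority-class centroid labels, and all function and parameter names (`learn_weights`, `alpha`, `rounds`, `ncc_fit`, and so on) are assumptions made for this example.

```python
import random

def weighted_dist(a, b, w):
    # Attribute-weighted squared Euclidean distance (assumed form).
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b))

def majority(labels):
    # Most frequent label; sorted() makes tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)

def kmeans(points, k, w, iters=20, seed=0):
    # Plain k-means under the weighted distance, with a fixed seed.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    def assign_all():
        return [min(range(k), key=lambda c: weighted_dist(p, centroids[c], w))
                for p in points]
    for _ in range(iters):
        assign = assign_all()
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign_all()

def purity(labels, assign, k):
    # Fraction of objects that belong to the majority class of their cluster.
    total = 0
    for c in range(k):
        members = [l for l, a in zip(labels, assign) if a == c]
        if members:
            total += members.count(majority(members))
    return total / len(labels)

def spread(values):
    # Mean absolute deviation from the mean.
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

def adjust_weights(points, labels, assign, k, w, alpha):
    # Hypothetical heuristic: raise an attribute's weight when the majority
    # class of a cluster is tighter along it than the cluster as a whole,
    # lower it otherwise. Weights are then renormalized to sum to 1.
    new_w = list(w)
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if len(idx) < 2:
            continue
        major = majority([labels[i] for i in idx])
        maj_idx = [i for i in idx if labels[i] == major]
        for j in range(len(w)):
            sigma = spread([points[i][j] for i in idx])
            mu = spread([points[i][j] for i in maj_idx])
            if sigma > 0:
                new_w[j] += alpha * w[j] * (sigma - mu) / sigma
    new_w = [max(x, 1e-6) for x in new_w]
    total = sum(new_w)
    return [x / total for x in new_w]

def learn_weights(points, labels, k, rounds=10, alpha=0.3):
    # Interleave clustering and weight adjustment; keep the purest weights seen.
    d = len(points[0])
    w = [1.0 / d] * d
    best_w, best_p = w, -1.0
    for _ in range(rounds):
        _, assign = kmeans(points, k, w)
        p = purity(labels, assign, k)
        if p > best_p:
            best_w, best_p = w, p
        w = adjust_weights(points, labels, assign, k, w, alpha)
    return best_w, best_p

def ncc_fit(points, labels, k, w):
    # Compressed 1-NN model: one (centroid, majority label) pair per cluster.
    centroids, assign = kmeans(points, k, w)
    model = []
    for c in range(k):
        members = [l for l, a in zip(labels, assign) if a == c]
        if members:
            model.append((centroids[c], majority(members)))
    return model

def ncc_predict(x, model, w):
    # Predict the label of the nearest centroid under the learnt weights.
    return min(model, key=lambda m: weighted_dist(x, m[0], w))[1]
```

With `points` as a list of numeric attribute vectors and `labels` as the matching class labels, `learn_weights(points, labels, k)` returns the weight vector with the highest purity seen, and `ncc_fit` followed by `ncc_predict` classifies a new object against the centroids only. Tracking the best weights rather than the last ones is a design choice of this sketch: the assumed heuristic is not guaranteed to improve purity in every round.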
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eick, C.F., Rouhana, A., Bagherjeiran, A., Vilalta, R. (2005). Using Clustering to Learn Distance Functions for Supervised Similarity Assessment. In: Perner, P., Imiya, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2005. Lecture Notes in Computer Science, vol 3587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11510888_13
DOI: https://doi.org/10.1007/11510888_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26923-6
Online ISBN: 978-3-540-31891-0
eBook Packages: Computer Science (R0)