Abstract
Gaining confidence that a clustering algorithm has produced meaningful results, and not an accident of its usually heuristic optimization, is central to data analysis. This is the issue of cluster validity, and we propose a method in which Support Vector Machines are used to evaluate the separation in clustering results. The method not only lets us compare clustering results from different algorithms, or from different runs of the same algorithm, but also lets us filter noise and outliers. Thus, for a fixed data set, we can identify the most robust and potentially meaningful clustering result. A set of experiments illustrates the steps of our approach.
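The abstract does not spell out the procedure, so the following is only a rough illustration of the general idea of scoring a clustering by how well an SVM can separate its clusters. It is not the authors' method. It assumes a two-cluster case, a linear soft-margin SVM trained by plain batch subgradient descent, and training accuracy as the separation score. The function names `train_linear_svm` and `separability`, the parameter values, and the toy data are all hypothetical.

```python
def train_linear_svm(points, labels, lam=0.1, lr=0.1, steps=500):
    """Batch subgradient descent on the L2-regularised hinge loss.

    Assumption: a linear kernel and fixed step size, chosen for brevity.
    """
    w, b = [0.0, 0.0], 0.0
    n = len(points)
    for _ in range(steps):
        sx0 = sx1 = sy = 0.0
        for (x0, x1), y in zip(points, labels):
            # Only points that violate the margin contribute to the subgradient.
            if y * (w[0] * x0 + w[1] * x1 + b) < 1:
                sx0 += y * x0
                sx1 += y * x1
                sy += y
        w = [w[0] - lr * (lam * w[0] - sx0 / n),
             w[1] - lr * (lam * w[1] - sx1 / n)]
        b += lr * sy / n
    return w, b


def separability(points, labels):
    """Fraction of points the SVM places on the side of their own cluster."""
    w, b = train_linear_svm(points, labels)
    correct = sum(
        1 for (x0, x1), y in zip(points, labels)
        if y * (w[0] * x0 + w[1] * x1 + b) > 0
    )
    return correct / len(points)


# Hypothetical toy data: two compact groups far apart.
points = [(-6, -6), (-6, -4), (-4, -6), (-4, -4), (-5, -5),
          (6, 6), (6, 4), (4, 6), (4, 4), (5, 5)]

# Candidate clustering 1: one cluster per group.
good = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
# Candidate clustering 2: a partition that cuts across both groups.
bad = [1, -1, -1, 1, 1, 1, -1, -1, 1, -1]

print("clustering 1:", separability(points, good))
print("clustering 2:", separability(points, bad))
```

Under these assumptions, a higher score suggests that the cluster labels describe groups a margin-based classifier can tell apart, which gives one way to rank candidate clusterings of the same data set. Noise and outlier filtering, which the abstract also mentions, is not covered by this sketch.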
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Estivill-Castro, V., Yang, J. (2003). Cluster Validity Using Support Vector Machines. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2003. Lecture Notes in Computer Science, vol 2737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45228-7_25
DOI: https://doi.org/10.1007/978-3-540-45228-7_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40807-9
Online ISBN: 978-3-540-45228-7
eBook Packages: Springer Book Archive