Abstract
The goal of any clustering algorithm that produces flat partitions of data is to find both the optimal clustering solution and the optimal number of clusters. One natural way to reach this goal without requiring parameters is to involve a validity index in the clustering process, which can lead to an objective selection of the optimal number of clusters. In this paper, we evaluate the major relative indices by involving them in an agglomerative clustering algorithm for documents. The evaluation assesses the indices' ability to identify both the optimal solution and the optimal number of clusters. We then propose a new context-aware method that aims to enhance the use of validity indices as stopping criteria in agglomerative algorithms. Experimental results show that the method is a step forward in using validity indices as stopping criteria more reliably.
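The general idea of involving a validity index in an agglomerative process can be sketched as follows. This is a minimal illustration under stated assumptions, not the context-aware method proposed in the paper: the abstract names neither the index nor the linkage, so the Calinski-Harabasz index, average linkage, Euclidean distance on toy 2-D points, and the function names (`calinski_harabasz`, `best_partition`) are all choices made here for illustration. The sketch scores the partition after every merge and keeps the best-scoring one, so the number of clusters is never passed as a parameter.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def calinski_harabasz(clusters):
    """Between/within dispersion ratio; higher is better (assumed index)."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    between = sum(len(c) * dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    if within == 0:
        # Degenerate case: every cluster holds identical points.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def average_linkage(a, b):
    """Mean pairwise distance between two clusters (assumed linkage)."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def best_partition(points):
    """Merge bottom-up and return the best-scoring partition and its score.

    Needs at least three points; the index is evaluated for
    k = n-1 down to k = 2 clusters.
    """
    clusters = [[p] for p in points]
    best_score, best = float("-inf"), None
    while len(clusters) > 2:
        # Find the closest pair of clusters under average linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        score = calinski_harabasz(clusters)
        if score > best_score:
            best_score, best = score, [list(c) for c in clusters]
    return best, best_score


# Toy data: three well-separated groups of four points each.
groups = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 0), (10, 1), (11, 0), (11, 1)],
    [(0, 10), (0, 11), (1, 10), (1, 11)],
]
points = [p for g in groups for p in g]
partition, score = best_partition(points)
print(len(partition), "clusters selected")
```

For documents, the points would typically be term-weight vectors compared with a cosine-based measure instead of Euclidean distance. Scanning the whole hierarchy is the simplest variant; a true stopping criterion would halt at the first merge that degrades the index, which is where the reliability of the index becomes the central concern.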
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
El Sayed, A., Hacid, H., Zighed, D. (2008). On Determining the Optimal Partition in Agglomerative Clustering of Documents. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_53
DOI: https://doi.org/10.1007/978-3-540-68123-6_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68122-9
Online ISBN: 978-3-540-68123-6
eBook Packages: Computer Science, Computer Science (R0)