On Determining the Optimal Partition in Agglomerative Clustering of Documents

El Sayed, Ahmad; Hacid, Hakim; Zighed, Djamel

doi:10.1007/978-3-540-68123-6_53

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4994)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

The goal of any clustering algorithm that produces flat partitions of data is to find both the optimal clustering solution and the optimal number of clusters. One natural way to reach this goal without the need for parameters is to involve a validity index in the clustering process, which can lead to an objective selection of the optimal number of clusters. In this paper, we evaluate the major relative indices by involving them in an agglomerative clustering algorithm for documents. The evaluation assesses the indices' ability to identify both the optimal solution and the optimal number of clusters. We then propose a new context-aware method that aims to enhance the use of validity indices as stopping criteria in agglomerative algorithms. Experimental results show that the method is a step forward in using validity indices more reliably as stopping criteria.
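
As a rough illustration of what "involving a validity index in an agglomerative clustering process" can look like, the sketch below merges clusters bottom-up, scores every level of the hierarchy with a relative index, and keeps the best-scoring partition. Everything in it is an assumption made for illustration: the abstract does not say which indices or linkage the authors use, so the Calinski-Harabasz index and centroid linkage stand in here, the function names are hypothetical, and the toy vectors are invented. It does not implement the context-aware method proposed in the paper.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(clusters):
    """Calinski-Harabasz index (higher is better).

    Only defined for 2 <= k <= n - 1, so other partitions return None.
    """
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    if k < 2 or k >= n:
        return None
    overall = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def agglomerate_with_index(points):
    """Merge the two clusters with the closest centroids until one remains.

    Every intermediate partition is scored; the best one is returned
    together with a dict mapping number of clusters -> index value.
    """
    clusters = [[p] for p in points]
    scores = {}
    best, best_score = None, float("-inf")
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(
                centroid(clusters[ij[0]]), centroid(clusters[ij[1]])
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        score = calinski_harabasz(clusters)
        if score is not None:
            scores[len(clusters)] = score
            if score > best_score:
                best, best_score = [list(c) for c in clusters], score
    return best, scores


# Invented toy "documents" as 2-term frequency vectors (two obvious groups).
docs = [[5, 0], [4, 1], [5, 1], [0, 5], [1, 4], [1, 5]]
best, scores = agglomerate_with_index(docs)
print(len(best), scores)
```

This version scores the full hierarchy and selects the maximum afterwards. A true stopping criterion would halt merging as soon as the index starts to degrade, which is cheaper but can stop at a local optimum of the index.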

Author information

Authors and Affiliations

ERIC Laboratory, University of Lyon, 5 avenue Pierre Mendès France, 69676 Bron cedex, France
Ahmad El Sayed, Hakim Hacid & Djamel Zighed

Editor information

Aijun An, Stan Matwin, Zbigniew W. Raś, Dominik Ślęzak

About this paper

Cite this paper

El Sayed, A., Hacid, H., Zighed, D. (2008). On Determining the Optimal Partition in Agglomerative Clustering of Documents. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_53

DOI: https://doi.org/10.1007/978-3-540-68123-6_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68122-9
Online ISBN: 978-3-540-68123-6
eBook Packages: Computer Science (R0)
