Abstract
Automatically determining the number of clusters is an important problem in cluster analysis. In this paper, we explore a “trial-and-error” approach to determining the number of clusters in a given data set. The fuzzy clustering algorithm FCM (fuzzy c-means) is selected as the basic “trial” algorithm, and cluster validity optimization corresponds to the “error” procedure. To improve computation speed, we propose two strategies, eliminating and splitting, which make the FCM-based algorithms more efficient. To improve on existing validity measures, we use a new validity function that is particularly suited to data sets containing overlapping clusters. Experimental results are given to illustrate the performance of the new algorithms.
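The basic trial-and-error loop described above can be illustrated with a minimal sketch: run FCM for each candidate number of clusters, score each result with a validity index, and keep the best-scoring candidate. The abstract does not define the paper's new validity function or the eliminating and splitting strategies, so this sketch substitutes the well-known Xie-Beni index (lower is better) and simply reruns FCM from scratch for every candidate. The function names, the fuzzifier m = 2, the candidate range, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fcm(X, c, m=2.0, max_iter=300, tol=1e-5, seed=0):
    """Plain fuzzy c-means. Returns centers V (c, d) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized so each point's memberships sum to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the data.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances between centers and points, shape (c, n).
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def xie_beni(X, V, U, m=2.0):
    """Xie-Beni validity index (stand-in for the paper's own function)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    # Smallest squared distance between two distinct centers.
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)
    return compactness / (X.shape[0] * sep.min())


def choose_num_clusters(X, c_min=2, c_max=6, m=2.0):
    """'Trial': run FCM for each c. 'Error': score it and keep the best c."""
    scores = {}
    for c in range(c_min, c_max + 1):
        V, U = fcm(X, c, m=m)
        scores[c] = xie_beni(X, V, U, m=m)
    best_c = min(scores, key=scores.get)
    return best_c, scores


# Synthetic example (assumed data): three well-separated Gaussian blobs.
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([bc + 0.5 * rng.standard_normal((50, 2)) for bc in blob_centers])
best_c, scores = choose_num_clusters(X)
```

Rerunning FCM independently for every candidate is the cost that the eliminating and splitting strategies are meant to reduce; how they do so is not specified in the abstract and is not reproduced here.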
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, H., Sun, M. (2006). Trail-and-Error Approach for Determining the Number of Clusters. In: Yeung, D.S., Liu, ZQ., Wang, XZ., Yan, H. (eds) Advances in Machine Learning and Cybernetics. Lecture Notes in Computer Science, vol 3930. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11739685_24
DOI: https://doi.org/10.1007/11739685_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33584-9
Online ISBN: 978-3-540-33585-6
eBook Packages: Computer Science, Computer Science (R0)