Non-unique cluster numbers determination methods based on stability in spectral clustering

  • Regular Paper
Knowledge and Information Systems

Abstract

Recently, much work has been devoted to spectral clustering, a simple yet powerful method for finding structure in a data set using the spectral properties of an associated pairwise similarity matrix. Most existing spectral clustering algorithms either estimate a single cluster number or estimate non-unique cluster numbers using the eigengap criterion. However, a data set does not always have exactly one valid cluster number, and the eigengap criterion lacks theoretical justification. In this paper, we propose non-unique cluster numbers determination methods based on stability in spectral clustering (NCNDBS). We first use the multiway normalized cut spectral clustering algorithm to cluster the data set for a candidate cluster number \(k\). Then the ratio of the multiway normalized cut criterion of the obtained clusters to the sum of the leading eigenvalues (sorted in descending order) of the stochastic transition matrix is used as the standard for deciding whether \(k\) is a reasonable cluster number. Finally, by varying the scaling parameter in the Gaussian function, we judge whether the reasonable cluster number \(k\) is also a stable one. Through these three stages, we can determine the non-unique cluster numbers of a data set. The Lumpability theorem established by Meilă and Xu provides a theoretical basis for our methods. Illustrative experiments show that NCNDBS can successfully estimate the non-unique cluster numbers of a data set.
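The three stages summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the ratio, the decision threshold, or the clustering details, so everything below is an assumption. In particular, the sketch uses the quantity \((k - \mathrm{MNCut}) / \sum_{i=1}^{k} \lambda_i\), which follows from the Meilă and Xu bound \(\mathrm{MNCut} \ge k - \sum_{i=1}^{k} \lambda_i\) and equals 1 exactly when the bound is tight. The function names, the tolerance `tol`, the simple k-means routine, and the synthetic data are all hypothetical.

```python
import numpy as np

def gaussian_similarity(X, sigma):
    """Pairwise Gaussian similarities with scaling parameter sigma."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def leading_eigenpairs(S, k):
    """Top-k eigenvalues and right eigenvectors of P = D^{-1} S.

    P shares its eigenvalues with the symmetric matrix D^{-1/2} S D^{-1/2},
    so the symmetric solver can be used and the vectors rescaled afterwards.
    """
    d = S.sum(axis=1)
    sym = S / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(sym)
    top = np.argsort(vals)[::-1][:k]          # descending order
    return vals[top], vecs[:, top] / np.sqrt(d)[:, None]

def kmeans(Y, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def mncut(S, labels, k):
    """Multiway normalized cut: sum over clusters of cut(C, rest) / vol(C)."""
    d = S.sum(axis=1)
    total = 0.0
    for j in range(k):
        inside = labels == j
        if not inside.any():
            continue  # sketch only: an empty cluster is skipped, not handled
        total += S[inside][:, ~inside].sum() / d[inside].sum()
    return total

def lumpability_ratio(X, k, sigma):
    """Assumed ratio (k - MNCut) / (sum of the k leading eigenvalues of P)."""
    S = gaussian_similarity(X, sigma)
    vals, vecs = leading_eigenpairs(S, k)
    labels = kmeans(vecs, k)
    return (k - mncut(S, labels, k)) / vals.sum()

def is_stable(X, k, sigmas, tol=0.01):
    """Hypothetical stability test: ratio stays near 1 for every sigma."""
    return all(lumpability_ratio(X, k, s) >= 1.0 - tol for s in sigmas)

# Synthetic data (an assumption for illustration): three separated blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in blob_centers])

for k in (2, 3, 4):
    print(k, [round(lumpability_ratio(X, k, s), 4) for s in (0.5, 1.0, 2.0)])
```

Using the symmetric matrix for the eigendecomposition is a numerical convenience rather than something the abstract prescribes. More than one candidate \(k\) may pass such a test on the same data, which is the sense in which the cluster number is non-unique.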

References

  1. Azran A, Ghahramani Z (2006) Spectral methods for automatic multiscale data clustering. In: Proceedings of the 2006 IEEE computer society conference on computer vision and pattern recognition. IEEE Computer Society, Washington

  2. Chen Y, Rege M, Dong M, Hua J (2008) Non-negative matrix factorization for semi-supervised data clustering. Knowl Inf Syst 17:355–379

  3. Climescu-Haulica A (2007) How to choose the number of clusters: the Cramer multiplicity solution. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 15–22

  4. Li T (2008) Clustering based on matrix approximation: a unifying view. Knowl Inf Syst 17:1–15

  5. Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

  6. Meilă M, Shi J (2000) Learning segmentation by random walks. In: Todd L, Thomas D, Volker T (eds) Neural information processing systems, Denver, USA, December 2000. Advances in neural information processing systems. MIT Press, Cambridge, pp 873–879

  7. Meilă M, Xu L (2003) Multiway cuts and spectral clustering. Technical Report 442, University of Washington

  8. Milligan G, Cooper M (1985) An examination of procedures for determining number of clusters in a data set. Psychometrika 50:159–179

  9. Nagai A (2007) Inappropriateness of the criterion of K-way normalized cuts for deciding the number of clusters. Pattern Recognit Lett 28:1981–1986

  10. Nascimento M, Carvalho A (2011) Spectral methods for graph clustering: a survey. Eur J Oper Res 211(2):221–231

  11. Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Dietterich TG, Becker S, Ghahramani Z (eds) Neural information processing systems, Denver, USA, (December 2001). Advances in neural information processing systems. MIT Press, Cambridge, pp 849–856

  12. Sanguinetti G, Laidler J, Lawrence N (2005) A probabilistic approach to spectral clustering: using KL divergence to find good clusters. Statistics and Optimization of Clustering Workshop, London

  13. Sanguinetti G, Laidler J, Lawrence N (2005) Automatic determination of the number of clusters using spectral algorithms. Mystic, Proceedings of machine learning for signal processing

  14. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

  15. Stewart G, Sun J (2001) Matrix perturbation theory, 2nd edn. Academic, New York

  16. Sumuya Guo C, Zang Y (2011) Cluster number estimation based on normalized cut criterion in spectral clustering. ICIC Express Lett 5(1):155–161

  17. Takacs B, Demiris Y (2010) Spectral clustering in multi-agent systems. Knowl Inf Syst 25:607–622

  18. Tepper M, Musé P, Almansa A, Mejail M (2011) Automatically finding clusters in normalized cuts. Pattern Recognit 44:1372–1386

  19. Tian Z, Li X, Ju Y (2007) Spectral clustering based on matrix perturbation theory. Sci China Ser F Inf Sci 50(1):63–81

  20. Wang C, Li W, Ding L, Tian J, Chen S (2005) Image segmentation using spectral clustering. Proceedings of the 17th IEEE international conference on tools with artificial intelligence. IEEE Computer Society, Washington, pp 677–678

  21. Xiang T, Gong S (2008) Spectral clustering with eigenvector selection. Pattern Recognit 41(3):1012–1029

  22. Xu R, Donald W (2009) Clustering. Wiley, New Jersey

  23. Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. In: Lawrence S, Yair W, Léon B (eds) Neural information processing systems, Vancouver, Canada, December 2004. Advances in neural information processing systems. MIT Press, Cambridge, pp 1601–1608

  24. Zhang X, You Q (2011) An improved spectral clustering algorithm based on random walk. Front Comput Sci China 5(3):268–278

  25. Zheng X, Lin X (2004) Automatic determination of intrinsic cluster number family in spectral clustering using random walk on graph. International conference on image processing, Singapore, October 2004. IEEE Computer Society, Singapore, pp 3471–3474

Acknowledgments

The authors are grateful to the reviewers for their valuable comments and suggestions, which led to an improved version of the paper. This work was partly supported by the Natural Science Foundation of China (No. 71171030), the Program for New Century Excellent Talents in University (No. NCET-11-0050) and the Program of Higher-level talents of Inner Mongolia University (No. SPH-IMU-125116).

Author information

Corresponding author

Correspondence to Chonghui Guo.

About this article

Cite this article

Borjigin, S., Guo, C. Non-unique cluster numbers determination methods based on stability in spectral clustering. Knowl Inf Syst 36, 439–458 (2013). https://doi.org/10.1007/s10115-012-0547-0

  • DOI: https://doi.org/10.1007/s10115-012-0547-0
