Improving spectral clustering with deep embedding, cluster estimation and metric learning

  • Regular Paper
Knowledge and Information Systems

Abstract

Spectral clustering is one of the most popular modern clustering algorithms. It is easy to implement, can be solved efficiently, and very often outperforms traditional clustering algorithms such as k-means. However, spectral clustering can fall short on datasets with complex statistical properties, and it requires users to specify both the number k of clusters and a good distance metric for constructing the similarity graph. To address these problems, in this article we propose an approach that extends spectral clustering with deep embedding, cluster estimation, and metric learning. First, we generate the deep embedding by learning a deep autoencoder, which transforms the raw data into lower-dimensional representations suitable for clustering. Second, we provide an effective method to estimate the number of clusters by learning a softmax autoencoder from the deep embedding. Third, we construct a more powerful similarity graph by learning a distance metric from the embedding using a Siamese network. Finally, we conduct an extensive experimental study on image and text datasets, which verifies the effectiveness and efficiency of our approach.
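
For orientation, the baseline that the abstract starts from can be sketched as follows. This is a minimal illustration of standard normalized spectral clustering in the style of Ng, Jordan and Weiss, not the approach proposed in the article. The function name `spectral_clustering`, the Gaussian kernel width `sigma`, the farthest-point seeding, and the toy data are all assumptions made for the example. It shows the two inputs that the abstract says the user must supply: the number of clusters `k` and the distance metric behind the similarity graph.

```python
import numpy as np


def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Baseline normalized spectral clustering (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)

    # Similarity graph from a Gaussian kernel on squared Euclidean distance.
    # The metric and sigma are hand-picked here (an assumption of this sketch).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Keep the k eigenvectors with the largest eigenvalues
    # (np.linalg.eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)

    # Plain k-means on the rows of U, seeded deterministically by
    # repeatedly picking the point farthest from the current centers.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        point_to_center = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(point_to_center, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


# Toy usage: two well-separated 2-D blobs, with k supplied by hand.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(5.0, 0.3, size=(20, 2)),
])
labels = spectral_clustering(X, k=2)
print(labels)
```

In the pipeline the abstract describes, `X` would be the deep embedding rather than the raw data, `k` would come from the softmax autoencoder instead of being supplied by hand, and the Euclidean distance would be replaced by the metric learned with the Siamese network.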

Acknowledgements

This work is supported in part by the National Key R&D Program of China (2018YFB1700403), NSFC (61925203, U1636210, 61421003, and U1802271), the Science Foundation for Distinguished Young Scholars of Yunnan Province (2019FJ011), and the China Postdoctoral Science Foundation (2020M673310).

Author information

Corresponding author

Correspondence to Shuai Ma.

About this article

Cite this article

Duan, L., Ma, S., Aggarwal, C. et al. Improving spectral clustering with deep embedding, cluster estimation and metric learning. Knowl Inf Syst 63, 675–694 (2021). https://doi.org/10.1007/s10115-020-01530-8

  • DOI: https://doi.org/10.1007/s10115-020-01530-8
