Improving spectral clustering with deep embedding, cluster estimation and metric learning

Duan, Liang; Ma, Shuai; Aggarwal, Charu; Sathe, Saket

doi:10.1007/s10115-020-01530-8

Regular Paper
Published: 22 November 2020

Volume 63, pages 675–694 (2021)
Knowledge and Information Systems

Liang Duan¹, Shuai Ma²,³, Charu Aggarwal⁴ & Saket Sathe⁴

895 Accesses
8 Citations

Abstract

Spectral clustering is one of the most popular modern clustering algorithms. It is easy to implement, can be solved efficiently, and very often outperforms traditional clustering algorithms such as k-means. However, spectral clustering can be insufficient for most datasets with complex statistical properties, and it requires users to specify the number of clusters k and a good distance metric for constructing the similarity graph. To address these problems, in this article, we propose an approach that extends spectral clustering with deep embedding, cluster estimation, and metric learning. First, we generate a deep embedding by learning a deep autoencoder, which transforms the raw data into lower-dimensional representations suitable for clustering. Second, we provide an effective method to estimate the number of clusters by learning a softmax autoencoder from the deep embedding. Third, we construct a more powerful similarity graph by learning a distance metric from the embedding using a Siamese network. Finally, we conduct an extensive experimental study on image and text datasets, which verifies the effectiveness and efficiency of our approach.
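
To make the two user-supplied inputs named in the abstract concrete, the following is a minimal sketch of baseline normalized spectral clustering in the style of Ng, Jordan and Weiss, written with NumPy only. It is not the approach proposed in the article: there is no autoencoder, no cluster estimation, and no learned metric. The function names (`gaussian_affinity`, `kmeans`, `spectral_clustering`), the Gaussian kernel, the bandwidth `sigma`, and the toy data are all assumptions chosen for illustration. The sketch shows where the number of clusters `k` and the distance metric enter the baseline, which are the two choices the article aims to estimate or learn.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense similarity graph from squared Euclidean distances (assumed metric)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def kmeans(Y, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Baseline normalized spectral clustering; k and the metric are user inputs."""
    W = gaussian_affinity(X, sigma)
    deg = np.maximum(W.sum(axis=1), 1e-12)  # guard against isolated points
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest
    _, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :k]
    # Row-normalize the spectral embedding before running k-means
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


# Toy data (assumed): two well-separated Gaussian blobs of 20 points each
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, (20, 2)),
    rng.normal(0.0, 0.3, (20, 2)) + 4.0,
])
labels = spectral_clustering(X, k=2)
print(labels)
```

In this baseline, a poor choice of `k` or of the kernel bandwidth changes the similarity graph and therefore the result, which is the sensitivity the abstract describes.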

Acknowledgements

This work is supported in part by the National Key R&D Program of China (2018YFB1700403), NSFC (61925203, U1636210, 61421003 and U1802271), the Science Foundation for Distinguished Young Scholars of Yunnan Province (2019FJ011) and the China Postdoctoral Science Foundation (2020M673310).

Author information

Authors and Affiliations

1. School of Information Science and Engineering, Yunnan University, Kunming, China: Liang Duan
2. SKLSDE Lab, Beihang University, Beijing, China: Shuai Ma
3. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing, China: Shuai Ma
4. IBM T. J. Watson Research Center, New York, USA: Charu Aggarwal & Saket Sathe

Corresponding author

Correspondence to Shuai Ma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Duan, L., Ma, S., Aggarwal, C. et al. Improving spectral clustering with deep embedding, cluster estimation and metric learning. Knowl Inf Syst 63, 675–694 (2021). https://doi.org/10.1007/s10115-020-01530-8

Received: 27 January 2020
Revised: 26 October 2020
Accepted: 01 November 2020
Published: 22 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10115-020-01530-8
