
An autoencoder-based spectral clustering algorithm

  • Methodologies and Application
  • Published in: Soft Computing

Abstract

Spectral clustering suffers from high computational complexity on large-scale datasets, due to both the eigendecomposition of the Laplacian matrix and the storage of a large similarity matrix. Some studies explore the use of deep learning in spectral clustering and propose replacing the eigendecomposition with an autoencoder. K-means is then typically applied to the embedded representation to obtain the clustering result, which improves efficiency but further increases memory consumption. To address this issue, an efficient spectral clustering algorithm based on a stacked autoencoder is proposed. In this paper, we select representative data points as landmarks and use the similarities between the landmarks and all data points, rather than the similarity matrix of the whole dataset, as the input to the autoencoder. To further refine the clustering result, we combine learning the embedded representation with performing clustering: a clustering loss updates the parameters of the autoencoder and the cluster centers simultaneously, and a reconstruction loss is included to prevent distortion of the embedding space and to preserve the local structure of the data. Experiments on several large-scale datasets validate the effectiveness of the proposed method.
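The abstract describes two ingredients: a landmark-based similarity representation fed to the autoencoder in place of the full similarity matrix, and a joint objective that combines a clustering loss with a reconstruction loss. A minimal sketch of both ideas, assuming a Gaussian-kernel similarity, random landmark sampling, and a DEC-style Student's-t clustering loss; these are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def landmark_similarity(X, n_landmarks=8, sigma=1.0, seed=0):
    """Build the p x n matrix of similarities between p landmarks and all n points.

    Landmarks are sampled randomly here for simplicity; the paper selects
    representative points instead."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    landmarks = X[idx]
    # Squared Euclidean distance from each landmark to every point
    d2 = ((landmarks[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian-kernel similarity
    return Z / Z.sum(axis=0, keepdims=True)  # normalize each point's profile

def joint_loss(embed, centers, recon, x, lam=0.1):
    """Clustering loss (KL between soft assignments and a sharpened target,
    as in DEC) plus lam times the reconstruction loss."""
    d2 = ((embed[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = 1.0 / (1.0 + d2)                     # Student's-t soft assignment
    q /= q.sum(axis=1, keepdims=True)
    p = q ** 2 / q.sum(axis=0)               # sharpened target distribution
    p /= p.sum(axis=1, keepdims=True)
    kl = (p * np.log(p / q)).sum()           # clustering loss
    mse = ((recon - x) ** 2).mean()          # reconstruction loss
    return kl + lam * mse

# Two well-separated Gaussian blobs as toy data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5.0])
Z = landmark_similarity(X)
print(Z.shape)  # (8, 100) -- each column is one point's autoencoder input
```

Each column of the landmark similarity matrix serves as one data point's input, shrinking the autoencoder input from O(n²) to O(pn); the joint loss would then be minimized by gradient descent over the autoencoder parameters and the cluster centers together.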


Notes

  1. http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.

  2. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

  3. http://yann.lecun.com/exdb/mnist/index.html.

  4. http://archive.ics.uci.edu/ml/datasets/Letter+Recognition.

  5. http://archive.ics.uci.edu/ml/datasets/Covertype.

  6. http://alumni.cs.ucsb.edu/~wychen/.

  7. https://github.com/mli/nystrom.

  8. https://github.com/piiswrong/dec.

  9. http://www.cad.zju.edu.cn/home/dengcai/Data/ReproduceExp.html.


Author information

Corresponding author

Correspondence to Zhiping Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Li, X., Zhao, X., Chu, D. et al. An autoencoder-based spectral clustering algorithm. Soft Comput 24, 1661–1671 (2020). https://doi.org/10.1007/s00500-019-03994-5
