Research article
DOI: 10.1145/3219819.3220039

Spectral Clustering of Large-scale Data by Directly Solving Normalized Cut

Published: 19 July 2018

Abstract

Over the past decades, many spectral clustering algorithms have been proposed. However, their high computational complexity hinders their application to large-scale data. Moreover, most of them use a two-step approach to obtain the optimal solution, which may deviate from the solution obtained by directly solving the original problem. In this paper, we propose a new optimization algorithm, namely Direct Normalized Cut (DNC), to directly optimize the normalized cut model. DNC has quadratic time complexity, which is a significant reduction compared with the cubic time complexity of traditional spectral clustering. To cope with large-scale data, a Fast Normalized Cut (FNC) method with linear time and space complexities is proposed by extending DNC with an anchor-based strategy. In the new method, we first seek a set of anchors and then construct a representative similarity matrix by computing distances between the anchors and the whole data set. To find high-quality anchors that best represent the whole data set, we propose Balanced k-means (BKM), which partitions a data set into balanced clusters and uses the cluster centers as anchors. DNC is then used to obtain the final clustering result from the representative similarity matrix. A series of experiments was conducted on both synthetic and real-world data sets, and the experimental results show the superior performance of BKM, DNC and FNC.
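To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' DNC, FNC or BKM algorithms. It evaluates the standard normalized cut objective for a given partition and builds an n × m point-to-anchor similarity matrix. The function names `normalized_cut` and `anchor_similarity`, the Gaussian kernel, and the bandwidth `sigma` are assumptions made for illustration; the abstract only states that the matrix is built from distances between the anchors and the data.

```python
import math


def normalized_cut(W, labels):
    """Standard normalized cut: sum over clusters of cut(A, rest) / vol(A).

    W is a symmetric similarity matrix given as a list of lists;
    labels[i] is the cluster index of point i.
    """
    n = len(W)
    degrees = [sum(row) for row in W]
    total = 0.0
    for c in set(labels):
        # vol(A): total degree of the points assigned to cluster c
        vol = sum(degrees[i] for i in range(n) if labels[i] == c)
        # cut(A, rest): weight of edges leaving cluster c
        cut = sum(W[i][j] for i in range(n) for j in range(n)
                  if labels[i] == c and labels[j] != c)
        if vol > 0:
            total += cut / vol
    return total


def anchor_similarity(X, anchors, sigma=1.0):
    """n x m matrix of similarities between each point and each anchor.

    Assumption: a Gaussian kernel on squared Euclidean distance.
    """
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, z))
                      / (2.0 * sigma ** 2))
             for z in anchors]
            for x in X]
```

With m anchors and n points, the matrix returned by `anchor_similarity` has n × m entries, so for a fixed number of anchors its size grows linearly with n, in contrast to a full n × n similarity matrix.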



Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. clustering
  2. large-scale data
  3. normalized cut

Qualifiers

  • Research-article

Conference

KDD '18

Acceptance Rates

KDD '18 paper acceptance rate: 107 of 983 submissions (11%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 85
  • Downloads (last 6 weeks): 7
Reflects downloads up to 17 Feb 2025.

Cited By

  • (2025) Toward Balance Deep Semisupervised Clustering. IEEE Transactions on Neural Networks and Learning Systems 36:2 (2816-2828). DOI: 10.1109/TNNLS.2023.3339680. Online publication date: Feb-2025
  • (2025) Deep Similarity Graph Fusion for Multiview Clustering. IEEE Transactions on Computational Social Systems 12:1 (435-446). DOI: 10.1109/TCSS.2024.3479188. Online publication date: Feb-2025
  • (2024) A Survey of Co-Clustering. ACM Transactions on Knowledge Discovery from Data 18:9 (1-28). DOI: 10.1145/3681793. Online publication date: 25-Jul-2024
  • (2024) Enhanced Tensorial Self-representation Subspace Learning for Incomplete Multi-view Clustering. Proceedings of the 32nd ACM International Conference on Multimedia (719-728). DOI: 10.1145/3664647.3681573. Online publication date: 28-Oct-2024
  • (2024) Fast Clustering With Anchor Guidance. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:4 (1898-1912). DOI: 10.1109/TPAMI.2023.3318603. Online publication date: Apr-2024
  • (2024) Discretize Relaxed Solution of Spectral Clustering via a Nonheuristic Algorithm. IEEE Transactions on Neural Networks and Learning Systems 35:12 (17965-17972). DOI: 10.1109/TNNLS.2023.3309871. Online publication date: Dec-2024
  • (2024) Efficient and Effective One-Step Multiview Clustering. IEEE Transactions on Neural Networks and Learning Systems 35:9 (12224-12235). DOI: 10.1109/TNNLS.2023.3253246. Online publication date: Sep-2024
  • (2024) Fast Clustering via Maximizing Adaptively Within-Class Similarity. IEEE Transactions on Neural Networks and Learning Systems 35:7 (9800-9813). DOI: 10.1109/TNNLS.2023.3236686. Online publication date: Jul-2024
  • (2024) Fast Clustering by Directly Solving Bipartite Graph Clustering Problem. IEEE Transactions on Neural Networks and Learning Systems 35:7 (9174-9185). DOI: 10.1109/TNNLS.2022.3219131. Online publication date: Jul-2024
  • (2024) Soft Multiprototype Clustering Algorithm via Two-Layer Semi-NMF. IEEE Transactions on Fuzzy Systems 32:4 (1615-1629). DOI: 10.1109/TFUZZ.2023.3329108. Online publication date: Apr-2024
