Clustering based on matrix approximation: a unifying view

Li, Tao

doi:10.1007/s10115-007-0116-0

Regular Paper
Published: 08 January 2008

Volume 17, pages 1–15, (2008)

Knowledge and Information Systems

Tao Li¹

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. Recently, a number of methods based on matrix approximation have been proposed and shown to perform well. Despite significant research on these methods, few attempts have been made to establish the connections between them while highlighting their differences. In this paper, we present a unified view of these methods within a general clustering framework in which clustering is formulated as matrix approximation and the clustering objective is to minimize the approximation error between the original data matrix and the matrix reconstructed from the cluster structures. The general framework provides an elegant basis for comparing and understanding various clustering methods. We characterize different clustering methods within the general framework, including traditional one-side clustering, subspace clustering and two-side clustering. We also establish the connections between our general clustering framework and existing frameworks.
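The central idea of the abstract, clustering as minimizing the error between a data matrix and a matrix rebuilt from the cluster structure, can be illustrated with a small sketch. It assumes squared Frobenius error, hard (0/1) cluster memberships, and a hypothetical notation: X ≈ A C for one-side clustering (for fixed labels the best C is the matrix of cluster means, so alternating the two steps is ordinary K-means) and X ≈ A S Bᵀ for two-side clustering (S holds block means, in the spirit of double k-means co-clustering). The paper's own notation, objectives and algorithms are not reproduced here, and all function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sq_frobenius(X: Matrix, Y: Matrix) -> float:
    """Squared Frobenius norm ||X - Y||_F^2, used as the approximation error."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def centroids(X: Matrix, labels: List[int], k: int) -> Matrix:
    """For a fixed row indicator A (encoded by labels), the k x m matrix C that
    minimizes ||X - A C||_F^2 is the matrix of cluster means.
    Empty clusters get a zero row; no row uses it, so it adds no error."""
    m = len(X[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for row, c in zip(X, labels):
        counts[c] += 1
        for j, v in enumerate(row):
            sums[c][j] += v
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]] for c in range(k)]


def one_side_reconstruction(X: Matrix, labels: List[int], k: int) -> Matrix:
    """A C: every row of X is replaced by the centroid of its row cluster."""
    C = centroids(X, labels, k)
    return [list(C[c]) for c in labels]


def kmeans_as_approximation(X: Matrix, k: int, labels: List[int],
                            iters: int = 20) -> Tuple[List[int], float]:
    """Alternate the C-step and the A-step (Lloyd's algorithm).
    Neither step can increase ||X - A C||_F^2."""
    labels = list(labels)
    for _ in range(iters):
        C = centroids(X, labels, k)
        # A-step: move each row to the centroid with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(row, C[c])))
            for row in X
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, sq_frobenius(X, one_side_reconstruction(X, labels, k))


def two_side_reconstruction(X: Matrix, row_labels: List[int], col_labels: List[int],
                            k: int, l: int) -> Matrix:
    """A S B^T with row indicator A (n x k) and column indicator B (m x l).
    S[p][q] is the mean of the block of X whose rows are in cluster p and whose
    columns are in cluster q, which minimizes the error for fixed A and B."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            sums[row_labels[i]][col_labels[j]] += v
            counts[row_labels[i]][col_labels[j]] += 1
    S = [[sums[p][q] / counts[p][q] if counts[p][q] else 0.0 for q in range(l)]
         for p in range(k)]
    return [[S[row_labels[i]][col_labels[j]] for j in range(len(row))]
            for i, row in enumerate(X)]


# Usage: one-side clustering of four points from a deliberately poor start.
X = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]]
labels, err = kmeans_as_approximation(X, k=2, labels=[0, 1, 0, 1])
print(labels, round(err, 3))  # [0, 0, 1, 1] 0.065

# Usage: a block-structured matrix is reproduced exactly by two-side clustering.
M = [[5.0, 5.0, 0.0], [5.0, 5.0, 0.0], [0.0, 0.0, 3.0]]
print(sq_frobenius(M, two_side_reconstruction(M, [0, 0, 1], [0, 0, 1], k=2, l=2)))  # 0.0
```

Both functions score a clustering the same way, by how well the cluster structure rebuilds X; they differ only in whether one or both dimensions of the matrix are clustered.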

Author information

Authors and Affiliations

Tao Li, School of Computer Science, Florida International University, Miami, FL 33199, USA

Corresponding author

Correspondence to Tao Li.

About this article

Cite this article

Li, T. Clustering based on matrix approximation: a unifying view. Knowl Inf Syst 17, 1–15 (2008). https://doi.org/10.1007/s10115-007-0116-0

Received: 13 December 2005
Revised: 19 March 2007
Accepted: 08 October 2007
Published: 08 January 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10115-007-0116-0
