Skip to main content
Log in

Clustering based on matrix approximation: a unifying view

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. Recently, a number of methods have been proposed and demonstrated good performance based on matrix approximation. Despite significant research on these methods, few attempts have been made to establish the connections between them while highlighting their differences. In this paper, we present a unified view of these methods within a general clustering framework where the problem of clustering is formulated as matrix approximations and the clustering objective is minimizing the approximation error between the original data matrix and the reconstructed matrix based on the cluster structures. The general framework provides an elegant base to compare and understand various clustering methods. We provide characterizations of different clustering methods within the general framework including traditional one-side clustering, subspace clustering and two-side clustering. We also establish the connections between our general clustering framework with existing frameworks.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD international conference on Management of data (SIGMOD’99). ACM Press, pp 61–72

  • Ando RK, Lee L (2001) Iterative residual rescaling: an analysis and generalization of LSI. In: Proceedings of the 24th SIGIR, pp 154–162

  • Baier D, Gaul W, Schader M (1997) Two-mode overlapping clustering with applications to simultaneous benefit segmentation and market structuring. In: Klar R, Opitz O (eds) Classification and knowledge organization. Springer, Heidelberg, pp 577–566

    Google Scholar 

  • Castillo W, Trejos J (2002) Two-mode partitioning: Review of methods and application and tabu search. In: Jajuga K, Sokolowski A, Bock H-H (eds) Classification, clustering and data analysis. Springer, Heidelberg, pp 43–51

    Google Scholar 

  • Cho H, Dhillon IS, Guan Y, Sra S (2004) Minimum sum-squared residue co-clustering of gene experssion data. In: Proceedings of the SIAM data mining conference

  • Dhillon IS, Mallela S, Modha SS (2003) Information-theoretic co-clustering. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD 2003). ACM Press, pp 89–98

  • Dhillon IS and Modha DS (2001). Concept decompositions for large sparse text data using clustering. Mach Learn 42(1/2): 143–175

    Article  MATH  Google Scholar 

  • Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, pp 126–135

  • Eckes T and Orlik P (1993). An error variance approach to two-mode hierarchical clustering. J Classif 10: 52–74

    Article  Google Scholar 

  • Gaul W, Schader M (1996) A new algorithm for two-mode clustering. In: Bock H-H, Polasek W (eds). Data analysis and information systems. Springer, Heidelberg, pp 15–23

    Google Scholar 

  • Golub GH, Van Loan CF (1996) Matrix computations. The Johns Hopkins University Press

  • Govaert G (1995). Simultaneous clustering of rows and columns. Control Cybernet 24(4): 437–458

    MATH  Google Scholar 

  • Hartigan JA (1975). Clustering algorithms. Wiley, New York

    MATH  Google Scholar 

  • Jain AK and Dubes RC (1988). Algorithms for clustering data. Prentice Hall, Englewood Cliffs

    MATH  Google Scholar 

  • Kleinberg JM (1999). Authoritative sources in a hyperlinked environment. J ACM 46(5): 604–632

    Article  MATH  MathSciNet  Google Scholar 

  • Lee DD, Sebastian Seung H (2000) Algorithms for non-negative matrix factorization. In: NIPS, pp 556–562

  • Li T (2005) A general model for clustering binary data. In: KDD ’05: Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp 188–197

  • Li T, Ma S (2004) IFD: iterative feature and data clustering. In: Proceedings of the 2004 SIAM international conference on data mining (SDM 2004). SIAM

  • Li T, Ma S, Ogihara M (2004) Document clustering via adaptive subspace iteration. In: Proceedings of twenty-seventh annual international ACM SIGIR conference on research and development in information retrieval (SIGIR 2004), pp 218–225

  • Long B, Zhang Z, Yu PS (2005) Co-clustering by block value decomposition. In: KDD ’05: Proceeding of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, pp 635–640

  • Maurizio V (2001) Double k-means clustering for simultaneous classification of objects and variables. In: Borra S, Rocci R, Vichi M, Schader M (eds) Advances in classification and data analysis. Springer, Heidelberg, pp 43–52

    Google Scholar 

  • Sha F, Saul LK, Lee DD (2002) Multiplicative updates for nonegative quadratic programming in support vector machines. In: Advances in neural information processing systems, pp 1065–1072

  • Slonim N, Tishby N (2000) Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’00). ACM Press, pp 208–215

  • Soete GD, douglas Carroll J (1994) K-means clustering in a low-dimensional euclidean space. In: New approaches in classification and data analysis. Springer, Heidelberg, pp 212–219

  • Xu W, Gong Y (2004) Document clustering by concept factorization. In: SIGIR ’04: Proceedings of the 27th annual international conference on Research and development in information retrieval. ACM Press, pp 202–209

  • Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval(SIGIR’03). ACM Press, pp 267–273

  • Zha H, He X, Ding C, Simon H (2001) Spectral relaxation for k-means clustering. In: Proceedings of neural information processing systems

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tao Li.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, T. Clustering based on matrix approximation: a unifying view. Knowl Inf Syst 17, 1–15 (2008). https://doi.org/10.1007/s10115-007-0116-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-007-0116-0

Keywords

Navigation