Abstract
Today, digital data is accumulated at an unprecedented speed in science, engineering, biomedicine, and real-world sensing. The ubiquitous phenomenon of massive data and sparse information poses considerable challenges for data mining research. In this paper, we propose a theoretical framework, Exemplar-based low-rank sparse Matrix Decomposition (EMD), to cluster large-scale datasets. Capitalizing on recent advances in matrix approximation and decomposition, EMD can efficiently partition datasets of high dimension and scalable size. Specifically, given a data matrix, EMD first computes a representative data subspace and a near-optimal low-rank approximation. The cluster centroids and indicators are then obtained through matrix decomposition, in which we require that the cluster centroids lie within the representative data subspace. By selecting the representative exemplars, we obtain a compact “sketch” of the data, which makes the clustering highly efficient and robust to noise. In addition, the clustering results are sparse and easy to interpret. From a theoretical perspective, we prove the correctness and convergence of the EMD algorithm, and provide a detailed analysis of its efficiency, including running time and space requirements. Through extensive experiments performed on both synthetic and real datasets, we demonstrate the performance of EMD for clustering large-scale data.
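To make the two-stage idea concrete, the following is a minimal, hypothetical sketch of exemplar-constrained clustering in the spirit described above, not the paper's actual algorithm. It assumes a nonnegative data matrix; exemplars are sampled with probability proportional to squared column norm (a standard randomized low-rank heuristic, used here as a stand-in for the paper's exemplar-selection rule), and the factorization X ≈ (C W) Gᵀ, with centroids constrained to the span of the exemplar columns C, is fit by convex-NMF-style multiplicative updates. The function name and parameters are illustrative.

```python
import numpy as np

def exemplar_cluster(X, n_exemplars, n_clusters, n_iter=100, seed=0):
    """Illustrative exemplar-constrained clustering:
    1) sample exemplar columns C of X with probability ~ squared column norm,
    2) fit X ~= (C @ W) @ G.T with nonnegative W, G by multiplicative updates,
       so every centroid lies in the span of the exemplar 'sketch' C,
    3) read each point's cluster label off the largest entry in its row of G.
    Assumes X is entrywise nonnegative (d features x n points)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # 1) exemplar selection: squared-norm column sampling
    p = (X ** 2).sum(axis=0)
    p = p / p.sum()
    idx = rng.choice(n, size=n_exemplars, replace=False, p=p)
    C = X[:, idx]                              # d x m compact sketch of the data
    # 2) alternating multiplicative updates for || X - C W G^T ||_F^2
    W = rng.random((n_exemplars, n_clusters))  # centroid coefficients over exemplars
    G = rng.random((n, n_clusters))            # soft cluster indicators
    eps = 1e-9                                 # guard against division by zero
    for _ in range(n_iter):
        F = C @ W                              # centroids constrained to span(C)
        G *= (X.T @ F) / (G @ (F.T @ F) + eps)
        W *= (C.T @ X @ G) / (C.T @ C @ W @ (G.T @ G) + eps)
    # 3) hard labels from the indicator matrix
    return idx, G.argmax(axis=1)
```

Because the centroids are linear combinations of only `n_exemplars` actual data columns, each iteration works with an `n_exemplars`-sized sketch rather than the full matrix, and the resulting centroids are directly interpretable in terms of concrete exemplar points.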










Responsible editor: Sugato Basu.
Cite this article
Wang, L., Dong, M. Exemplar-based low-rank matrix decomposition for data clustering. Data Min Knowl Disc 29, 324–357 (2015). https://doi.org/10.1007/s10618-014-0347-0