Abstract
Clustering multiple data streams has become an active area of research with many practical applications. Most early work in this area focused on one-sided clustering, i.e., clustering data streams based on feature correlation. However, recent research has shown that data streams can be grouped by the distribution of their features, while features can be grouped by their distribution across data streams. This paper proposes an evolutionary clustering algorithm for multiple data streams based on graph-regularized non-negative matrix factorization (EC-NMF), which takes into account the geometric structure of both the data manifold and the feature manifold. Instead of directly clustering multiple data streams periodically, EC-NMF works in the low-rank approximation subspace and incorporates prior knowledge from historical results through temporal smoothness. Furthermore, we develop an iterative algorithm and prove its convergence and correctness. Experiments on real and synthetic data sets demonstrate both the effectiveness and the efficiency of the algorithm. The results show that the proposed EC-NMF algorithm outperforms existing methods for clustering multiple data streams that evolve over time.
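The abstract does not state the objective function or the update rules, so the following is only a minimal sketch of the general idea: a non-negative factorization X ≈ U Vᵀ with a graph penalty on both factors and a penalty that keeps the stream factor V close to the previous window's result. It assumes a Frobenius-norm objective, binary k-nearest-neighbour graphs, and multiplicative updates in the style of graph-regularized NMF. All names (`knn_graph`, `smooth_dual_gnmf`, `lam`, `mu`, `n_neighbors`) and the random test data are hypothetical and are not taken from the paper; this is not the authors' EC-NMF algorithm.

```python
import numpy as np


def knn_graph(points, n_neighbors):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix over the rows of `points`."""
    sq = np.sum(points ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros(dist.shape)
    rows = np.repeat(np.arange(points.shape[0]), n_neighbors)
    W[rows, nearest.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrize


def objective(X, U, V, W_u, W_v, V_prev, lam, mu):
    """Assumed objective: fit error + graph penalties + temporal smoothness."""
    L_u = np.diag(W_u.sum(axis=1)) - W_u  # graph Laplacian over features
    L_v = np.diag(W_v.sum(axis=1)) - W_v  # graph Laplacian over streams
    fit = np.linalg.norm(X - U @ V.T) ** 2
    graph = np.trace(U.T @ L_u @ U) + np.trace(V.T @ L_v @ V)
    smooth = np.linalg.norm(V - V_prev) ** 2
    return fit + lam * graph + mu * smooth


def smooth_dual_gnmf(X, rank, V_prev=None, lam=0.1, mu=0.1,
                     n_neighbors=3, n_iter=200, seed=0, eps=1e-10):
    """Factor X (features x streams) as U @ V.T using multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank)) + eps
    V = rng.random((n, rank)) + eps
    if V_prev is None:  # first window: nothing to smooth against
        V_prev, mu = np.zeros((n, rank)), 0.0
    W_u, W_v = knn_graph(X, n_neighbors), knn_graph(X.T, n_neighbors)
    D_u, D_v = np.diag(W_u.sum(axis=1)), np.diag(W_v.sum(axis=1))
    history = [objective(X, U, V, W_u, W_v, V_prev, lam, mu)]
    for _ in range(n_iter):
        # Positive gradient terms go in the denominator, negative ones in the numerator.
        U *= (X @ V + lam * W_u @ U) / (U @ (V.T @ V) + lam * D_u @ U + eps)
        V *= (X.T @ U + lam * W_v @ V + mu * V_prev) / (
            V @ (U.T @ U) + lam * D_v @ V + mu * V + eps)
        history.append(objective(X, U, V, W_u, W_v, V_prev, lam, mu))
    return U, V, history


# Two consecutive windows of synthetic data: 20 features, 12 streams.
rng = np.random.default_rng(1)
X_t1 = rng.random((20, 12))
X_t2 = np.clip(X_t1 + 0.05 * rng.standard_normal(X_t1.shape), 0.0, None)

U1, V1, _ = smooth_dual_gnmf(X_t1, rank=3)
U2, V2, hist = smooth_dual_gnmf(X_t2, rank=3, V_prev=V1)  # reuse the earlier result

stream_labels = V2.argmax(axis=1)   # cluster index for each stream
feature_labels = U2.argmax(axis=1)  # cluster index for each feature
```

In this sketch, `mu` controls how strongly the current stream factor is tied to the previous window: `mu = 0` reduces to clustering each window independently, while a large `mu` makes the assignments change slowly over time.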
Acknowledgments
The authors gratefully acknowledge the support provided for this research by the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120191110047), Natural Science Foundation Project of CQ CSTC of China (Grant No. CSTC2012JJB40002), Engineering Center Research Program of Chongqing of China (Grant No. 2011pt-gc30005), and Fundamental Research Funds for the Central Universities of China (Grant No. CDJXS10170004).
Cite this article
Sang, CY., Sun, DH. Co-clustering over multiple dynamic data streams based on non-negative matrix factorization. Appl Intell 41, 487–502 (2014). https://doi.org/10.1007/s10489-014-0526-0
DOI: https://doi.org/10.1007/s10489-014-0526-0