
Co-clustering over multiple dynamic data streams based on non-negative matrix factorization

Applied Intelligence

Abstract

Clustering multiple data streams has become an active area of research with many practical applications. Most early work in this area focused on one-sided clustering, i.e., clustering data streams based on feature correlation. However, recent research has shown that data streams can be grouped by the distribution of their features, while features can be grouped by their distribution across data streams. In this paper, we propose an evolutionary clustering algorithm for multiple data streams based on graph-regularized non-negative matrix factorization (EC-NMF), which takes into account the geometric structure of both the data manifold and the feature manifold. Instead of periodically clustering the raw data streams directly, EC-NMF works in a low-rank approximation subspace and incorporates prior knowledge from historical clustering results by enforcing temporal smoothness. Furthermore, we develop an iterative algorithm for EC-NMF and prove its convergence and correctness. Experiments on real and synthetic data sets demonstrate both the effectiveness and the efficiency of the algorithm, and the results show that EC-NMF outperforms existing methods for clustering multiple data streams that evolve over time.
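The abstract does not state EC-NMF's objective function or update rules, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a non-negative data matrix X (features x streams) for the current time step, factorized as X ~ U V^T with U (features x k) and V (streams x k) both non-negative, and minimizes ||X - U V^T||_F^2 + lam * tr(V^T L_V V) + mu * tr(U^T L_U U) + alpha * ||V - V_prev||_F^2. Here L_V and L_U are graph Laplacians of k-nearest-neighbour graphs built over the streams and over the features (standing in for the "data and feature manifolds"), and the last term is an assumed form of temporal smoothness toward the previous time step's stream factor. The multiplicative updates follow the usual graph-regularized NMF pattern of dividing the negative part of each gradient by its positive part. The function names `smoothed_dual_graph_nmf`, `knn_graph` and `objective`, and the parameters `lam`, `mu`, `alpha` and `n_neighbors`, are hypothetical.

```python
import numpy as np


def knn_graph(points, n_neighbors=3):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix over the rows of `points`."""
    n = points.shape[0]
    sq = np.sum(points ** 2, axis=1)
    # Pairwise squared Euclidean distances between rows.
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    k = min(n_neighbors, n - 1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrise so the Laplacian is PSD


def objective(X, U, V, W_u, W_v, lam, mu, alpha, V_prev):
    """Fit error + dual graph regularisation + (assumed) temporal smoothness."""
    L_u = np.diag(W_u.sum(axis=1)) - W_u
    L_v = np.diag(W_v.sum(axis=1)) - W_v
    value = np.linalg.norm(X - U @ V.T) ** 2
    value += lam * np.trace(V.T @ L_v @ V) + mu * np.trace(U.T @ L_u @ U)
    value += alpha * np.linalg.norm(V - V_prev) ** 2
    return value


def smoothed_dual_graph_nmf(X, k, V_prev=None, lam=0.1, mu=0.1, alpha=0.5,
                            n_neighbors=3, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative updates for X ~= U @ V.T with U, V >= 0 (hypothetical sketch).

    X: non-negative (features x streams) matrix for the current time step.
    V_prev: stream factor from the previous step, or None at the first step.
    Returns U, V and the objective value before the first and after every iteration.
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((m, k)) + eps
    if V_prev is None:
        V = rng.random((n, k)) + eps
        V_prev, alpha = np.zeros((n, k)), 0.0  # no history: drop the smoothness term
    else:
        V = V_prev + eps  # new array: warm start from the previous time step
    W_v = knn_graph(X.T, n_neighbors)  # graph over streams (columns of X)
    W_u = knn_graph(X, n_neighbors)    # graph over features (rows of X)
    D_v = np.diag(W_v.sum(axis=1))
    D_u = np.diag(W_u.sum(axis=1))

    history = [objective(X, U, V, W_u, W_v, lam, mu, alpha, V_prev)]
    for _ in range(n_iter):
        # Each ratio is (negative part of the gradient) / (positive part of the gradient).
        U *= (X @ V + mu * W_u @ U) / (U @ (V.T @ V) + mu * D_u @ U + eps)
        V *= (X.T @ U + lam * W_v @ V + alpha * V_prev) / (
            V @ (U.T @ U) + lam * D_v @ V + alpha * V + eps)
        history.append(objective(X, U, V, W_u, W_v, lam, mu, alpha, V_prev))
    return U, V, history
```

In this sketch, stream clusters can be read off as `V.argmax(axis=1)` and feature clusters as `U.argmax(axis=1)`. At the next time step, passing the previous `V` as `V_prev` both warm-starts the factorization and pulls the new solution toward the old one, so `alpha` controls the trade-off between fitting the current snapshot and staying consistent with history.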




Acknowledgments

The authors gratefully acknowledge the support provided for this research by the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120191110047), the Natural Science Foundation Project of CQ CSTC of China (Grant No. CSTC2012JJB40002), the Engineering Center Research Program of Chongqing of China (Grant No. 2011pt-gc30005), and the Fundamental Research Funds for the Central Universities of China (Grant No. CDJXS10170004).

Author information


Correspondence to Chun-Yan Sang.


Cite this article

Sang, CY., Sun, DH. Co-clustering over multiple dynamic data streams based on non-negative matrix factorization. Appl Intell 41, 487–502 (2014). https://doi.org/10.1007/s10489-014-0526-0
