Abstract
In many large-scale real-time monitoring applications, hundreds or thousands of streams must be monitored continuously. To ease monitoring, a small set of representatives can be extracted to stand for all the streams. A high-quality representative set must be not only representative but also stable. In this paper, we propose a method to continuously extract a high-quality representative set from massive streams. First, we cluster streams based on the core clustering model. The tightness of a core set, meaning that any two streams in the core set are highly correlated, ensures high representativeness of the representative set. Second, we use topological relationships to force each cluster to be connected in the network from which the streams are generated. Because streams in one cluster are then driven by similar underlying mechanisms, the representative set becomes much more stable. By exploiting the tightness of core sets, we can obtain the representative set immediately. Moreover, with local optimization strategies, our method can adjust core clusters very efficiently, which enables real-time response. Experiments on real applications illustrate that our method is efficient and produces high-quality representative sets.
This work is supported by the National Natural Science Foundation of China under Grant No. 61103025.
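The two constraints named in the abstract, tightness (every pair of streams in a cluster is highly correlated) and connectivity in the underlying network, can be illustrated with a small greedy sketch. This is not the paper's algorithm: the abstract does not give the core clustering procedure or the local optimization strategies, so the greedy growth order, the use of Pearson correlation over a fixed window, the `threshold` parameter, the input formats, and the rule for choosing a representative are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length windows; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def extract_representatives(windows, edges, threshold=0.9):
    """Greedy illustration, not the paper's method.

    windows:   dict stream_id -> list of recent values (assumed input format)
    edges:     iterable of (u, v) pairs describing the network topology
    threshold: assumed minimum pairwise correlation inside a cluster
    Returns (clusters, representatives).
    """
    ids = sorted(windows)
    neighbors = {s: set() for s in ids}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Pairwise correlations over the current window.
    corr = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            corr[(a, b)] = corr[(b, a)] = pearson(windows[a], windows[b])

    unassigned = set(ids)
    clusters = []
    for seed in ids:
        if seed not in unassigned:
            continue
        cluster = [seed]
        unassigned.discard(seed)
        # Candidates are always network neighbors of a current member,
        # so every cluster stays connected in the topology.
        frontier = sorted(neighbors[seed] & unassigned)
        while frontier:
            cand = frontier.pop(0)
            if cand not in unassigned:
                continue
            # Tightness: the candidate must correlate with every member.
            if all(corr[(cand, m)] >= threshold for m in cluster):
                cluster.append(cand)
                unassigned.discard(cand)
                frontier.extend(sorted(neighbors[cand] & unassigned))
        clusters.append(cluster)

    # Assumed selection rule: the member most correlated with the rest.
    reps = [
        max(c, key=lambda s: sum(corr[(s, m)] for m in c if m != s))
        for c in clusters
    ]
    return clusters, reps


if __name__ == "__main__":
    windows = {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],
        "c": [5, 4, 3, 2, 1],
        "d": [10, 8, 6, 4, 2],
    }
    edges = [("a", "b"), ("b", "c"), ("c", "d")]
    print(extract_representatives(windows, edges))
```

Because every pair inside a cluster already exceeds the threshold, any member is a reasonable stand-in for the others, which is why a representative can be read off as soon as the clusters exist. Passing an empty `edges` list shows the effect of the topology constraint: no stream has a neighbor, so each one forms its own cluster.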
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ji, X., Ma, X., Huang, T., Tang, S. (2013). Continuously Extracting High-Quality Representative Set from Massive Data Streams. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_8
DOI: https://doi.org/10.1007/978-3-642-53914-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science (R0)