Continuously Extracting High-Quality Representative Set from Massive Data Streams

Ji, Xiaokang; Ma, Xiuli; Huang, Ting; Tang, Shiwei

doi:10.1007/978-3-642-53914-5_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

In many large-scale real-time monitoring applications, hundreds or thousands of streams must be monitored continuously. To ease the monitoring, a small set of representatives can be extracted to stand for all the streams. A high-quality representative set must be not only representative but also stable. In this paper, we propose a method to continuously extract a high-quality representative set from massive streams. First, we cluster streams based on the core clustering model. The tightness of a core set, meaning that any two streams in the core set are highly correlated, ensures high representativeness of the representative set. Second, we use topological relationships to force each cluster to be connected in the network from which the streams are generated. Because the streams in one cluster are then driven by similar underlying mechanisms, the representative set becomes much more stable. By utilizing the tightness of core sets, we can obtain the representative set immediately. Moreover, with local optimization strategies, our method can adjust core clusters very efficiently, which enables real-time response. Experiments on real applications illustrate that our method is efficient and produces a high-quality representative set.
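
The two constraints named in the abstract (every pair of streams in a core set is highly correlated, and each cluster is connected in the underlying network) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the greedy growth strategy, the use of Pearson correlation, the threshold value, and the names `pearson`, `grow_core_sets`, and `pick_representatives` are all assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def grow_core_sets(streams, edges, threshold=0.9):
    """Greedy sketch (an assumption, not the paper's method).

    streams: dict mapping stream id -> list of recent values
    edges:   iterable of (u, v) links in the network the streams come from
    A candidate joins a core set only if it is a network neighbour of a
    current member (connectivity) and is correlated at or above `threshold`
    with every current member (tightness).
    """
    neighbors = {s: set() for s in streams}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    unassigned = set(streams)
    core_sets = []
    for seed in sorted(streams):
        if seed not in unassigned:
            continue
        core = [seed]
        unassigned.discard(seed)
        frontier = sorted(neighbors[seed] & unassigned)
        while frontier:
            cand = frontier.pop(0)
            if cand not in unassigned:
                continue
            if all(pearson(streams[cand], streams[m]) >= threshold for m in core):
                core.append(cand)
                unassigned.discard(cand)
                frontier.extend(sorted(neighbors[cand] & unassigned))
        core_sets.append(core)
    return core_sets


def pick_representatives(core_sets):
    """Because a core set is tight, any member can stand for it; take the first."""
    return [core[0] for core in core_sets]


# Hypothetical toy data: four streams on a chain network a - b - c - d.
streams = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.1],
    "c": [4, 3, 2, 1],
    "d": [1, 2, 3, 4.2],
}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
cores = grow_core_sets(streams, edges, threshold=0.9)
representatives = pick_representatives(cores)
```

On this toy input, stream `d` is highly correlated with `a` but is reachable from it only through the uncorrelated stream `c`, so under the connectivity constraint it is not merged into the cluster of `a` and `b`.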

This work is supported by the National Natural Science Foundation of China under Grant No. 61103025.

Author information

Authors and Affiliations

Key Laboratory of Machine Perception, Peking University, Ministry of Education, China
Xiaokang Ji, Xiuli Ma, Ting Huang & Shiwei Tang
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Xiaokang Ji, Xiuli Ma, Ting Huang & Shiwei Tang

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, T6G 2E8, Edmonton, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

About this paper

Cite this paper

Ji, X., Ma, X., Huang, T., Tang, S. (2013). Continuously Extracting High-Quality Representative Set from Massive Data Streams. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_8

DOI: https://doi.org/10.1007/978-3-642-53914-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)