Continuously Extracting High-Quality Representative Set from Massive Data Streams

  • Conference paper
Advanced Data Mining and Applications (ADMA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Abstract

In many large-scale real-time monitoring applications, hundreds or thousands of streams must be monitored continuously. To ease the monitoring, a small set of representatives can be extracted to stand for all the streams. A high-quality representative set must be not only representative but also stable. In this paper, we propose a method to continuously extract a high-quality representative set from massive streams. First, we cluster streams based on the core clustering model. The tightness of a core set, meaning that any two streams in the core set are highly correlated, ensures high representativeness of the representative set. Second, we use topological relationships to force each cluster to be connected in the network from which the streams are generated. Because streams in one cluster are driven by similar underlying mechanisms, the representative set becomes much more stable. By exploiting the tightness of core sets, we can obtain the representative set immediately. Moreover, with local optimization strategies, our method can adjust core clusters very efficiently, which enables real-time response. Experiments on real applications show that our method is efficient and produces a high-quality representative set.
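The abstract gives only the outline of the method, so the following is a minimal sketch of the two constraints it names, not the paper's algorithm. It assumes that "highly correlated" means a Pearson correlation at or above a chosen threshold, and that the network is given as an adjacency map. The names `pearson`, `extract_representatives`, `windows`, `topology`, and `threshold` are all hypothetical, and the greedy growth below stands in for the core clustering model and the local optimization strategies, which the abstract does not specify.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def extract_representatives(windows, topology, threshold=0.9):
    """Greedy illustration only (assumption, not the published method).

    windows:  dict mapping stream id -> list of recent values
    topology: dict mapping stream id -> set of neighbouring stream ids
    Returns a list of (representative, members) pairs.
    """
    unassigned = set(windows)
    clusters = []
    for seed in sorted(windows):
        if seed not in unassigned:
            continue
        members = [seed]
        unassigned.discard(seed)
        frontier = [seed]
        # Grow only along network edges, so each cluster stays connected.
        while frontier:
            node = frontier.pop(0)
            for nb in sorted(topology.get(node, ())):
                # Tightness: the candidate must correlate with every member.
                if nb in unassigned and all(
                    pearson(windows[nb], windows[m]) >= threshold for m in members
                ):
                    members.append(nb)
                    unassigned.discard(nb)
                    frontier.append(nb)
        # All members are pairwise correlated, so the seed serves as representative.
        clusters.append((seed, members))
    return clusters
```

Because every member of a cluster is pairwise correlated with every other, any member could serve as the representative. Picking the seed is simply the cheapest choice in this sketch.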

This work is supported by the National Natural Science Foundation of China under Grant No. 61103025.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ji, X., Ma, X., Huang, T., Tang, S. (2013). Continuously Extracting High-Quality Representative Set from Massive Data Streams. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_8

  • DOI: https://doi.org/10.1007/978-3-642-53914-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53913-8

  • Online ISBN: 978-3-642-53914-5

  • eBook Packages: Computer Science, Computer Science (R0)
