COMET: Event-Driven Clustering over Multiple Evolving Streams

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

In this paper, we present COMET, a framework for event-driven Clustering Over Multiple Evolving sTreams. COMET monitors the distribution of clusters over multiple data streams and reports the results online, which is valuable for supporting online decisions. As time advances, the data streams evolve and the clusters they belong to change. Instead of periodically re-clustering the streams directly, COMET applies an efficient cluster adjustment procedure only when it is required. The signal that a cluster adjustment is required is defined as an “event.” We design an event detection mechanism that employs piecewise linear approximation as its key technique. Piecewise linear approximation is advantageous because it can be performed in real time as the data comes in and it captures the trend of the data. When an event occurs, split and merge operations allow us to report the latest clustering results effectively and with high clustering quality.
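
As a rough illustration of how piecewise linear approximation can flag a change of trend online, the sketch below uses a generic sliding-window segmenter. It is not the COMET event detection mechanism, whose details are not given in this abstract. The names `OnlineSegmenter`, `push`, `interpolation_error`, and `max_error` are hypothetical, and treating each closed segment as a candidate event is an assumption made only for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (time index, value)


def interpolation_error(points: List[Point]) -> float:
    """Largest vertical distance from the points to the line joining the first and last point."""
    (t0, v0), (t1, v1) = points[0], points[-1]
    if t1 == t0:
        return 0.0
    slope = (v1 - v0) / (t1 - t0)
    return max(abs(v - (v0 + slope * (t - t0))) for t, v in points)


class OnlineSegmenter:
    """Generic sliding-window piecewise linear approximation (illustrative, not COMET itself)."""

    def __init__(self, max_error: float) -> None:
        self.max_error = max_error      # assumed tolerance for one linear segment
        self.buffer: List[Point] = []   # points of the segment currently being grown
        self.t = 0                      # index of the next incoming value

    def push(self, value: float) -> Optional[Tuple[Point, Point]]:
        """Consume one value; return the (start, end) points of a segment if it just closed."""
        candidate = self.buffer + [(self.t, value)]
        self.t += 1
        if len(candidate) > 2 and interpolation_error(candidate) > self.max_error:
            closed = (self.buffer[0], self.buffer[-1])
            # The new segment starts where the closed one ended.
            self.buffer = [self.buffer[-1], candidate[-1]]
            return closed
        self.buffer = candidate
        return None


# Usage: a stream that rises and then falls closes one segment at the turning point.
segmenter = OnlineSegmenter(max_error=0.5)
stream = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
closed_segments = [s for s in (segmenter.push(v) for v in stream) if s is not None]
print(closed_segments)  # [((0, 0.0), (3, 3.0))]
```

Each value is processed as it arrives, and the end points of a closed segment summarize the trend of the data it covers. In an event-driven setting, a closed segment could serve as the trigger for checking whether clusters need to be adjusted.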

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yeh, MY., Dai, BR., Chen, MS. (2006). COMET: Event-Driven Clustering over Multiple Evolving Streams. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_83

  • DOI: https://doi.org/10.1007/11731139_83

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
