
Monitoring Distributed Data Streams through Node Clustering

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8556)


Abstract

Monitoring data streams in a distributed system is a challenging problem with profound applications. The task of feature selection (e.g., by monitoring the information gain of various features) is an example of an application that requires special techniques to avoid a very high communication overhead when performed using straightforward centralized algorithms.

Motivated by recent contributions based on geometric ideas, we present an alternative approach that combines system theory techniques and clustering. The proposed approach enables monitoring values of an arbitrary threshold function over distributed data streams through a set of constraints applied independently on each stream and/or clusters of streams. The clusters are designed to adapt themselves to the data stream. A correct choice of clusters yields a reduction in communication load. Unlike many clustering algorithms that attempt to collect together similar data items, monitoring requires clusters with dissimilar vectors canceling each other as much as possible. In particular, sub-clusters of a good cluster do not have to be good. This novel type of clustering dictated by the problem at hand requires development of new algorithms, and the paper is a step in this direction.

We report experiments on real-world data that detect instances where communication between nodes is required, and show that the clustering approach reduces communication load.
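
The cancellation idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the names `violating_nodes`, `violating_clusters`, and `delta` are hypothetical, and the sketch assumes that each node tracks a drift vector (its current value minus its value at the last synchronization) and that the constraint is a simple bound `delta` on the Euclidean norm of that drift. Under these assumptions, if the clusters partition the nodes and every cluster's mean drift has norm at most `delta`, then the global mean drift, being a convex combination of the cluster means, also has norm at most `delta`.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def violating_nodes(drifts, delta):
    """Indices of nodes whose own drift exceeds delta (per-stream constraint)."""
    return [i for i, d in enumerate(drifts) if norm(d) > delta]


def violating_clusters(drifts, clusters, delta):
    """Clusters whose mean drift exceeds delta (per-cluster constraint)."""
    return [c for c in clusters if norm(mean([drifts[i] for i in c])) > delta]


# Hypothetical drift vectors for three nodes; delta is an assumed bound.
drifts = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.2]]
delta = 0.5

# Checked one stream at a time, nodes 0 and 1 both break the constraint.
print(violating_nodes(drifts, delta))

# Pairing the two opposite drifts lets them cancel: no cluster violates.
print(violating_clusters(drifts, [[0, 1], [2]], delta))

# A different partition of the same nodes does not cancel: both clusters violate.
print(violating_clusters(drifts, [[0, 2], [1]], delta))
```

In this toy setting the cluster `[0, 1]` satisfies the constraint although each of its single-node sub-clusters violates it, which mirrors the remark that sub-clusters of a good cluster do not have to be good.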




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Barouti, M., Keren, D., Kogan, J., Malinovsky, Y. (2014). Monitoring Distributed Data Streams through Node Clustering. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2014. Lecture Notes in Computer Science, vol 8556. Springer, Cham. https://doi.org/10.1007/978-3-319-08979-9_12


  • DOI: https://doi.org/10.1007/978-3-319-08979-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08978-2

  • Online ISBN: 978-3-319-08979-9

  • eBook Packages: Computer Science, Computer Science (R0)
