DOI: 10.1145/1142473.1142508
Article

A geometric approach to monitoring threshold functions over distributed data streams

Published: 27 June 2006

Abstract

Monitoring data streams in a distributed system has been the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of feature selection (e.g., by monitoring the information gain of various features), still require very high communication overhead when naive, centralized algorithms are used. We present a novel geometric approach by which an arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables efficient monitoring of arbitrary threshold functions over distributed data streams. We present experimental results on real-world data which demonstrate that our algorithms are highly scalable and considerably reduce communication load in comparison to centralized algorithms.
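
The abstract does not spell out what a local constraint looks like, so the following is only a minimal sketch of one way such a check could work. It assumes the ball-based construction usually associated with geometric monitoring: each node forms a drift vector from a shared estimate and its own local change, and stays silent as long as the monitored function does not cross the threshold anywhere on the ball whose diameter runs from the estimate to that drift vector. The function names, the choice of f(x) = ||x|| (picked because its range over a ball has a closed form), and all parameters are illustrative assumptions, not taken from the paper.

```python
import math

def drift_vector(estimate, last_sent, current):
    """Drift vector u = e + (v - v'): the shared estimate e shifted by the
    change in this node's local vector since it last reported (v' -> v)."""
    return [e + (c - s) for e, s, c in zip(estimate, last_sent, current)]

def ball(estimate, drift):
    """Ball whose diameter is the segment from the estimate to the drift vector."""
    center = [(e + u) / 2 for e, u in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    return center, radius

def norm_range_on_ball(center, radius):
    """Exact min and max of f(x) = ||x|| over the ball (illustrative choice of f)."""
    c = math.hypot(*center)
    return max(0.0, c - radius), c + radius

def local_constraint_holds(estimate, last_sent, current, threshold):
    """True if f stays on one side of the threshold over the whole ball,
    in which case this node has no reason to communicate."""
    center, radius = ball(estimate, drift_vector(estimate, last_sent, current))
    lo, hi = norm_range_on_ball(center, radius)
    return hi <= threshold or lo > threshold
```

Under these assumptions, a node would call `local_constraint_holds` after every local update and contact the coordinator only when it returns `False`. For an arbitrary function the exact range over a ball is generally not available in closed form, so a real implementation would need a bound or a numerical search in place of `norm_range_on_ball`.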

    Published In

    SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
    June 2006
    830 pages
    ISBN:1595934340
    DOI:10.1145/1142473

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data streams
    2. distributed monitoring

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS06

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) Data-driven Synchronization Protocols for Data-parallel Neural Learning over Streaming Data. 2024 IEEE International Conference on Big Data (BigData), pages 988-997. DOI: 10.1109/BigData62323.2024.10825830. Online publication date: 15-Dec-2024
    • (2021) At-the-time and Back-in-time Persistent Sketches. Proceedings of the 2021 International Conference on Management of Data, pages 1623-1636. DOI: 10.1145/3448016.3452802. Online publication date: 9-Jun-2021
    • (2021) A Distance-Based Scheme for Reducing Bandwidth in Distributed Geometric Monitoring. 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 1164-1175. DOI: 10.1109/ICDE51399.2021.00105. Online publication date: Apr-2021
    • (2019) Continuous Monitoring meets Synchronous Transmissions and In-Network Aggregation. 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS), pages 157-166. DOI: 10.1109/DCOSS.2019.00043. Online publication date: May-2019
    • (2019) Enhancing distributed functional monitoring with quantum protocols. Quantum Information Processing, 18:12. DOI: 10.1007/s11128-019-2484-2. Online publication date: 30-Oct-2019
    • (2018) Lightweight Monitoring of Distributed Streams. ACM Transactions on Database Systems, 43:2, pages 1-37. DOI: 10.1145/3226113. Online publication date: 31-Jul-2018
    • (2018) Recent Advancements in Event Processing. ACM Computing Surveys, 51:2, pages 1-36. DOI: 10.1145/3170432. Online publication date: 13-Feb-2018
    • (2018) Geometric Monitoring in Action: a Systems Perspective for the Internet of Things. 2018 IEEE 43rd Conference on Local Computer Networks (LCN), pages 433-436. DOI: 10.1109/LCN.2018.8638079. Online publication date: Oct-2018
    • (2018) Self-adaptive cloud monitoring with online anomaly detection. Future Generation Computer Systems, 80:C, pages 89-101. DOI: 10.1016/j.future.2017.09.067. Online publication date: 1-Mar-2018
    • (2018) Monitoring distributed fragmented skylines. Distributed and Parallel Databases, 36:4, pages 675-715. DOI: 10.1007/s10619-018-7223-7. Online publication date: 1-Dec-2018