DOI: 10.1145/2335484.2335489
research-article

Approximate membership query over time-decaying windows for event stream processing

Published: 16 July 2012

Abstract

Space-efficient data structures for approximate membership queries have a long history, starting with Bloom's work in the 1970s. Given a set A of n items and an additional item x from the same universe U of size m > n, we want to determine whether x ∈ A using limited space. If A is static, there exist optimal algorithms that build a randomized data structure representing A in only (1 + o(1))n log 1/δ bits, which allows a small false-positive probability δ but no false negatives. However, existing optimal algorithms are not practical for many event-based systems, e.g., web services, peer-to-peer systems, and network traffic monitoring. In these systems, items are inserted or updated dynamically in a stream of events, and we are interested in recently updated items. In this paper, we propose a novel data structure to support approximate membership queries in a time-decaying window model. In this model, items are inserted one by one over a data stream, and we want to determine whether an item is among the most recent w items for any given window size w ≤ n. Our data structure requires only O(n(log 1/δ + log n)) bits and O(1) running time.
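
To make the query model concrete, here is a minimal Python sketch of a windowed membership test. It is not the data structure proposed in the paper, whose details the abstract does not give. It is a simple illustration that stores, for each of the last n arrivals, a hashed fingerprint of about log2(1/δ) + log2(n) bits together with its arrival index. The class name `WindowedMembership`, the choice of BLAKE2b as the hash, and the dictionary-plus-queue layout are all assumptions made for this example. Python's containers add overhead, so the sketch shows the behaviour rather than the bit-level space bound.

```python
import hashlib
import math
from collections import deque


class WindowedMembership:
    """Illustrative sketch only; NOT the structure from the paper.

    Keeps a fingerprint and an arrival index for each of the last n items.
    Queries can return false positives (fingerprint collisions) but never
    false negatives.
    """

    def __init__(self, n, delta):
        self.n = n
        # Assumption: log2(n / delta) = log2(1/delta) + log2(n) fingerprint
        # bits, so a union bound over n stored items keeps the collision
        # probability near delta. The 8-byte digest caps this at 64 bits.
        self.bits = max(1, math.ceil(math.log2(n / delta)))
        self.clock = 0            # number of items inserted so far
        self.last_seen = {}       # fingerprint -> latest arrival index
        self.arrivals = deque()   # (arrival index, fingerprint), oldest first

    def _fingerprint(self, item):
        digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") & ((1 << self.bits) - 1)

    def insert(self, item):
        fp = self._fingerprint(item)
        self.last_seen[fp] = self.clock
        self.arrivals.append((self.clock, fp))
        self.clock += 1
        # Drop arrivals that have fallen out of the largest window n.
        while self.arrivals and self.arrivals[0][0] < self.clock - self.n:
            index, old_fp = self.arrivals.popleft()
            # Only forget the fingerprint if it was not refreshed later.
            if self.last_seen.get(old_fp) == index:
                del self.last_seen[old_fp]

    def query(self, item, w):
        """Return True if item may be among the most recent w items."""
        if not 1 <= w <= self.n:
            raise ValueError("window size must satisfy 1 <= w <= n")
        index = self.last_seen.get(self._fingerprint(item))
        return index is not None and self.clock - index <= w
```

For example, after creating `WindowedMembership(n=4, delta=1e-9)` and inserting "a" through "e" in order, `query("e", 1)` returns True, and `query("d", 1)` is expected to return False because "d" is the second most recent item. Each insert and query does a constant amount of expected work, which matches the flavour of the O(1) bound stated above, though only for this simplified design.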

      Published In

      DEBS '12: Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
      July 2012
      410 pages
      ISBN:9781450313155
      DOI:10.1145/2335484
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. algorithm
      2. data stream
      3. membership query

      Conference

      DEBS '12

      Acceptance Rates

      Overall Acceptance Rate 145 of 583 submissions, 25%

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 209
      • Downloads (Last 12 months): 2
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 05 Mar 2025
