Research article
DOI: 10.1145/3394486.3403144

Sliding Sketches: A Framework using Time Zones for Data Stream Processing in Sliding Windows

Published: 20 August 2020

Abstract

Data stream processing has become a hot topic in recent years with the arrival of the big data era. There are three fundamental stream processing tasks: membership query, frequency query, and heavy hitter query. While most existing solutions address these queries in fixed windows, this paper focuses on a more challenging task: answering them in sliding windows. And while most existing solutions use a different algorithm for each kind of query, this paper focuses on a generic framework. We propose such a framework, namely Sliding sketches, which can be applied to many existing solutions for the above three queries and enables them to support queries in sliding windows. We apply our framework to five state-of-the-art sketches for the above three kinds of queries. Theoretical analysis and extensive experimental results show that, after applying our framework, existing sketches that did not previously support sliding windows achieve much higher accuracy than the corresponding best prior art. We have released all the source code on GitHub (https://github.com/sliding-sketch/Sliding-Sketch).
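To make the three query types concrete, the following is a minimal sketch of an exact, non-approximate baseline over a count-based sliding window. It is not the Sliding sketch framework described in the abstract: it stores every item in the window, which is precisely the memory cost that sketches are designed to avoid. The class name `ExactSlidingWindow`, its method names, and the count-based window definition are all assumptions made for illustration.

```python
from collections import Counter, deque


class ExactSlidingWindow:
    """Exact reference for queries over the most recent `window` items.

    Hypothetical illustration only: it keeps the whole window in memory,
    so it defines the correct answers rather than a practical solution.
    """

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.items = deque()     # items currently inside the window, oldest first
        self.counts = Counter()  # frequency of each item inside the window

    def insert(self, item):
        """Add a new item and expire the oldest one if the window is full."""
        self.items.append(item)
        self.counts[item] += 1
        if len(self.items) > self.window:
            expired = self.items.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def contains(self, item):
        """Membership query: did `item` appear in the current window?"""
        return item in self.counts

    def frequency(self, item):
        """Frequency query: how many times did `item` appear in the window?"""
        return self.counts.get(item, 0)

    def heavy_hitters(self, threshold):
        """Heavy hitter query: items appearing at least `threshold` times."""
        return {x for x, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    w = ExactSlidingWindow(window=3)
    for x in ["a", "a", "b", "a", "c"]:
        w.insert(x)
    # The window now holds the last three items: b, a, c.
    print(w.contains("a"), w.frequency("a"), w.heavy_hitters(1))
```

A sketch-based solution would answer the same three queries approximately while using far less memory than the window itself; this baseline only fixes what the correct answers are.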

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

1. approximate query
2. data stream
3. sketch
4. sliding window

Qualifiers

• Research-article

Funding Sources

• National Natural Science Foundation of China
• PKU-Baidu Fund
• National Key R&D Program of China

Conference

KDD '20

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
