Types of Stream Processing Algorithms

Encyclopedia of Big Data Technologies

Synonyms

One-pass algorithms; On-line algorithms

Definition

A stream processing algorithm operates over a continuous and potentially unbounded stream of data that may arrive at very high speed, one item or one batch of items at a time. It must handle each item in limited time and with limited working storage. At any point in time, a stream algorithm can produce an answer over the prefix of the stream observed so far or over a sliding window of recent data. Stream processing algorithms are used to answer continuous queries, also known as standing queries.
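
As a minimal illustration of the two answer scopes named above (prefix and sliding window), the following Python sketch maintains a running mean over everything seen so far and over the most recent items. The class name `StreamMean` and its `window` parameter are hypothetical names chosen for this example, not part of the entry. The prefix mean needs only constant storage; the window mean here stores the window contents exactly, which is one simple option among several.

```python
from collections import deque


class StreamMean:
    """Running mean over the whole prefix and over the last `window` items."""

    def __init__(self, window):
        self.window = window        # sliding-window length (illustrative parameter)
        self.count = 0              # number of items in the prefix
        self.total = 0.0            # sum of all items in the prefix
        self.recent = deque()       # holds at most `window` recent items
        self.recent_total = 0.0     # sum of the items currently in `recent`

    def update(self, x):
        """Process one arriving item in O(1) time."""
        self.count += 1
        self.total += x
        self.recent.append(x)
        self.recent_total += x
        if len(self.recent) > self.window:
            # The oldest item falls out of the window.
            self.recent_total -= self.recent.popleft()

    def prefix_mean(self):
        if self.count == 0:
            raise ValueError("no items observed yet")
        return self.total / self.count

    def window_mean(self):
        if not self.recent:
            raise ValueError("no items observed yet")
        return self.recent_total / len(self.recent)


m = StreamMean(window=3)
for x in [1, 2, 3, 4, 5]:
    m.update(x)
print(m.prefix_mean())  # 3.0  (mean of 1..5)
print(m.window_mean())  # 4.0  (mean of 3, 4, 5)
```

Either answer can be requested at any point between updates, which mirrors how a continuous query is evaluated as the stream progresses.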

Stream processing algorithms can be categorized according to (1) what output they compute (e.g., which function is being computed and whether the answer is exact or approximate) and (2) how they compute the output (e.g., sampling vs. hashing, single-threaded vs. distributed, one-pass vs. several passes).
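
To make the "sampling" and "one-pass" categories concrete, here is a hedged sketch of reservoir sampling, a well-known single-pass technique that keeps a uniform random sample of fixed size from a stream of unknown length. The function name `reservoir_sample` and the optional `rng` argument are assumptions made for this example; the entry itself does not prescribe an implementation.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from one pass over `stream`."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number i+1 replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=10, rng=random.Random(42))
print(len(sample))  # 10
```

Working storage is proportional to `k` regardless of stream length, and each item is examined once, so the sketch fits the limited-storage, limited-time-per-item setting described in the definition. Answers computed from the sample are approximate rather than exact.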

Overview

Stream processing algorithms operate sequentially over unbounded input streams and produce output streams. The input stream is assumed...

Corresponding author

Correspondence to Lukasz Golab.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this entry

Golab, L. (2018). Types of Stream Processing Algorithms. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_193-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_193-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering

Chapter history

  1. Latest

    Types of Stream Processing Algorithms
    Published:
    26 November 2022

    DOI: https://doi.org/10.1007/978-3-319-63962-8_193-3

  2. Types of Stream Processing Algorithms
    Published:
    23 April 2018

    DOI: https://doi.org/10.1007/978-3-319-63962-8_193-1

  3. Original

    Types of Stream Processing Algorithms
    Published:
    24 February 2012

    DOI: https://doi.org/10.1007/978-3-319-63962-8_193-2