Synonyms
Definition
A stream processing algorithm operates over a continuous and potentially unbounded stream of data, arriving at a possibly very high speed, one item or one batch of items at a time, and does so in limited time per item and using limited working storage. At any point in time, a stream algorithm can produce an answer over the prefix of the stream observed so far or over a sliding window of recent data. Stream processing algorithms are used to answer continuous queries, also known as standing queries.
Stream processing algorithms can be categorized according to (1) what output they compute (e.g., what function is being computed, is the answer exact or approximate) and (2) how they compute the output (e.g., sampling vs. hashing, single-threaded vs. distributed, one-pass vs. several passes).
Overview
Stream processing algorithms operate sequentially over unbounded input streams and produce output streams. The input stream is assumed...
References
Agarwal PK, Cormode G, Huang Z, Phillips JM, Wei Z, Yi K (2013) Mergeable summaries. ACM Trans Database Syst 38(4):26:1–26:28
Akidau T, Balikov A, Bekiroglu K, Chernyak S, Haberman J, Lax R, McVeety S, Mills D, Nordstrom P, Whittle S (2013) Millwheel: Fault-tolerant stream processing at internet scale. PVLDB 6(11):1033–1044
Arasu A, Manku GS (2004) Approximate counts and quantiles over sliding windows. In: Proceedings of the twenty-third ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, 14–16 June 2004, Paris, pp 286–296
Arasu A, Widom J (2004) Resource sharing in continuous sliding-window aggregates. In: (e)Proceedings of the thirtieth international conference on very large data bases, Toronto, 31 Aug–3 Sept 2004, pp 336–347
Babcock B, Olston C (2003) Distributed top-k monitoring. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, San Diego, 9–12 June 2003, pp 28–39
Babcock B, Datar M, Motwani R (2002) Sampling from a moving window over streaming data. In: Proceedings of the thirteenth annual ACM-SIAM symposium on discrete algorithms, 6–8 Jan 2002, San Francisco, pp 633–634
Babcock B, Datar M, Motwani R (2004) Load shedding for aggregation queries over data streams. In: Proceedings of the 20th international conference on data engineering, ICDE 2004, 30 Mar–2 Apr 2004, Boston, pp 350–361
Braverman V, Ostrovsky R, Zaniolo C (2012) Optimal sampling from sliding windows. J Comput Syst Sci 78(1):260–272
Bulut A, Singh AK (2005) A unified framework for monitoring data streams in real time. In: Proceedings of the 21st international conference on data engineering, ICDE 2005, 5–8 Apr 2005, Tokyo, pp 44–55
Charikar M, Chen KC, Farach-Colton M (2002) Finding frequent items in data streams. In: Proceedings of 29th international colloquium automata, languages and programming, ICALP 2002, Malaga, 8–13 July 2002, pp 693–703
Cormode G (2017) Data sketching. Commun ACM 60(9):48–55
Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20
Cormode G, Muthukrishnan S (2005) An improved data stream summary: the count-min sketch and its applications. J Algorithm 55(1):58–75
Cormode G, Muthukrishnan S, Yi K, Zhang Q (2012) Continuous sampling from distributed streams. J ACM 59(2):10:1–10:25
Cranor CD, Johnson T, Spatscheck O, Shkapenyuk V (2003) Gigascope: a stream database for network applications. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, San Diego, 9–12 June 2003, pp 647–651
Datar M, Gionis A, Indyk P, Motwani R (2002) Maintaining stream statistics over sliding windows. SIAM J Comput 31(6):1794–1813
Durand M, Flajolet P (2003) Loglog counting of large cardinalities (extended abstract). In: Proceedings of 11th annual European symposium algorithms – ESA 2003, Budapest, 16–19 Sept 2003, pp 605–617
Flajolet P, Martin GN (1983) Probabilistic counting. In: 24th annual symposium on foundations of computer science, Tucson, 7–9 Nov 1983, pp 76–82
Flajolet P, Fusy E, Gandouet O, Meunier F (2007) Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In: Proceedings of the conference on analysis of algorithms, pp 127–146
Golab L, Özsu MT (2003) Processing sliding window multi-joins in continuous queries over data streams. In: Proceedings of 29th international conference on very large data bases VLDB 2003, 9–12 Sept 2003, Berlin, pp 500–511
Greenwald M, Khanna S (2001) Space-efficient online computation of quantile summaries. In: Proceedings of the 2001 ACM SIGMOD international conference on management of data, Santa Barbara, 21–24 May 2001, pp 58–66
Haas PJ (2016) Data-stream sampling: basic techniques and results. In: Garofalakis M, Gehrke J, Rastogi R (eds) Data stream management – processing high-speed data streams. Springer, Heidelberg, pp 13–44
Kang J, Naughton JF, Viglas S (2003) Evaluating window joins over unbounded streams. In: Proceedings of the 19th international conference on data engineering, 5–8 Mar 2003, Bangalore, pp 341–352
Krishnamurthy S, Wu C, Franklin MJ (2006) On-the-fly sharing for streamed aggregation. In: Proceedings of the ACM SIGMOD international conference on management of data, Chicago, 27–29 June 2006, pp 623–634
Kulkarni S, Bhagat N, Fu M, Kedigehalli V, Kellogg C, Mittal S, Patel JM, Ramasamy K, Taneja S (2015) Twitter heron: stream processing at scale. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, 31 May–4 June 2015, pp 239–250
Lee L, Ting HF (2006) A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Proceedings of the twenty-fifth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, 26–28 June 2006, Chicago, pp 290–297
Li J, Maier D, Tufte K, Papadimos V, Tucker PA (2005) No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. SIGMOD Rec 34(1):39–44
Liu X, Golab L, Golab WM, Ilyas IF, Jin S (2017) Smart meter data analytics: Systems, algorithms, and benchmarking. ACM Trans Database Syst 42(1):2: 1–2:39
Madden S, Franklin MJ (2002) Fjording the stream: an architecture for queries over streaming sensor data. In: Proceedings of the 18th international conference on data engineering, San Jose, 26 Feb–1 Mar 2002, pp 555–566
Manku GS, Motwani R (2002) Approximate frequency counts over data streams. In: VLDB 2002, Proceedings of 28th international conference on very large data bases, 20–23 Aug 2002, Hong Kong, pp 346–357
Metwally A, Agrawal D, El Abbadi A (2005) Efficient computation of frequent and top-k elements in data streams. In: Proceedings of 10th international conference on database theory – ICDT 2005, Edinburgh, 5–7 Jan 2005, pp 398–412
Misra J, Gries D (1982) Finding repeated elements. Sci Comput Program 2(2):143–152
Nasir MAU, Morales GDF, GarcÃa-Soriano D, Kourtellis N, Serafini M (2015) The power of both choices: Practical load balancing for distributed stream processing engines. In: 31stIEEE international conference on data engineering, ICDE 2015, Seoul, 13–17 Apr 2015, pp 137–148
Olston C, Jiang J, Widom J (2003) Adaptive filters for continuous queries over distributed data streams. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, San Diego, 9–12 June 2003, pp 563–574
Stonebraker M, Çetintemel U, Zdonik SB (2005) The 8 requirements of real-time stream processing. SIGMOD Rec 34(4):42–47
Tatbul N, Çetintemel U, Zdonik SB, Cherniack M, Stonebraker M (2003) Load shedding in a data stream manager. In: VLDB 2003, Proceedings of 29th international conference on very large data bases, 9–12 Sept 2003, Berlin, pp 309–320
Teubner J, Müller R (2011) How soccer players would do stream joins. In: Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2011, Athens, 12–16 June 2011, pp 625–636
Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw 11(1):37–57
Zaharia M, Das T, Li H, Hunter T, Shenker S, Stoica I (2013) Discretized streams: fault-tolerant streaming computation at scale. In: ACM SIGOPS
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Section Editor information
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Golab, L. (2018). Types of Stream Processing Algorithms. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_193-1
Download citation
DOI: https://doi.org/10.1007/978-3-319-63962-8_193-1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference MathematicsReference Module Computer Science and Engineering
Publish with us
Chapter history
-
Latest
Types of Stream Processing Algorithms- Published:
- 26 November 2022
DOI: https://doi.org/10.1007/978-3-319-63962-8_193-3
-
Types of Stream Processing Algorithms
- Published:
- 23 April 2018
DOI: https://doi.org/10.1007/978-3-319-63962-8_193-1
-
Original
Types of Stream Processing Algorithms- Published:
- 24 February 2012
DOI: https://doi.org/10.1007/978-3-319-63962-8_193-2