Definition
A majority of today’s data is constantly evolving and fundamentally distributed in nature. Data for almost any large-scale data-management task is continuously collected over a wide area, and at a much greater rate than ever before. Compared to traditional, centralized stream processing, querying such large-scale, evolving data collections poses new challenges, due mainly to the physical distribution of the streaming data and the communication constraints of the underlying network. Distributed stream processing algorithms should guarantee efficiency not only in terms of space and processing time (as conventional streaming techniques do), but also in terms of the communication load imposed on the network infrastructure.
Historical Background
The prevailing paradigm in database systems has been understanding the management of centralized data: how to organize, index, access, and query data that is held centrally on a single machine or a small number of closely linked machines....
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Cite this entry
Garofalakis, M. (2018). Distributed Data Streams. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_137
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9