Abstract
This chapter addresses some of the problems raised by the high-volume, nonterminating nature of many data streams. We begin by outlining challenges for query processing over such streams: input rates that outstrip CPU or memory resources, operators that wait for the end of input, and unbounded query state. We then consider various techniques for meeting those challenges. Filtering reduces stream volume in order to save on system resources. Punctuations embed information about the structure of a stream in the stream itself, and can help unblock query operators and reduce the state they must retain. Windowing modifies a query so that processing takes place on finite subsets of the full streams. Synopses are compact, efficiently maintained summaries of data that can provide approximate answers to particular queries.
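As an illustration of how a punctuation can unblock an operator and bound its state, here is a minimal Python sketch of a grouped sum. It is not taken from the chapter: the element encodings ("data" and "punct"), the function name punctuated_sum, and the auction-style sample data are all assumptions made for this example. A grouped sum would normally have to wait for the end of input; here, a punctuation asserting that no further tuples will arrive for a key lets the operator emit that group's result and discard its state.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical stream element encodings (assumed for this sketch):
#   ("data", key, value)  -> an ordinary tuple
#   ("punct", key)        -> a punctuation: no later tuple will carry this key


def punctuated_sum(stream: Iterable[tuple]) -> Iterator[Tuple[str, int]]:
    """Grouped sum that emits a group as soon as a punctuation closes it."""
    state: Dict[str, int] = {}  # running sum per open key
    for element in stream:
        if element[0] == "data":
            _, key, value = element
            state[key] = state.get(key, 0) + value
        elif element[0] == "punct":
            key = element[1]
            # The punctuation guarantees the group is complete, so the result
            # can be emitted now and the state for this key released.
            if key in state:
                yield key, state.pop(key)


# Illustrative input: bids on two auction items, each closed by a punctuation.
bids = [
    ("data", "item1", 10),
    ("data", "item2", 5),
    ("data", "item1", 7),
    ("punct", "item1"),
    ("data", "item2", 1),
    ("punct", "item2"),
]
results = list(punctuated_sum(bids))
```

In this sketch the operator never holds state for a key after its punctuation arrives, so memory use tracks the number of open groups rather than the length of the stream.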
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Maier, D., Tucker, P.A., Garofalakis, M. (2005). Filtering, Punctuation, Windows and Synopses. In: Chaudhry, N.A., Shaw, K., Abdelguerfi, M. (eds) Stream Data Management. Advances in Database Systems, vol 30. Springer, Boston, MA. https://doi.org/10.1007/0-387-25229-0_3
DOI: https://doi.org/10.1007/0-387-25229-0_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24393-1
Online ISBN: 978-0-387-25229-2
eBook Packages: Computer Science (R0)