Abstract
We present results on small-space computation over sliding windows in the data-stream model. Most research in the data-stream model, including results presented in some of the other chapters, assumes that all data elements seen so far in the stream are equally important, and that the synopses, statistics, or models that are built should reflect the entire data set. However, this assumption does not hold for many applications, particularly those that ascribe more importance to recent data items. One way to discount old data items and consider only recent ones for analysis is the sliding-window model: data elements arrive at every instant; each data element expires after exactly N time steps; and the portion of data that is relevant to gathering statistics or answering queries is the set of the last N elements to arrive. The sliding window is the set of active data elements at a given time instant, and the window size is N. This chapter presents a general technique, called the Exponential Histogram (EH) technique, that can be used to solve a wide variety of problems in the sliding-window model, typically problems that require maintaining statistics. We showcase this technique through solutions to basic counting problems, as well as other applications.
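The abstract names the Exponential Histogram technique without describing it. The following is a minimal Python sketch, assuming the basic-counting setting (estimate how many 1s appear among the last N bits of a 0/1 stream) and a DGIM-style bucket scheme: buckets have power-of-two sizes, each records the arrival time of its most recent 1, and the two oldest buckets of a size are merged once too many of that size coexist. The class name `ExpHistogramCounter` and the parameter `max_per_size` are hypothetical illustration names, not the chapter's notation.

```python
from typing import List, Tuple


class ExpHistogramCounter:
    """Approximate count of 1s among the last `window` bits of a 0/1 stream.

    Illustrative sketch: each bucket is (timestamp of its most recent 1, size),
    sizes are powers of two, and the list is ordered newest bucket first.
    `max_per_size` is a hypothetical cap on buckets of one size.
    """

    def __init__(self, window: int, max_per_size: int = 2) -> None:
        if window < 1 or max_per_size < 2:
            raise ValueError("need window >= 1 and max_per_size >= 2")
        self.window = window
        self.max_per_size = max_per_size
        self.time = 0
        self.buckets: List[Tuple[int, int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest buckets whose most recent 1 has left the window
        # (the window covers times time - window + 1 .. time).
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        self._cascade()

    def _cascade(self) -> None:
        size = 1
        while True:
            # Buckets of one size are contiguous because sizes grow with age.
            idx = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(idx) <= self.max_per_size:
                return
            newer, older = idx[-2], idx[-1]
            # Merge the two oldest buckets of this size; the merged bucket
            # keeps the more recent timestamp of the pair.
            self.buckets[newer] = (self.buckets[newer][0], 2 * size)
            del self.buckets[older]
            size *= 2

    def estimate(self) -> int:
        if not self.buckets:
            return 0
        total = sum(s for _, s in self.buckets)
        # Only part of the oldest bucket may still lie inside the window,
        # so count half of it (a size-1 oldest bucket is counted exactly).
        return total - self.buckets[-1][1] // 2
```

With at most two buckets of each size, every smaller size is represented at least once, so the bucket sizes below the oldest bucket sum to at least its size minus one; this keeps the estimate within 50% of the true count while storing only a logarithmic number of buckets. Raising `max_per_size` tightens the estimate at the cost of more buckets; the precise trade-off against a target relative error should be taken from the chapter's analysis rather than from this sketch.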
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Datar, M., Motwani, R. (2016). The Sliding-Window Computation Model and Results. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_7
DOI: https://doi.org/10.1007/978-3-540-28608-0_7
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28607-3
Online ISBN: 978-3-540-28608-0
eBook Packages: Computer Science, Computer Science (R0)