Abstract
Massive data sets often arise as physically distributed, parallel data streams, and it is important to estimate various aggregates and statistics on the union of these streams. This paper presents algorithms for estimating aggregate functions over a “sliding window” of the N most recent data items in one or more streams. Our results include:
1. For a single stream, we present the first ε-approximation scheme for the number of 1’s in a sliding window that is optimal in both worst-case time and space. We also present the first ε-approximation scheme for the sum of integers in [0..R] in a sliding window that is optimal in both worst-case time and space (assuming R is at most polynomial in N). Both algorithms are deterministic and use only a logarithmic number of memory words.
2. In contrast, we show that any deterministic algorithm that estimates, to within a small constant relative error, the number of 1’s (or the sum of integers) in a sliding window on the union of distributed streams requires Ω(N) space.
3. We present the first (randomized) (ε, δ)-approximation scheme for the number of 1’s in a sliding window on the union of distributed streams that uses only a logarithmic number of memory words. We also present the first (ε, δ)-approximation scheme for the number of distinct values in a sliding window on distributed streams that uses only a logarithmic number of memory words.
Our results are obtained using a novel family of synopsis data structures called waves.
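To make the problem statement concrete, the following is a minimal Python sketch of the exact baseline that the approximation schemes are measured against: it counts the 1's among the N most recent bits of a single stream by storing the whole window, so it needs O(N) memory. It is not the wave data structure, which the abstract does not describe. The names `ExactWindowCounter` and `is_eps_approximation` are hypothetical, and the check assumes the usual reading of an ε-approximation as an estimate with relative error at most ε.

```python
from collections import deque


class ExactWindowCounter:
    """Exact count of 1's among the N most recent bits (O(N) memory baseline)."""

    def __init__(self, window_size):
        # Hypothetical helper: stores every bit in the window explicitly.
        self.window = deque(maxlen=window_size)
        self.ones = 0

    def add(self, bit):
        # When the window is full, the oldest bit is about to be evicted,
        # so remove its contribution before appending the new bit.
        if len(self.window) == self.window.maxlen:
            self.ones -= self.window[0]
        self.window.append(bit)
        self.ones += bit

    def count(self):
        return self.ones


def is_eps_approximation(estimate, exact, eps):
    # Assumed definition: relative error of the estimate is at most eps.
    return abs(estimate - exact) <= eps * exact


counter = ExactWindowCounter(window_size=4)
for bit in [1, 0, 1, 1, 1, 0]:
    counter.add(bit)
exact_count = counter.count()  # 1's among the last 4 bits: 1, 1, 1, 0
```

A synopsis that uses only a logarithmic number of memory words cannot keep every bit, so it would return an estimate that passes `is_eps_approximation` against this exact count rather than the count itself.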
Cite this article
Gibbons, P., Tirthapura, S. Distributed Streams Algorithms for Sliding Windows. Theory Comput Syst 37, 457–478 (2004). https://doi.org/10.1007/s00224-004-1156-4
DOI: https://doi.org/10.1007/s00224-004-1156-4