Sketching asynchronous data streams over sliding windows

Distributed Computing

Abstract

We study the problem of maintaining a sketch of recent elements of a data stream. Motivated by applications involving network data, we consider streams that are asynchronous, in which the observed order of data is not the same as the time order in which the data was generated. The notion of recent elements of a stream is modeled by the sliding timestamp window, which is the set of elements with timestamps that are close to the current time. We design algorithms for maintaining sketches of all elements within the sliding timestamp window that can give provably accurate estimates of two basic aggregates, the sum and the median, of a stream of numbers. The space taken by the sketches, the time needed for querying the sketch, and the time for inserting new elements into the sketch are all polylogarithmic with respect to the maximum window size. Our sketches can be easily combined in a lossless and compact way, making them useful for distributed computations over data streams. Previous works on sketching recent elements of a data stream have all considered the more restrictive scenario of synchronous streams, where the observed order of data is the same as the time order in which the data was generated. Our notion of recency of elements is more general than that studied in previous work, and thus our sketches are more robust to network delays and asynchrony.
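To make the sliding timestamp window concrete, the following is a minimal exact baseline, not the sketch from the paper. It stores every element, so it uses linear space rather than the polylogarithmic space the sketches achieve. The window is assumed here to be the closed interval [current_time - window, current_time]. The function name `window_elements` and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import median

def window_elements(stream, current_time, window):
    """Return the values whose timestamp lies in the assumed window
    [current_time - window, current_time], in observed order."""
    return [v for (v, t) in stream if current_time - window <= t <= current_time]

# Each item is (value, timestamp). The observed order is not the
# timestamp order, which is what makes the stream asynchronous.
stream = [(5, 10), (7, 12), (3, 9), (8, 15), (1, 4), (6, 13)]

# Window [10, 15]: the elements with timestamps 9 and 4 fall outside it.
recent = window_elements(stream, current_time=15, window=5)
window_sum = sum(recent)        # exact sum over the window
window_median = median(recent)  # exact median over the window
```

Because membership depends only on the timestamp, a late-arriving element such as `(3, 9)` is handled the same way regardless of when it is observed. The sketches described in the abstract aim to approximate these two aggregates without storing every element.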

Author information

Corresponding author

Correspondence to Bojian Xu.

Additional information

The work of the authors was supported in part through NSF grants CNS 0520102 and CNS 0520009.

A preliminary version of this paper appeared in Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC) 2006, pages 82–91.

Work done while the third author was at Rensselaer Polytechnic Institute.

Authors are listed in reverse alphabetical order.

About this article

Cite this article

Xu, B., Tirthapura, S. & Busch, C. Sketching asynchronous data streams over sliding windows. Distrib. Comput. 20, 359–374 (2008). https://doi.org/10.1007/s00446-007-0048-7

  • DOI: https://doi.org/10.1007/s00446-007-0048-7
