DOI: 10.1145/1376916.1376930
research-article

Time-decaying aggregates in out-of-order streams

Published: 09 June 2008

ABSTRACT

Processing large data streams is now a major topic in data management. The data involved can be truly massive, and the required analyses complex. In a stream of sequential events such as stock feeds, sensor readings, or IP traffic measurements, data tuples pertaining to recent events are typically more important than older ones. This can be formalized via time-decay functions, which assign a weight to each data item based on its age. Decay functions such as sliding windows and exponential decay have been studied under the assumption of well-ordered arrivals, i.e., data arrives in non-decreasing order of timestamps. However, data quality issues are prevalent in massive streams (due to network asynchrony, delays, and similar causes), and correct arrival order is not guaranteed.
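To make the notion of a time-decay function concrete, the following is a minimal sketch, not the algorithms of the paper. It computes an exact decayed sum by keeping every item, which a streaming summary would avoid. The function names, the parameters `window`, `lam`, and `alpha`, the sample data, and the particular form (age + 1)^(-alpha) for polynomial decay are all assumptions made for illustration.

```python
import math

def sliding_window(age, window):
    """Weight 1 for items whose age is within the window, 0 otherwise."""
    return 1.0 if 0 <= age <= window else 0.0

def exponential_decay(age, lam):
    """Weight exp(-lam * age); items from the future get weight 0."""
    return math.exp(-lam * age) if age >= 0 else 0.0

def polynomial_decay(age, alpha):
    """Weight (age + 1)^(-alpha); the '+ 1' is an assumed convention."""
    return (age + 1.0) ** (-alpha) if age >= 0 else 0.0

def decayed_sum(stream, now, decay):
    """Exact decayed sum over (timestamp, value) pairs in any arrival order."""
    return sum(value * decay(now - ts) for ts, value in stream)

# Hypothetical out-of-order arrivals: timestamps are not non-decreasing.
arrivals = [(5, 1.0), (3, 2.0), (9, 1.0), (7, 4.0)]
window_sum = decayed_sum(arrivals, 10, lambda age: sliding_window(age, 4))
poly_sum = decayed_sum(arrivals, 10, lambda age: polynomial_decay(age, 1.0))
```

Because the weight depends only on an item's timestamp and the query time, the exact answer is the same for any arrival order. The difficulty the abstract points to is achieving this with a small summary rather than by storing the whole stream.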

We focus on the computation of decayed aggregates such as range queries, quantiles, and heavy hitters on out-of-order streams, where elements do not necessarily arrive in increasing order of timestamps. Existing techniques such as Exponential Histograms and Waves are unable to handle out-of-order streams. We give the first deterministic algorithms for approximating these aggregates under popular decay functions such as sliding window and polynomial decay. We study the overhead of allowing out-of-order arrivals when compared to well-ordered arrivals, both analytically and experimentally. Our experiments confirm that these algorithms can be applied in practice, and compare the relative performance of different approaches for handling out-of-order arrivals.
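As a point of reference for what "decayed heavy hitters" and "decayed quantiles" mean, here is a small exact baseline, assuming the whole stream fits in memory. It is not the deterministic approximation scheme the abstract describes, and the helper names, the threshold `phi`, the rank fraction `q`, and the sample data are hypothetical.

```python
from collections import defaultdict

def window_weight(age, window):
    """Sliding-window decay: weight 1 inside the window, 0 outside."""
    return 1.0 if 0 <= age <= window else 0.0

def decayed_heavy_hitters(stream, now, decay, phi):
    """Items whose decayed weight is at least phi times the total decayed weight."""
    weights = defaultdict(float)
    for ts, item in stream:
        weights[item] += decay(now - ts)
    total = sum(weights.values())
    if total <= 0:
        return set()
    return {item for item, w in weights.items() if w >= phi * total}

def decayed_quantile(stream, now, decay, q):
    """Smallest item whose cumulative decayed weight reaches q of the total."""
    weighted = sorted((item, decay(now - ts)) for ts, item in stream)
    total = sum(w for _, w in weighted)
    if total <= 0:
        return None
    running = 0.0
    for item, w in weighted:
        running += w
        if running >= q * total:
            return item
    return weighted[-1][0]

def decay(age):
    """Decay used in the example: a sliding window of width 5."""
    return window_weight(age, 5)

# Hypothetical (timestamp, item) pairs arriving out of timestamp order.
labels = [(8, "a"), (2, "b"), (9, "a"), (6, "c"), (1, "b"), (7, "a")]
values = [(9, 10), (3, 50), (8, 30), (7, 20), (2, 40)]
hitters = decayed_heavy_hitters(labels, 10, decay, 0.5)
median = decayed_quantile(values, 10, decay, 0.5)
```

With a query time of 10 and a window of 5, the items stamped 1, 2, and 3 carry zero weight, so they influence neither the heavy hitters nor the median even though they arrived interleaved with newer items.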

Published in

PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2008
330 pages
ISBN: 9781605581521
DOI: 10.1145/1376916

Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

PODS '08 paper acceptance rate: 28 of 159 submissions, 18%. Overall acceptance rate: 642 of 2,707 submissions, 24%.
