DOI: 10.1145/1244002.1244108
Article

A priority random sampling algorithm for time-based sliding windows over weighted streaming data

Published: 11 March 2007

ABSTRACT

This paper introduces the problem of random sampling from time-based sliding windows over weighted streaming data and presents a priority random sampling (PRS) algorithm for it. The algorithm extends the classic reservoir-sampling algorithm and the weighted random sampling with a reservoir (WRS) algorithm to handle the expiration of data items from a time-based sliding window, and it can avoid the drawbacks of both. PRS assigns each data item in the time-based sliding window a key computed from both its weight and its arrival time, and it works even when the number of data items in a sliding window varies dynamically over time. Experiments show that the PRS algorithm is somewhat superior to the WRS algorithm.
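The abstract does not give the PRS key formula, so the sketch below does not reproduce PRS. It only illustrates, under stated assumptions, how a key-based weighted sampler can cope with expiration in a time-based window. Each arriving item gets the Efraimidis and Spirakis key `u ** (1 / weight)` with `u` uniform in [0, 1); the k largest keys among the items currently in the window form a weighted sample without replacement. To survive expiration without storing the whole window, the sketch keeps an item only while fewer than k later arrivals have beaten its key, a pruning rule in the spirit of priority sampling for timestamp-based windows. The class and method names (`SlidingWindowWeightedSampler`, `insert`, `sample`) are hypothetical, and timestamps are assumed to be non-decreasing.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    value: object
    timestamp: float
    key: float          # Efraimidis-Spirakis key: u ** (1 / weight)
    beaten_by: int = 0  # number of later arrivals with a larger key


class SlidingWindowWeightedSampler:
    """Hypothetical sketch (not the paper's PRS): weighted sample of up to k
    items from the time-based window (now - window, now].
    Assumes timestamps passed to insert() and now passed to sample()
    are non-decreasing."""

    def __init__(self, k, window, seed=None):
        self.k = k
        self.window = window
        self.rng = random.Random(seed)
        self.candidates = []  # kept in arrival order

    def insert(self, value, weight, timestamp):
        if weight <= 0:
            raise ValueError("weight must be positive")
        key = self.rng.random() ** (1.0 / weight)
        survivors = []
        for c in self.candidates:
            if key > c.key:
                c.beaten_by += 1
            # Once k later arrivals have larger keys, this item can never be
            # in the top k of any window that still contains it: those k
            # later items expire after it does. So it is safe to discard.
            if c.beaten_by < self.k:
                survivors.append(c)
        survivors.append(Candidate(value, timestamp, key))
        self.candidates = survivors

    def sample(self, now):
        # Drop expired candidates, then return the k live items with the
        # largest keys, ordered from largest key to smallest.
        self.candidates = [c for c in self.candidates
                           if c.timestamp > now - self.window]
        top = sorted(self.candidates, key=lambda c: c.key, reverse=True)
        return [c.value for c in top[: self.k]]
```

A brute-force check is to regenerate the same keys from the same seed and take the top k among the items whose timestamps fall in the window; the sampler should return exactly those items. Sorting on every `sample` call keeps the sketch short; a heap or an ordered structure keyed on `key` would be the natural choice if samples are requested frequently.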

    • Published in

      ACM Conferences
      SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
      March 2007
      1688 pages
      ISBN:1595934804
      DOI:10.1145/1244002

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%