skip to main content
article

Issues in data stream management

Published:01 June 2003Publication History
Skip Abstract Section

Abstract

Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.

References

  1. N. Alon, Y. Matias, M. Szegedy. The Space Complexity of Approximating the Frequency Moments. In Proc. ACM Symp. on Theory of Computing, 1996, pp. 20--29.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Arasu, B. Babcock, S. Babu, J. McAlister, J. Widom. Characterizing Memory Requirements for Queries over Continuous Data Streams. In Proc. ACM Symp. on Principles of Database Systems, 2002, pp. 221--232.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Arasu, S. Babu, J. Widom. An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations. Technical Report, Nov. 2002. dbpubs.stanford.edu:8090/pub/2002-57.]]Google ScholarGoogle Scholar
  4. R. Avnur, J. Hellerstein. Eddies: Continuously Adaptive Query Processing. In Proc. ACM Int. Conf. on Management of Data, 2000, pp. 261--272.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. B. Babcock, S. Babu, M. Datar, R. Motwani. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. To appear in Proc. ACM Int. Conf. on Management of Data, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom. Models and Issues in Data Streams. In Proc. ACM Symp. on Principles of Database Systems, 2002, pp. 1--16.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Babcock, M. Datar, R. Motwani. Sampling from a Moving Window over Streaming Data. In Proc. SIAM-ACM Symp. on Discrete Algorithms, 2002, pp. 633--634.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. B. Babcock, M. Datar, R. Motwani, L. O'Callaghan. Maintaining Variance and k-Medians over Data Stream Windows. To appear in Proc. ACM Symp. on Principles of Database Systems, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Babu, J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. Technical Report, Nov. 2002. dbpubs.stanford.edu:8090/pub/2002-52.]]Google ScholarGoogle Scholar
  10. P. Bonnet, J. Gehrke, P. Seshadri. Towards Sensor Database Systems. In Proc. Int. Conf. on Mobile Data Management, 2001, pages 3--14.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Carney, U. Cetinternel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik. Monitoring streams---A New Class of Data Management Applications. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 215--226.]]Google ScholarGoogle ScholarCross RefCross Ref
  12. S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, M. Shah. TelegraphCQ: Continuous Data flow Processing for an Uncertain World. In Proc. Conf. on Innovative Data Syst. Res, 2003, pp. 269--280.]]Google ScholarGoogle Scholar
  13. S. Chandrasekaran, M. J. Franklin. Streaming Queries over Streaming Data. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 203--214.]]Google ScholarGoogle ScholarCross RefCross Ref
  14. S. Chandrasekaran, S. Krishnamurthy, S. Madden, A. Deshpande, M. J. Franklin, J. M. Hellerstein, M. Shah. Windows Explained, Windows Expressed. 2003. www.cs.berkeley.edu/~sirish/research/streaquel.pdf.]]Google ScholarGoogle Scholar
  15. M. Charikar, K. Chen, M. Farach-Colton. Finding frequent items in data streams. In Proc. Int. Colloquium on Automata, Languages and Programming, 2002, pp. 693--703.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Charikar, L. O'Callaghan, R. Panigrahy. Better Streaming Algorithms for Clustering Problems. To appear in Proc. ACM Symp. on Theory of Computing, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Chen, D. DeWitt, F. Tian, Y. Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In Proc. ACM Int. Conf. on Management of Data, 2000, pp. 379--390.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Chen, G. Dong, J. Han, B. W. Wah, J. Wang. Multi-Dimensional Regression Analysis of Time-Series Data Streams. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 323--334.]]Google ScholarGoogle ScholarCross RefCross Ref
  19. M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, S. Zdonik. Scalable Distributed Stream Processing. In Proc. Conf. on Innovative Data Syst. Res, 2003.]]Google ScholarGoogle Scholar
  20. G. Cormode, M. Datar, P. Indyk, S. Muthukrishnan. Comparing Data Streams Using Hamming Norms (How to Zero In). In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 335--345.]]Google ScholarGoogle ScholarCross RefCross Ref
  21. C. Cortes, K. Fisher, D. Pregibon, A. Rogers, F. Smith. Hancock: A Language for Extracting Signatures from Data Streams. In Proc. ACM Int. Conf. on Knowledge Discovery and Data Mining, 2000, pp. 9--17.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, O. Spatscheck. GigaScope: High Performance Network Monitoring with an SQL Interface. In Proc. ACM Int. Conf. on Management of Data, 2002, p. 623.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Datar, A. Gionis, P. Indyk, R. Motwani. Maintaining Stream Statistics over Sliding Windows. In Proc. SIAM-ACM Symp. on Discrete Algorithms, 2002, pp. 635--644]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. DeHaan, E. D. Demaine, L. Golab, A. Lopez-Ortiz, J. I. Munro. Towards Identifying Frequent Items in Sliding Windows. Technical Report, March 2003. db.uwaterloo.ca/~lgolab/frequent.pdf.]]Google ScholarGoogle Scholar
  25. E. Demaine, A. Lopez-Ortiz, J. I. Munro. Frequency Estimation of Internet Packet Streams with Limited Space. In Proc. European Symp. on Algorithms, 2002, pp. 348--360.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. Dobra, M. Garofalakis, J. Gehrke, R. Rastogi. Processing Complex Aggregate Queries over Data Streams. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 61--72.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. C. Estan, G. Varghese. New Directions in Traffic Measurement and Accounting. In Proc. ACM SIGCOMM Internet Measurement Workshop, 2001, pp. 75--80.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. C. Faloutsos. Sensor Data Mining: Similarity Search and Pattern Analysis. Tutorial in Proc. Int. Conf. on Very Large Data Bases, 2002.]]Google ScholarGoogle Scholar
  29. J. Feigenbaum, S. Kannan, M. Strauss, M. Viswanathan. An Approximate L1-Difference Algorithm for Massive Data Streams. In Proc. Symp. on Foundations of Computer Science, 1999. pp. 501--511.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. P. Flajolet, G. N. Martin. Probabilistic Counting. In Proc. Symp. on Foundations of Computer Science, 1983, pp. 76--82, 1983.]]Google ScholarGoogle Scholar
  31. M. Garofalakis, J. Gehrke, R. Rastogi. Querying and Mining Data Streams: You Only Get One Look. Tutorial in ACM Int. Conf. on Management of Data, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. Garofalakis, P. Gibbons. Wavelet Synopses with Error Guarantees. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 476--487.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Gehrke, F. Korn, D. Srivastava. On Computing Correlated Aggregates Over Continual Data Streams. In Proc. ACM Int. Conf. on Management of Data, 2001, pp. 13--24.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. P. Gibbons, S. Tirthapura. Estimating Simple Functions on the Union of Data Streams. In Proc. ACM Symp. on Parallel Algorithms an Architectures, 2001, pp. 281--291.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. P. Gibbons, S. Tirthapura. Distributed Streams Algorithms for Sliding Windows. In Proc. ACM Symp. on Parallel Algorithms and Architectures, 2002, pp. 63--72.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, M. J. Strauss. QuickSAND: Quick Summary and Analysis of Network Data. Technical Report, Dec. 2001. citeseer.nj.nec.com/gilbert01quicksand.html]]Google ScholarGoogle Scholar
  37. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, M. J. Strauss. Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries. In Proc. Int. Conf. on Very Large Data Bases, 2001, pp. 79--88.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. L. Golab, M. T. Özsu. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams. Technical Report, Feb. 2003. db.uwaterloo.ca/~ddbms/publications/stream/multijoins.pdf.]]Google ScholarGoogle ScholarCross RefCross Ref
  39. L. Golab, M. T. Özsu. Data Stream Management Issues --- A Survey. Technical Report, Apr. 2003. db.uwaterloo.ca/~ddbms/publications/stream/streamsurvey.pdf.]]Google ScholarGoogle Scholar
  40. J. Greenwald, F. Khanna. Space Efficient On-Line Computation of Quantile Summaries. In Proc. ACM Int. Conf. on Management of Data, 2001, pp. 58--66.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. S. Guha, P. Indyk, S. Muthukrishnan, M. Strauss. Histogramming Data Streams with Fast Per-Item Processing. In Proc. Int. Colloquium on Automata, Languages and Programming, 2002, pp. 681--692.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. Clustering Data Streams. In Proc. IEEE Symp. on Foundations of Computer Science, pp. 359--366.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. A. Hammad, M. J. Franklin, W. G. Aref, A. K. Elmagarmid. Scheduling for shared window joins over data streams. Submitted for publication, Feb. 2003.]]Google ScholarGoogle Scholar
  44. G. Hulten, L. Spencer, P. Domingos. Mining Time-Changing Data Streams. In Proc. ACM Int. Conf. on Knowledge Discovery and Data Mining, 2001, pp. 97--106.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. J. Kang, J. Naughton, S. Viglas. Evaluating Window Joins over Unbounded Streams. To appear in Proc. Int. Conf. on Data Engineering, 2003.]]Google ScholarGoogle Scholar
  46. F. Korn, S. Muthukrishnan, D. Srivastava. Reverse Nearest Neighbor Aggregates over Data Streams. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 814--825.]]Google ScholarGoogle ScholarCross RefCross Ref
  47. A. Lerner, D. Shasha. AQuery: Query Language for Ordered Data, Optimization Techniques, and Experiments. Technical Report, March 2003. csdocs.cs.nyu.edu/Dienst/Repository/2.0/Body/ncstrl.nyu_cs%2fTR2003-836/pdf.]]Google ScholarGoogle Scholar
  48. L. Liu, C. Pu, W. Tang. Continual Queries for Internet-Scale Event-Driven Information Delivery. In IEEE Trans. Knowledge and Data Eng., 11(4): 610--628, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. S. Madden, M. J. Franklin. Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data. In Proc. Int. Conf. on Data Engineering, 2002, pp. 555--566.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. S. Madden, M. J. Franklin, J. M. Hellerstein, W. Hong. TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks. In Proc. Symp. on Operating Systems Design and Implementation, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. S. Madden, M. J. Franklin, J. M. Hellerstein, W. Hong. The Design of an Acquisitional Query Processor For Sensor Networks. To appear in Proc. ACM Int. Conf. on Management of Data, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. S. Madden, M. Shah, J. Hellerstein, V. Raman. Continuously Adaptive Continuous Queries Over Streams. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 49--60.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. S. Madden, R. Szewczyk, M. J. Franklin, D. Culler. Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks. In Proc. IEEE Workshop on Mobile Computing Systems and Applications, 2002, pp. 49--58.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. G. S. Manku, R. Motwani. Approximate Frequency Counts over Data Streams. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 346--357.]]Google ScholarGoogle ScholarCross RefCross Ref
  55. G.S. Manku, S. Rajagopalan, B.G. Lindsay. Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets. In Proc. ACM Int. Conf. on Management of Data, 1999, pp. 251--262.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosen-stein, R. Varma. Query Processing, Approximation, and Resource Management in a Data Stream Management System. In Proc. Conf. on Innovative Data Syst. Res, 2003, pp. 245--256.]]Google ScholarGoogle Scholar
  57. C. Olston, J. Jiang, J. Widom. Adaptive Filters for Continuous Queries over Distributed Data Streams. To appear in Proc. ACM Int. Conf. on Management of Data, June 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. V. Raman, A. Deshpande, J. Hellerstein. Using State Modules for Adaptive Query Processing. To appear in Proc. Int. Conf. on Data Engineering, 2003.]]Google ScholarGoogle Scholar
  59. M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, M. J. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. To appear in Proc. Int. Conf. on Data Engineering, 2003.]]Google ScholarGoogle Scholar
  60. Stream Query Repository, www-db.stanford.edu/stream/sqr.]]Google ScholarGoogle Scholar
  61. M. Sullivan, A. Heybey. Tribeca: A System for Managing Large Databases of Network Trafic. In Proc. USENIX Annual Technical Conf., 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. N. Tatbul, U. Cetintemel, S. Zdonik, M. Cherniack, M. Stonebraker. Load Shedding in a Data Stream Manager. Technical Report, Feb. 2003. www.cs.brown.edu/~tatbul/papers tatbul_tr.pdf.]]Google ScholarGoogle ScholarCross RefCross Ref
  63. Traderbot, www.traderbot.com.]]Google ScholarGoogle Scholar
  64. P. Tucker, D. Maier, T. Sheard, L. Fegaras. Enhancing relational operators for querying over punctuated data streams. 2002. www.cse.ogi.edu/dot/niagara/pstream/punctuating.pdf.]]Google ScholarGoogle Scholar
  65. P. Tucker, T. Tufte, V. Papadimos, D. Maier. NEXMark---a Benchmark for Querying Data Streams. 2002. www.cse.ogi.edu/dot/niagara/pstream/nexmark.pdf.]]Google ScholarGoogle Scholar
  66. T. Urhan, M. J. Franklin. XJoin: A Reactively-Scheduled Pipelined Join Operator. In IEEE Data Engineering Bulletin, 23(2):27--33, June 2000.]]Google ScholarGoogle Scholar
  67. S. Viglas and J. Naughton. Rate-Based Query Optimization for Streaming Information Sources. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 37--48.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. H. Wang, C. Zaniolo. ATLaS: A Native Extension of SQL for Data Mining and Stream Computations. citeseer.nj.nec.com/551711.html.]]Google ScholarGoogle Scholar
  69. A. Wilschut, P. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. Int. Conf. Parallel and Distributed Information Systems, 1991, pp. 68--77.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Y. Yao and J. Gehrke. Query Processing for Sensor Networks. In Proc. Conf. on Innovative Data Syst. Res, 2003, pp. 233--244.]]Google ScholarGoogle Scholar
  71. B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, A. Biliris. On-Line Data Mining for Co-Evolving Time Sequences. In Proc. Int. Conf. on Data Engineering, 2000, pp. 13--22.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Y. Zhu, D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 358--369.]]Google ScholarGoogle Scholar

Index Terms

  1. Issues in data stream management
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGMOD Record
          ACM SIGMOD Record  Volume 32, Issue 2
          June 2003
          87 pages
          ISSN:0163-5808
          DOI:10.1145/776985
          Issue’s Table of Contents

          Copyright © 2003 Authors

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 2003

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader