Abstract
Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.
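The abstract contrasts one-time queries over stored, timeless records with continuous queries over rapidly changing streams. The Python sketch below illustrates that difference with a time-based sliding-window average. Rather than being evaluated once, the query re-emits an updated answer each time a tuple arrives. The names `SlidingWindowAvg` and `continuous_avg`, the window semantics `(ts - window, ts]`, and the in-order arrival assumption are all hypothetical choices made for illustration. They are not the language or system described in the paper.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


class SlidingWindowAvg:
    """Hypothetical continuous query: average of the values whose timestamps
    fall in the time-based sliding window (ts - window, ts]."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.buf: Deque[Tuple[float, float]] = deque()  # (timestamp, value) pairs in the window
        self.total = 0.0  # running sum of the values currently in buf

    def insert(self, ts: float, value: float) -> float:
        # Assumption: tuples arrive in non-decreasing timestamp order.
        self.buf.append((ts, value))
        self.total += value
        # Expire tuples that have slid out of the window.
        while self.buf and self.buf[0][0] <= ts - self.window:
            _, old = self.buf.popleft()
            self.total -= old
        # The newest tuple is always in the window, so buf is non-empty here.
        return self.total / len(self.buf)


def continuous_avg(
    stream: Iterable[Tuple[float, float]], window: float
) -> Iterator[Tuple[float, float]]:
    """Emit (timestamp, current window average) after every arriving tuple."""
    query = SlidingWindowAvg(window)
    for ts, value in stream:
        yield ts, query.insert(ts, value)


if __name__ == "__main__":
    stream = [(1, 10.0), (2, 20.0), (3, 30.0), (5, 40.0)]
    for ts, avg in continuous_avg(stream, window=3):
        print(ts, avg)
```

With a window of 3 time units, the tuples at timestamps 1 and 2 expire when the tuple at timestamp 5 arrives, so the final answer is 35.0. A one-time query over a stored table would instead compute a single average over all four records. Keeping a running sum avoids rescanning the window on every arrival, which matters when tuples arrive faster than a full recomputation can finish.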