Abstract
Continuously monitoring large-scale aggregates over data streams is important for many stream processing applications, such as collaborative intelligence analysis, and presents new challenges to data management systems. The first challenge is to update aggregate values efficiently and deliver the new results to users as new tuples arrive. We implemented an incremental aggregation mechanism that does this for arbitrary algebraic aggregate functions, including user-defined ones, by maintaining up-to-date finite data summaries. The second challenge is to construct shared query evaluation plans that support large numbers of queries effectively. Since multiple-query optimization is NP-complete and queries generally arrive asynchronously, we apply an incremental sharing approach to obtain shared plans that perform reasonably well. The system is built as part of ARGUS, a stream processing system that runs atop a DBMS. The evaluation study shows that our approaches are effective and efficient on typical collaborative intelligence analysis data and queries.
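To illustrate the idea of keeping a finite data summary for an algebraic aggregate, here is a minimal Python sketch for a grouped AVG. It is an illustration only and not the ARGUS implementation, which runs atop a DBMS; the names `AvgSummary`, `IncrementalAvg`, and `on_tuple` are hypothetical. AVG is algebraic because it can be computed from a fixed-size summary (a sum and a count), so each new tuple updates the summary in constant time without rescanning earlier tuples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AvgSummary:
    """Fixed-size summary for AVG: a running sum and a running count."""
    total: float = 0.0
    count: int = 0

    def update(self, value: float) -> None:
        # Fold one new value into the summary in O(1).
        self.total += value
        self.count += 1

    def result(self) -> float:
        # Derive the aggregate value from the summary alone.
        return self.total / self.count


class IncrementalAvg:
    """Hypothetical grouped AVG maintained incrementally, one summary per group."""

    def __init__(self) -> None:
        self.groups = defaultdict(AvgSummary)

    def on_tuple(self, key, value: float) -> float:
        # Update only the affected group's summary and return its new average.
        summary = self.groups[key]
        summary.update(value)
        return summary.result()


if __name__ == "__main__":
    agg = IncrementalAvg()
    for key, value in [("a", 10.0), ("a", 20.0), ("b", 4.0)]:
        print(key, agg.on_tuple(key, value))
```

The same pattern would extend to other algebraic functions by swapping in a different fixed-size summary; holistic aggregates such as exact medians have no such bounded summary and fall outside this sketch.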
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, C., Carbonell, J. (2006). Incremental Aggregation on Multiple Continuous Queries. In: Esposito, F., Raś, Z.W., Malerba, D., Semeraro, G. (eds) Foundations of Intelligent Systems. ISMIS 2006. Lecture Notes in Computer Science, vol 4203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875604_20
DOI: https://doi.org/10.1007/11875604_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45764-0
Online ISBN: 978-3-540-45766-4
eBook Packages: Computer Science, Computer Science (R0)