Abstract
We address the continuous processing of aggregation join queries over data streams. Queries of this type involve both join and aggregation operations, with windows specified on the join input streams. To our knowledge, existing research treats join query optimization and aggregation query optimization as separate problems. Our observation, however, is that placing them within the same scope of query optimization lets us generate more efficient query execution plans. This is achieved through more versatile query transformations, the key idea of which is to perform aggregation before join so that join execution time may be reduced. The idea itself is not new (it has already been proposed in the database area), but developing the query transformation rules for streams poses a completely new set of challenges. In this paper, we first propose a query processing model for an aggregation join query with two key stream operators: (1) aggregation set update, which produces an aggregation set of tuples (one tuple per group) and updates it incrementally as new tuples arrive, and (2) aggregation set join, i.e., a join between a stream and an aggregation set of tuples. We then introduce concrete query transformation rules specialized to work with streams. These rules are far more compact, and yet more general, than the rules proposed in the database area. Finally, we present a query processing algorithm that is generic to all alternative query execution plans that can be generated through the transformations, and we study the performance of the alternative plans through extensive experiments.
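The two operators named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the class and function names (`AggregationSet`, `aggregation_set_join`, `key_of`), the choice of COUNT and SUM as aggregates, and the explicit `expire` call for tuples leaving the window are all assumptions made for illustration.

```python
class AggregationSet:
    """Hypothetical aggregation set: one (count, sum) entry per group."""

    def __init__(self):
        self._groups = {}  # group key -> [count, sum]

    def update(self, key, value):
        # Aggregation set update: fold a newly arrived tuple into its group
        # incrementally instead of recomputing the aggregate from scratch.
        entry = self._groups.setdefault(key, [0, 0.0])
        entry[0] += 1
        entry[1] += value

    def expire(self, key, value):
        # Assumed window handling: undo the contribution of a tuple that
        # left the window. This only works for invertible aggregates such
        # as COUNT and SUM.
        entry = self._groups.get(key)
        if entry is None:
            return
        entry[0] -= 1
        entry[1] -= value
        if entry[0] == 0:
            del self._groups[key]

    def lookup(self, key):
        # Return the single aggregate tuple for the group, or None.
        entry = self._groups.get(key)
        return None if entry is None else (key, entry[0], entry[1])


def aggregation_set_join(stream_tuple, agg_set, key_of):
    # Aggregation set join: probe the aggregation set with one stream tuple.
    # At most one match exists because the set holds one tuple per group.
    match = agg_set.lookup(key_of(stream_tuple))
    if match is None:
        return None
    return stream_tuple + match[1:]


agg = AggregationSet()
agg.update("a", 2.0)
agg.update("a", 3.0)
agg.update("b", 1.0)
joined = aggregation_set_join(("a", "x"), agg, key_of=lambda t: t[0])
```

Because the aggregation set holds one tuple per group, each probe touches at most one entry, which is the intuition behind performing aggregation before join.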
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tran, T.M., Lee, B.S. (2007). Transformation of Continuous Aggregation Join Queries over Data Streams. In: Papadias, D., Zhang, D., Kollios, G. (eds) Advances in Spatial and Temporal Databases. SSTD 2007. Lecture Notes in Computer Science, vol 4605. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73540-3_19
DOI: https://doi.org/10.1007/978-3-540-73540-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73539-7
Online ISBN: 978-3-540-73540-3
eBook Packages: Computer Science, Computer Science (R0)