Abstract
Data streams characterize the high-speed, large-volume input of a new class of applications such as network monitoring, web content analysis and sensor networks. Among these applications, network monitoring may be the most compelling: the backbone of a large internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks, such as traffic analysis and statistics collection, aggregation is a primitive operation, and various analytical and statistical needs naturally lead to related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations over high-speed data streams, based on the two-level query processing architecture of GS (Gigascope), a real data stream management system deployed at AT&T. We observe that additionally computing and maintaining fine-granularity aggregations (called phantoms) has the benefit of supporting shared computation. Based on a thorough analysis, we propose algorithms to identify the best set of phantoms to maintain and to determine the allocation of resources (particularly, space) for computing the aggregations. Experiments show that our algorithm achieves near-optimal computation costs and outperforms the best adapted algorithm by more than an order of magnitude.
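The idea of sharing computation through a phantom can be illustrated with a minimal sketch. This is not the article's algorithm: it assumes exact, unbounded in-memory tables and COUNT aggregates only, and it ignores the space allocation and collision behaviour of the two-level architecture. The function names, the attribute names (`src`, `dst`, `port`) and the sample stream are all hypothetical.

```python
from collections import Counter


def direct_counts(stream, query_attrs):
    """Baseline: every record updates one table per query."""
    tables = {attrs: Counter() for attrs in query_attrs}
    updates = 0
    for record in stream:
        for attrs in query_attrs:
            tables[attrs][tuple(record[a] for a in attrs)] += 1
            updates += 1
    return tables, updates


def phantom_counts(stream, phantom_attrs, query_attrs):
    """Each record updates only the finer-grained phantom table;
    the query tables are then derived by rolling the phantom up."""
    phantom = Counter()
    updates = 0
    for record in stream:
        phantom[tuple(record[a] for a in phantom_attrs)] += 1
        updates += 1
    tables = {attrs: Counter() for attrs in query_attrs}
    for key, count in phantom.items():
        named = dict(zip(phantom_attrs, key))
        for attrs in query_attrs:
            tables[attrs][tuple(named[a] for a in attrs)] += count
            updates += 1
    return tables, updates


# Hypothetical stream: six packets belonging to two distinct flows.
stream = (
    [{"src": "a", "dst": "x", "port": 80}] * 4
    + [{"src": "b", "dst": "x", "port": 80}] * 2
)
queries = [("src",), ("dst",), ("port",)]  # three related group-by queries
phantom = ("src", "dst", "port")           # finer granularity than any query

direct_tables, direct_updates = direct_counts(stream, queries)
shared_tables, shared_updates = phantom_counts(stream, phantom, queries)

print(direct_tables == shared_tables)   # both strategies give the same answers
print(direct_updates, shared_updates)   # table updates performed by each
```

In this toy setting the phantom pays off because many records fall into the same fine-grained group, so the per-query work is done once per group rather than once per record. Whether a given phantom is worth maintaining under a fixed space budget is the question the article's algorithms address.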
About this article
Cite this article
Zhang, R., Koudas, N., Ooi, B.C. et al. Streaming multiple aggregations using phantoms. The VLDB Journal 19, 557–583 (2010). https://doi.org/10.1007/s00778-010-0180-z
DOI: https://doi.org/10.1007/s00778-010-0180-z