Abstract
As stream data is collected and analyzed more frequently, stream processing systems face growing design challenges. One of them is continuous window aggregation, which is computationally intensive. When a large number of aggregation queries must be served, the system may run into scalability problems. These queries are usually similar and differ only in their window specifications. In this paper, we propose collaborative aggregation, which promotes aggregate sharing among windows so that repeated aggregate operations can be avoided. In previous approaches, aggregate sharing is restricted by the window pace. We instead generalize aggregation over multiple values as a series of reductions, so that the results generated by each reduction step can be shared. The sharing process is formalized in the feed semantics, and we present the compose-and-declare framework, which determines the data sharing logic at very low cost. Experimental results show that our approach delivers an order-of-magnitude performance improvement over state-of-the-art approaches while keeping a small memory footprint.
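The abstract does not spell out the feed semantics or the compose-and-declare framework, so the following is not the authors' method. It is a minimal Python sketch of a related, well-known technique that illustrates the general idea of sharing reduction results among window queries that differ only in window length and pace. The stream is cut at the union of all window boundaries, and each slice is reduced once. The slice partials are then arranged in a segment tree, so that each intermediate reduction step is computed once and reused by every window that covers it. All names (`WindowSpec`, `answer_queries`, and the other helpers) are hypothetical. The sketch also assumes time-based windows aligned at time 0, an associative and commutative combine function with an identity element, and a finite batch of events rather than incremental processing.

```python
"""Sketch: sharing partial aggregates across sliding-window queries.

Assumptions (hypothetical, not from the paper): windows are time-based,
the i-th window of a query covers [i * slide, i * slide + length), the
stream is processed as one finite batch up to `horizon`, and `combine`
is associative and commutative with identity element `identity`.
"""
import operator
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSpec:
    """Hypothetical window specification: window length and slide (pace)."""
    length: int
    slide: int


def window_starts(spec, horizon):
    """Start times of every complete window of `spec` that ends by `horizon`."""
    return range(0, horizon - spec.length + 1, spec.slide)


def boundaries(specs, horizon):
    """Union of all window start and end points: these are the slice borders."""
    points = {0, horizon}
    for spec in specs:
        for start in window_starts(spec, horizon):
            points.update((start, start + spec.length))
    return sorted(points)


def slice_partials(events, points, combine, identity):
    """Reduce the events of each slice [points[i], points[i + 1]) exactly once."""
    partials = [identity] * (len(points) - 1)
    for ts, value in events:
        i = bisect_right(points, ts) - 1
        if 0 <= i < len(partials):  # events outside [0, horizon) are ignored
            partials[i] = combine(partials[i], value)
    return partials


class ReductionTree:
    """Bottom-up segment tree over slice partials. Each internal node holds
    the result of one reduction step. That result is computed once and reused
    by every window query whose range covers the node."""

    def __init__(self, leaves, combine, identity):
        self.n = len(leaves)
        self.combine = combine
        self.identity = identity
        self.tree = [identity] * self.n + list(leaves)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo, hi):
        """Combine leaves lo..hi-1 using O(log n) stored nodes."""
        left = right = self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                left = self.combine(left, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self.combine(self.tree[hi], right)
            lo //= 2
            hi //= 2
        return self.combine(left, right)


def answer_queries(specs, events, horizon, combine, identity):
    """Evaluate all window queries over shared slices and shared reductions."""
    points = boundaries(specs, horizon)
    index = {p: i for i, p in enumerate(points)}
    tree = ReductionTree(slice_partials(events, points, combine, identity),
                         combine, identity)
    return {
        spec: [tree.query(index[s], index[s + spec.length])
               for s in window_starts(spec, horizon)]
        for spec in specs
    }


if __name__ == "__main__":
    specs = [WindowSpec(length=4, slide=2), WindowSpec(length=6, slide=3)]
    events = [(t, t) for t in range(12)]  # (timestamp, value) pairs
    for spec, sums in answer_queries(specs, events, 12, operator.add, 0).items():
        print(spec, sums)
```

The demo uses twelve events whose values equal their timestamps. It should print the sums [6, 14, 22, 30, 38] for the (4, 2) windows and [15, 33, 51] for the (6, 3) windows. In this sketch the slice borders come directly from the window boundaries rather than from a common divisor of the slides, which loosely echoes the abstract's point about not tying sharing to the window pace. A real stream processor would also need incremental slice creation and eviction.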
About this article
Cite this article
Liu, W., Shen, YM. & Wang, P. An Efficient Approach of Processing Multiple Continuous Queries. J. Comput. Sci. Technol. 31, 1212–1227 (2016). https://doi.org/10.1007/s11390-016-1693-8
DOI: https://doi.org/10.1007/s11390-016-1693-8