Abstract
Data stream management systems (DSMSs) offer the most effective solution for processing data streams by efficiently executing continuous queries (CQs) over the incoming data. CQs inherently have different levels of criticality and hence different levels of expected quality of service (QoS) and quality of data (QoD). Adhering to such expected QoS/QoD metrics is even more important in cases of multi-tenant data stream management services. In this work, we propose DILoS, a framework that, through priority-based scheduling and load shedding, supports differentiated QoS and QoD for multiple classes of CQs. Unlike existing works that consider scheduling and load shedding separately, DILoS is a novel unified framework that exploits the synergy between scheduling and load shedding. We also propose ALoMa, a general, adaptive load manager that DILoS is built upon. By its design, ALoMa performs better than the state-of-the-art alternatives in three dimensions: (1) it automatically tunes the headroom factor, (2) it honors the delay target, (3) it is applicable to complex query networks with shared operators. We implemented DILoS and ALoMa in our real DSMS prototype system (AQSIOS) and evaluated their performance for a variety of real and synthetic workloads. Our experimental evaluation of ALoMa verified its clear superiority over the state-of-the-art approaches. Our experimental evaluation of the DILoS framework showed that it (a) allows the scheduler and load shedder to consistently honor CQs’ priorities, (b) significantly increases system capacity utilization by exploiting batch processing, and (c) enables operator sharing among query classes of different priorities while avoiding priority inversion, i.e., a lower-priority class never blocks a higher-priority one.
Notes
In fact, the CTRL paper does not even use real operators: it uses only delay operators to simulate an operator with a certain processing cost and selectivity. The Aurora paper uses only a simulation for its experiment, not a real DSMS.
Note that because STREAM (inherited by AQSIOS) does not support everything in the CQL syntax, we had to split the query into several virtual queries in the actual script.
Dataset LBL-PKT-4/lbl-pkt-n.tcp is publicly available at the following URL: http://ita.ee.lbl.gov/html/contrib/LBL-PKT.html.
We have observed in some experiments (not shown in this paper) that the reduction in data loss under DILoS can reach up to 100%, i.e., completely eliminate the need for shedding.
Since the three classes have the same amount of data, total data loss of the three classes is calculated by \(\frac{\sum_{1\le i \le 3}\mathrm{dataloss}_i}{3}\).
Note that in this case, the estimated headroom factor of class 1 is not adjusted and still remains at the initial value because the load manager does not have the necessary signals to decrease it.
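As a minimal sketch of the note above that averages the data loss of the three classes: when every class receives the same number of tuples, the unweighted mean of the per-class loss fractions equals the overall fraction of tuples dropped. The function name and all numbers below are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def total_data_loss(per_class_loss):
    """Unweighted mean of per-class data-loss fractions."""
    return sum(per_class_loss) / len(per_class_loss)


# Hypothetical workload: each class receives the same number of tuples.
tuples_per_class = 1000
dropped = [0, 100, 500]  # tuples shed in classes 1, 2, 3 (made-up values)

per_class_loss = [d / tuples_per_class for d in dropped]

# Mean of the per-class fractions, as in the note's formula.
mean_loss = total_data_loss(per_class_loss)

# Overall fraction of tuples dropped across all classes.
overall_loss = sum(dropped) / (tuples_per_class * len(dropped))

# The two agree (up to floating-point rounding) because the classes are equal in size.
print(mean_loss, overall_loss)
```

If the classes carried different amounts of data, the two quantities would generally differ, and a weighted mean would be needed instead.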
References
Esper. http://esper.codehaus.org
HP Vertica Best Practices: Resource Management. http://www.vertica.com/2015/02/19/hp-vertica-best-practices-resource-management
Microsoft StreamInsight. https://msdn.microsoft.com/en-us/sqlserver/ee476990.aspx
Pacific tsunami warning center. http://ptwc.weather.gov/
System S - Stream Computing at IBM Research. http://researcher.watson.ibm.com/researcher/view_group_subpage.php?id=2534
Tropical Atmosphere Ocean Project. http://www.pmel.noaa.gov/tao/
Abadi, D.J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: a new model and architecture for data stream management. In: VLDBJ ’03
Al Moakar, L., Chrysanthis, P. K., Chung, C., Guirguis, S., Labrinidis, A., Neophytou, P., Pruhs, K.: Admission control mechanisms for continuous queries in the cloud. In: ICDE ’10
Al Moakar, L., Labrinidis, A., Chrysanthis, P. K.: Adaptive class-based scheduling of continuous queries. In: SMDB ’12
Al Moakar, L., Pham, T. N., Neophytou, P., Chrysanthis, P. K., Labrinidis, A., Sharaf, M.: Class-based continuous query scheduling for data streams. In: DMSN ’09
Arasu, A., Cherniack, M., Galvez, E., Maier, D., Maskey, A. S., Ryvkina, E., Stonebraker, M., Tibbetts, R.: Linear road: a stream data management benchmark. In: VLDB ’04
Babcock, B., Babu, S., Datar, M., Motwani, R., Thomas, D.: Operator scheduling in data stream systems. In: VLDBJ ’04
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: PODS ’02
Babcock, B., Datar, M., Motwani, R.: Load shedding for aggregation queries over data streams. In: ICDE ’04
Carney, D., Çetintemel, U., Rasin, A., Zdonik, S., Cherniack, M., Stonebraker, M.: Operator scheduling in a data stream manager. In: VLDB ’03
Castro Fernandez, R., Migliavacca, M., Kalyvianaki, E., Pietzuch, P.: Integrating scale out and fault tolerance in stream processing using operator state management. In: SIGMOD ’13
Chakravarthy, S., Jiang, Q.: Stream Data Processing: A Quality of Service Perspective: Modeling, Scheduling, Load Shedding, and Complex Event Processing. Springer (2009)
Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S. R., Reiss, F., Shah, M. A.: TelegraphCQ: continuous dataflow processing. In: SIGMOD ’03
Chang, J.H., Kum, H.-C.M.: Frequency-based load shedding over a data stream of tuples. Inf. Sci. 179(21), 3733–3744 (2009)
Chi, Y., Wang, H., Yu, P. S.: Loadstar: load shedding in data stream mining. In: VLDB ’05
Chrysanthis, P. K.: AQSIOS—Next Generation Data Stream Management System. CONET Newsletter, June 2010
Dash, R., Fegaras, L.: Synopsis based load shedding in XML streams. In: EDBT/ICDT ’09 Workshops
Feng, H., Liu, Z., Xia, C. H., Zhang, L.: Load shedding and distributed resource control of stream processing networks. In: Performance Evaluation (2007)
Gedik, B., Wu, K.-L., Yu, P., Liu, L.: GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding. TKDE (2007)
Gedik, B., Wu, K.-L., Yu, P. S.: Efficient construction of compact shedding filters for data stream processing. In: ICDE ’08
Gedik, B., Wu, K.-L., Yu, P. S., Liu, L.: Mobiqual: Qos-aware load shedding in mobile CQ systems. In: ICDE ’08
Gedik, B., Wu, K.-L., Yu, P.S., Liu, L.: CPU load shedding for binary stream joins. Knowl. Inf. Syst. 13(3), 271–303 (2007)
Gulisano, V., Jimenez-Peris, R., Patino-Martinez, M., Soriente, C., Valduriez, P.: Streamcloud: an elastic and scalable data streaming system. In: IEEE TPDS, 2012
Jagadish, H.V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J.M., Ramakrishnan, R., Shahabi, C.: Big data and its technical challenges. CACM 57(7), 86–94 (2014)
Kendai, B., Chakravarthy, S.: Load shedding in MavStream: analysis, implementation, and evaluation. In: BNCOD ’08
Kleiminger, W., Kalyvianaki, E., Pietzuch, P.: Balancing load in stream processing with the cloud. In: ICDEW ’11
Kulkarni, D., Ravishankar, C. V., Cherniack, M.: Real-time load-adaptive processing of continuous queries over data streams. In: DEBS ’08
Lei, C., Rundensteiner, E. A.: Robust distributed query processing for streaming data. In: ACM TODS, 2014
Mozafari, B., Zaniolo, C.: Optimal load shedding with aggregates and mining queries. In: ICDE ’10
Narayanan, S., Waas, F.: Dynamic prioritization of database queries. In: ICDE ’11
Nehme, R. V., Rundensteiner, E. A.: Clustersheddy: load shedding using moving clusters over spatio-temporal data streams. In: DASFAA ’07
Pham, T. N., Al Moakar, L., Chrysanthis, P. K., Labrinidis, A.: DILoS: a dynamic integrated load manager and scheduler for continuous queries. In: SMDB ’11
Pham, T. N., Chrysanthis, P. K., Labrinidis, A.: Self-managing load shedding for data stream management systems. In: SMDB ’13
Reiss, F., Hellerstein, J. M.: Data triage: an adaptive architecture for load shedding in telegraphCQ. In: ICDE ’05
Sharaf, M. A., Chrysanthis, P. K., Labrinidis, A., Pruhs, K.: Algorithms and metrics for processing multiple heterogeneous continuous queries. In: ACM TODS, 2008
Tatbul, N., Çetintemel, U., Zdonik, S.: Staying FIT: efficient load shedding techniques for distributed stream processing. In: VLDB ’07
Tatbul, N., Çetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in a data stream manager. In: VLDB ’03
Tatbul, N., Zdonik, S.: Window-aware load shedding for aggregation queries over data streams. In: VLDB ’06
Tu, Y.-C., Liu, S., Prabhakar, S., Yao, B.: Load shedding in stream databases: a control-based approach. In: VLDB ’06
Wei, Y., Son, S. H., Stankovic, J. A.: RTSTREAM: real-time query processing for data streams. In: ISORC ’06
Wolf, J., Bansal, N., Hildrum, K., Parekh, S., Rajan, D., Wagle, R., Wu, K.-L., Fleischer, L.: SODA: An optimizing scheduler for large-scale stream-based distributed computer systems. In: Middleware ’08
Wu, S., Lv, Y., Yu, G., Gu, Y., Li, X.: A QoS-guaranteeing scheduling algorithm for continuous queries over streams. In: APWeb/WAIM ’07
Acknowledgments
We thank the anonymous reviewers for their insightful comments and Mark Silvis and Eric Gratta for their help with copyediting. This work was supported in part by NSF awards IIS-0534531, IIS-0746696, and OIA-1028162, an Andrew Mellon Predoctoral Fellowship, and EMC/Greenplum.
Additional information
This article enhances and extends preliminary work [37] that was presented in the SMDB’11 Workshop.
Cite this article
Pham, T.N., Chrysanthis, P.K. & Labrinidis, A. Avoiding class warfare: managing continuous queries with differentiated classes of service. The VLDB Journal 25, 197–221 (2016). https://doi.org/10.1007/s00778-015-0411-4
DOI: https://doi.org/10.1007/s00778-015-0411-4