Abstract
Data stream management systems are a natural choice for efficiently processing continuous queries over high-volume data streams, e.g., to monitor sensor data or transaction streams. An immediate reaction to detected critical or security-relevant situations is essential for secure and economic operation, as in our scenario of monitoring decentralized energy systems, which realize geographically distributed energy generation processes. Without further provisions, existing processing approaches may delay critical or security-relevant messages in high-load situations, e.g., those caused by bursts.
One way to allow adequate processing in such situations is to prioritize queries that handle critical situations. Unfortunately, problems cannot always be identified by a query alone. Sometimes certain data values (e.g., out-of-range values) or error messages indicate situations that call for faster processing by all queries that consume these data. Traditional approaches to continuous query execution assume a stream order, typically based on timestamps, and process elements in that order. In this article we consider the prioritization of individual stream elements and propose an out-of-order execution in the data stream.
We provide a comprehensive and formally founded approach for prioritizing data stream elements. Prioritized elements benefit twice from our approach. On the one hand, they are able to “overtake” lower prioritized elements, e.g., in queues. On the other hand, stateful operators can produce prioritized results earlier than would be possible in other approaches. Still, the semantics of the queries remain unchanged. We implemented our approach and show with measurements that a very low latency of prioritized elements can be achieved, even under high load. As a result, all queries that process prioritized elements can benefit from our approach.
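The “overtaking” of lower prioritized elements in a queue can be illustrated with a minimal sketch. It is not the implementation described in the article: the class `PriorityBuffer`, its methods, and the integer `priority` field are hypothetical names chosen for illustration, and the sketch assumes that a larger number means a higher priority and that elements of equal priority should keep their timestamp order.

```python
import heapq
import itertools


class PriorityBuffer:
    """Hypothetical inter-operator queue (an assumption, not the article's design).

    Elements with a higher priority leave the buffer first; among elements of
    equal priority, the usual timestamp order is preserved.
    """

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so payloads are never compared.
        self._seq = itertools.count()

    def push(self, timestamp, payload, priority=0):
        # heapq is a min-heap, so the priority is negated to pop high priority first.
        heapq.heappush(self._heap, (-priority, timestamp, next(self._seq), payload))

    def pop(self):
        neg_priority, timestamp, _, payload = heapq.heappop(self._heap)
        return timestamp, payload, -neg_priority

    def __len__(self):
        return len(self._heap)


buf = PriorityBuffer()
buf.push(1, "temp=21.0")
buf.push(2, "temp=21.3")
buf.push(3, "temp=98.7 OUT OF RANGE", priority=1)  # critical reading
buf.push(4, "temp=21.1")

# Drain the buffer: the prioritized element at timestamp 3 overtakes 1 and 2.
drained = [buf.pop() for _ in range(len(buf))]
for timestamp, payload, priority in drained:
    print(timestamp, priority, payload)
```

In this sketch only the queue order changes. Keeping query semantics unchanged in stateful operators, as the abstract claims for the proposed approach, needs additional machinery that the sketch does not attempt to model.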
Cite this article
Jacobi, J., Bolles, A., Grawunder, M. et al. A physical operator algebra for prioritized elements in data streams. Comput Sci Res Dev 25, 235–246 (2010). https://doi.org/10.1007/s00450-009-0102-8
DOI: https://doi.org/10.1007/s00450-009-0102-8