ABSTRACT
Monitoring aggregates on network traffic streams is a compelling application of data stream management systems. Often, streaming aggregation queries involve joining multiple inputs (e.g., client requests and server responses) using temporal join conditions (e.g., within 5 seconds), followed by computation of aggregates (e.g., COUNT) over temporal windows (e.g., every 5 minutes). These types of queries help identify malfunctioning servers (missing responses), malicious clients (bursts of requests during a denial-of-service attack), or improperly configured protocols (short timeout intervals causing many retransmissions). However, while expressing such queries as a join followed by aggregation is natural, evaluating them over massive data streams is inefficient.
In this paper, we develop rewriting techniques for streaming aggregation queries that join multiple inputs. Our techniques identify conditions under which expensive joins can be optimized away, while providing error bounds for the results of the rewritten queries. The basis of the optimization is a powerful but decidable theory in which constraints over data streams can be formulated. We show the efficiency and accuracy of our solutions via experimental evaluation on real-life IP network data using the Gigascope stream processing engine.
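The query shape described in the abstract (a temporal join followed by a windowed COUNT) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper or from Gigascope: the `(timestamp, key)` tuple layout, the function names `join_then_count` and `count_requests`, and the example constraint that every request has exactly one response within 5 seconds. Under that assumed constraint the join contributes nothing to the count, so the aggregate can be computed from the request stream alone.

```python
from collections import Counter

JOIN_WINDOW = 5    # seconds: a response joins a request if it arrives within this bound
AGG_WINDOW = 300   # seconds: aggregates are reported per 5-minute window


def join_then_count(requests, responses):
    """Naive plan: join requests with responses, then COUNT per window.

    Both inputs are lists of (timestamp_seconds, key) tuples
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for req_ts, req_key in requests:
        for resp_ts, resp_key in responses:
            # Temporal join condition: same key, response within 5 seconds.
            if resp_key == req_key and 0 <= resp_ts - req_ts <= JOIN_WINDOW:
                counts[req_ts // AGG_WINDOW] += 1
    return dict(counts)


def count_requests(requests):
    """Rewritten plan: COUNT requests per window, with no join.

    Equivalent to join_then_count only under the ASSUMED constraint that
    every request has exactly one matching response within JOIN_WINDOW.
    """
    return dict(Counter(req_ts // AGG_WINDOW for req_ts, _ in requests))
```

When the assumed constraint is violated (for example, a request with no response), the two plans disagree, which is the situation in which bounding the error of the rewritten query becomes relevant.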