DOI: 10.1145/1379272.1379282
research-article

Optimizing away joins on data streams

Published: 29 March 2008

ABSTRACT

Monitoring aggregates on network traffic streams is a compelling application of data stream management systems. Often, streaming aggregation queries involve joining multiple inputs (e.g., client requests and server responses) using temporal join conditions (e.g., within 5 seconds), followed by computation of aggregates (e.g., COUNT) over temporal windows (e.g., every 5 minutes). These types of queries help identify malfunctioning servers (missing responses), malicious clients (bursts of requests during a denial-of-service attack), or improperly configured protocols (short timeout intervals causing many retransmissions). However, while such query expression is natural, its evaluation over massive data streams is inefficient.
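The query shape described above (a temporal join followed by a windowed COUNT) can be sketched in a few lines of Python. This is a minimal illustration only, not the paper's method or Gigascope's query language: the function name `count_answered_requests`, the `(timestamp_seconds, key)` tuple layout, and the use of finite lists in place of unbounded streams are all assumptions made for the example.

```python
from collections import Counter

JOIN_WINDOW = 5        # seconds: a response matches if it arrives within this bound
AGG_WINDOW = 5 * 60    # seconds: aggregates are reported per 5-minute window


def count_answered_requests(requests, responses):
    """Join requests to responses on a shared key, then COUNT per window.

    requests, responses: lists of (timestamp_seconds, key) tuples
    (a hypothetical layout chosen for this sketch).
    Returns a Counter mapping window start time -> number of joined pairs.
    """
    # Index response timestamps by key so the join does not scan all pairs.
    by_key = {}
    for ts, key in responses:
        by_key.setdefault(key, []).append(ts)

    counts = Counter()
    for req_ts, key in requests:
        for resp_ts in by_key.get(key, []):
            # Temporal join condition: response at most 5 seconds after the request.
            if 0 <= resp_ts - req_ts <= JOIN_WINDOW:
                # Assign the joined pair to the window containing the request.
                window_start = (req_ts // AGG_WINDOW) * AGG_WINDOW
                counts[window_start] += 1
    return counts


requests = [(10, "a"), (20, "b"), (310, "c")]
responses = [(12, "a"), (40, "b"), (313, "c")]
result = count_answered_requests(requests, responses)
```

In this toy input, request "b" is answered after 20 seconds and so contributes nothing to the count. The sketch holds every response in memory before joining, which is the kind of cost that becomes a problem on a massive stream.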

In this paper, we develop rewriting techniques for streaming aggregation queries that join multiple inputs. Our techniques identify conditions under which expensive joins can be optimized away, while providing error bounds for the results of the rewritten queries. The basis of the optimization is a powerful but decidable theory in which constraints over data streams can be formulated. We show the efficiency and accuracy of our solutions via experimental evaluation on real-life IP network data using the Gigascope stream processing engine.


Published in

SSPS '08: Proceedings of the 2nd international workshop on Scalable stream processing system
March 2008
99 pages
ISBN: 9781595939630
DOI: 10.1145/1379272

Copyright © 2008 ACM


Publisher

Association for Computing Machinery
New York, NY, United States
