skip to main content
research-article

Input-sensitive scalable continuous join query processing

Published: 03 September 2009 Publication History

Abstract

This article considers the problem of scalably processing a large number of continuous queries. Our approach, consisting of novel data structures and algorithms and a flexible processing framework, advances the state-of-the-art in several ways. First, our approach is query sensitive in the sense that it exploits potential overlaps in query predicates for efficient group processing. We partition the collection of continuous queries into groups based on the clustering patterns of the query predicates, and apply specialized processing strategies to heavily clustered groups (or hotspots). We show how to maintain the hotspots efficiently, and use them to scalably process continuous select-join, band-join, and window-join queries. Second, our approach is also data sensitive, in the sense that it makes cost-based decisions on how to process each incoming tuple based on its characteristics. Experiments demonstrate that our approach can improve the processing throughput by orders of magnitude.

Supplementary Material

Agarwal Appendix (a13-agarwal-apndx.pdf)
Online appendix to input-sensitive scalable continuous join query processing. The appendix supports the information on article 13.

References

[1]
Agarwal, P. K., Arge, L., Brodal, G. S., and Vitter, J. S. 1999. I/O-Efficient dynamic point location in monotone planar subdivisions. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 11--20.
[2]
Agarwal, P. K., Xie, J., Yang, J., and Yu, H. 2005. Monitoring continuous band-join queries over dynamic data. In Proceedings of the 16th International Symposium Algorithms and Computation. Spring-Verlag, 349--359.
[3]
Agarwal, P. K., Xie, J., Yang, J., and Yu, H. 2006. Scalable continuous query processing by tracking hotspots. In Proceedings of the International Conference on Very Large Data Bases, 31--42.
[4]
Arge, L. and Vitter, J. 2003. Optimal external memory interval management. SIAM J. Comput. 32, 6, 1488--1508.
[5]
Avnur, R. and Hellerstein, J. M. 2000. Eddies: Continuously adaptive query processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 261--272.
[6]
Babu, S., Bizarro, P., and DeWitt, D. 2005. Proactive re-optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 107--118.
[7]
Bizarro, P., Babu, S., DeWitt, D., and Widom, J. 2005. Content-based routing: Different plans for different data. In Proceedings of the International Conference on Very Large Data Bases, 757--768.
[8]
Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. B. 2002. Monitoring streams--A new class of data management applications. In Proceedings of the International Conference on Very Large Data Bases, 215--226.
[9]
Chandrasekaran, S. and Franklin, M. J. 2003. Psoup: A system for streaming queries over streaming data. VLDB J. 12, 2, 140--156.
[10]
Chen, J., DeWitt, D. J., Tian, F., and Wang, Y. 2000. NiagraCQ: A scalable continuous query system for internet databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 379--390.
[11]
de Berg, M., van Kreveld, M., Overmars, M., and Schwarzkopf, O. 2000. Computational Geometry: Algorithms and Applications, 2nd Ed. Springer, Berlin.
[12]
Demers, A. J., Gehrke, J., Hong, M., Riedewald, M., and White, W. M. 2006. Towards expressive publish/subscribe systems. In Proceedings of the International Conference on Extending Database Technology, 627--644.
[13]
DeWitt, D. J., Naughton, J. F., and Schneider, D. A. 1991. An evaluation of non-equijoin algorithms. In Proceedings of the International Conference on Very Large Data Bases, 443--452.
[14]
Diao, Y., Altinel, M., Franklin, M. J., Zhang, H., and Fischer, P. 2003. Path sharing and predicate evaluation for high-performance XML filtering. ACM Trans. Datab. Syst. 28, 4, 467--516.
[15]
Dittrich, J.-P., Fischer, P. M., and Kossmann, D. 2005. Agile: Adaptive indexing for context-aware information filters. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 215--226.
[16]
Du, Q., Faber, V., and Gunzburger, M. 1999. Centroidal Voronoi tessellations: Applications and algorithms. SIAM Rev. 41, 637--676.
[17]
Fischer, P. M. and Kossmann, D. 2005. Batched processing for information filters. In Proceedings of the International Conference on Data Engineering, 902--913.
[18]
Gehrke, J. and Hellerstein, J. M. 2004. Guest editorial to the special issue on data stream processing. VLDB J. 13, 4, 317.
[19]
Gehrke, J. E. 2003. Special issue on data stream processing. IEEE Data Engin. Bull. 26, 1.
[20]
Green, T. J., Gupta, A., Miklau, G., Onizuka, M., and Suciu, D. 2004. Processing XML streams with deterministic automata and stream indexes. ACM Trans. Datab. Syst. 29, 4, 752--788.
[21]
Hanson, E. and Johnson, T. 1991. The interval skip list: A data structure for finding all intervals that overlap a point. In Proceedings of the Workshop on Algorithms and Data Structures. Springer, 153--164.
[22]
Hanson, E. N., Carnes, C., Huang, L., Konyala, M., Noronha, L., Parthasarathy, S., Park, J. B., and Vernon, A. 1999. Scalable trigger processing. In Proceedings of the International Conference on Data Engineering. IEEE Computer Society Press, 266--275.
[23]
Har-Peled, S. and Mazumdar, S. 2004. Coresets for k-means and k-median clustering and their applications. In Proceedings of the Annual Symposium Theory of Computing. ACM, 291--300.
[24]
He, H., Xie, J., Yang, J., and Yu, H. 2005. Asymmetric batch incremental view maintenance. In Proceedings of the International Conference on Data Engineering, 106--117.
[25]
Hong, M., Demers, A. J., Gehrke, J., Koch, C., Riedewald, M., and White, W. M. 2007. Massively multi-query join processing in publish/subscribe systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 761--772.
[26]
Ives, Z., Florescu, D., Friedman, M., Levy, A., and Weld, D. 1999. An adaptive query execution system for data integration. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 299--310.
[27]
Jagadish, H. V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K., and Suel, T. 1998. Optimal histograms with quality guarantees. In Proceedings of the International Conference on Very Large Data Bases, 275--286.
[28]
Katz, M., Nielsen, F., and Segal, M. 2003. Maintenance of a piercing set for intervals with applications. Algorithmica 36, 1, 59--73.
[29]
Koudas, N., Muthukrishnan, S., and Srivastava, D. 2000. Optimal histograms for hierarchical range queries. In Proceedings of the ACM Symposium on Principles of Database Systems. ACM, 196--204.
[30]
Lim, H.-S., Lee, J.-G., Lee, M.-J., Whang, K.-Y., and Song, I.-Y. 2006. Continuous query processing in data streams using duality of data and queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 313--324.
[31]
Liu, L., Pu, C., and Tang, W. 1999. Continual queries for Internet scale event-driven information delivery. IEEE Trans. Knowl. Data Eng. 11, 4, 610--628.
[32]
Madden, S., Shah, M., Hellerstein, J., and Raman, V. 2002. Continuously adaptive continuous queries over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 49--60.
[33]
Markl, V., Raman, V., Simmen, D., Lohman, G., and Pirahesh, H. 2004. Robust query processing through progressive optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 659--670.
[34]
McCreight, E. M. 1985. Priority search trees. SIAM J. Comput. 14, 257--276.
[35]
Pereira, J., Fabret, F., Jacobsen, H. A., Llirbat, F., and Shasha, D. 2001. Webfilter: A high-throughput XML-based publish and subscribe system. In Proceedings of the International Conference on Very Large Data Bases, 723--724.
[36]
Tarjan, R. E. 1983. Data Structures and Network Algorithms. Society for Industrial and Applied Mathematics, Philadelphia.
[37]
Vaishnavi, V. K. 1982. Computing point enclosures. IEEE Trans. Comput. 31, 1, 22--29.
[38]
Widom, J. and Ceri, S. 1996. Active Database Systems: Triggers and Rules for Advanced Database Processing. Morgan Kaufmann, San Fransisco.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 34, Issue 3
August 2009
269 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1567274
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 September 2009
Accepted: 01 February 2009
Revised: 01 November 2008
Received: 01 January 2008
Published in TODS Volume 34, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Continuous queries
  2. data streams
  3. event matching
  4. publish/subscribe

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media