Progressive Query Optimization for Federated Queries

Ewen, Stephan; Kache, Holger; Markl, Volker; Raman, Vijayshankar

doi:10.1007/11687238_50

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)

Included in the following conference series:

International Conference on Extending Database Technology

Abstract

Database Management Systems (DBMS) select a query plan by mathematically modeling the execution cost of candidate plans and choosing the cheapest query execution plan (QEP) according to that cost model. The cost model requires accurate estimates of the sizes of the intermediate results of all steps in the QEP. Outdated or incomplete statistics, parameter markers, and complex, skewed data frequently cause the selection of a suboptimal query plan, which in turn results in poor query performance. Federated queries are regular relational queries that access data on one or more remote relational or non-relational data sources, possibly combining it with tables stored in the federated DBMS server. Their execution is typically divided between the federated server and the remote data sources. Outdated and incomplete statistics have a bigger impact on federated DBMS than on regular DBMS, because maintaining federated statistics is considerably more complicated and expensive than maintaining local statistics; consequently, federated queries commonly perform poorly due to the selection of a suboptimal query plan. We present an extension of the mid-query reoptimization technique "Progressive Query Optimization" (POP), which adds robustness to query processing by dynamically detecting whether an access plan is suboptimal and triggering a reoptimization in that case. Our extensions enable efficient reoptimization of federated queries. Our contributions are (a) an opportunistic, but risk-controlled, reoptimization technique for federated DBMS, (b) a technique for multiple reoptimizations during federated query processing, with a strategy to discover and eliminate redundant partial results, and (c) a mechanism to eagerly procure statistics in a federated environment. We have implemented these techniques in a prototype version of WebSphere Information Integrator for DB2. Our enhancements enable robust and acceptable performance for federated queries, even if the remote data sources provide almost no statistical information about the data. An extensive case study on real-world data shows that POP has negligible runtime overhead and improves the performance of complex federated queries by up to a full order of magnitude.
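
The control flow the abstract describes (detect at runtime that a plan was costed on wrong size estimates, then reoptimize with what has been learned) can be illustrated with a small sketch. Everything below is hypothetical and is not the authors' implementation: the names `CheckedStep`, `run_plan`, and `execute`, the per-step `low`/`high` bounds, and the `max_reopts` cap are assumptions made only for illustration. The sketch assumes each plan step carries a range of result sizes for which the plan is still considered acceptable, and that observed sizes are fed back to the planner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


class ReoptimizationNeeded(Exception):
    """Raised when an observed result size leaves the step's accepted range."""

    def __init__(self, step: str, observed: int):
        super().__init__(f"{step}: observed {observed} rows")
        self.step = step
        self.observed = observed


@dataclass
class CheckedStep:
    name: str
    low: int    # hypothetical lower bound on the result size
    high: int   # hypothetical upper bound on the result size
    produce: Callable[[], Iterable]  # runs the step and yields its rows


def run_plan(steps: List[CheckedStep], known: Dict[str, int],
             enforce: bool) -> Dict[str, list]:
    """Run steps in order, record observed sizes, and check them if enforced."""
    results = {}
    for step in steps:
        rows = list(step.produce())
        results[step.name] = rows
        known[step.name] = len(rows)  # observed size, usable by the next plan
        if enforce and not (step.low <= len(rows) <= step.high):
            raise ReoptimizationNeeded(step.name, len(rows))
    return results


def execute(planner: Callable[[Dict[str, int]], List[CheckedStep]],
            max_reopts: int = 2) -> Tuple[Dict[str, list], int]:
    """Plan, run, and replan on a failed check, at most max_reopts times."""
    known: Dict[str, int] = {}
    for attempt in range(max_reopts + 1):
        # The final attempt runs without checks, so the loop always terminates.
        enforce = attempt < max_reopts
        try:
            return run_plan(planner(known), known, enforce), attempt
        except ReoptimizationNeeded:
            continue
    raise AssertionError("unreachable: the final attempt never raises")
```

A caller supplies a `planner` function that builds a list of steps from the sizes observed so far; `execute` returns the step results together with the number of reoptimizations that were triggered. Capping the number of reoptimizations is one simple way to bound the risk of replanning repeatedly. For brevity the sketch re-executes every step after a replan, whereas the paper's contribution (b) concerns handling partial results across multiple reoptimizations.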

Author information

Authors and Affiliations

IBM Germany, Am Fichtenberg 1, 71083, Herrenberg, Germany
Stephan Ewen
IBM Silicon Valley Laboratory, 555 Bailey Avenue, San José, CA, USA
Holger Kache
IBM Almaden Research Center, 650 Harry Road, San José, CA, USA
Volker Markl & Vijayshankar Raman

Editor information

Editors and Affiliations

University of Athens, Greece
Yannis Ioannidis
University of Konstanz, P.O.Box D188, 78457, Konstanz, Germany
Marc H. Scholl
Sustainable Content Logistics Centre, Hamburg, Germany
Joachim W. Schmidt
Chair of Software Engineering for Business Information Systems, Technische Universität München, Boltzmannstraße 3, 85748, Garching b. München
Florian Matthes
Department of Informatics, University of Athens Panepistimiopolis, 15771, Athens, Greece
Mike Hatzopoulos
IPD, Universität Karlsruhe, Am Fasanengarten 5, 76131, Karlsruhe
Klemens Boehm
TU München, D-85748, Garching, Germany
Alfons Kemper
Technische Universität München, Germany
Torsten Grust
Institute for Computer Science, Ludwig-Maximilians Universität München
Christian Boehm

About this paper

Cite this paper

Ewen, S., Kache, H., Markl, V., Raman, V. (2006). Progressive Query Optimization for Federated Queries. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_50

DOI: https://doi.org/10.1007/11687238_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science, Computer Science (R0)