ABSTRACT
Most relational query optimizers make use of information about the costs of accessing tuples and data structures on various storage devices. This information can at times be off by several orders of magnitude due to human error in configuration setup, sudden changes in load, or hardware failure. In this paper, we attempt to answer the following questions:• Are inaccurate access cost estimates likely to cause a typical query optimizer to choose a suboptimal query plan?• If an optimizer chooses a suboptimal plan as a result of inaccurate access cost estimates, how far from optimal is this plan likely to be?To address these issues, we provide a theoretical, vector-based framework for analyzing the costs of query plans under various storage parameter costs. We then use this geometric framework to characterize experimentally a commercial query optimizer. We develop algorithms for extracting detailed information about query plans through narrow optimizer interfaces, and we perform the characterization using database statistics from a published run of the TPC-H benchmark and a wide range of storage parameters.We show that, when data structures such as tables, indexes, and sorted runs reside on different storage devices, the optimizer can derive significant benefits from having accurate and timely information regarding the cost of accessing storage devices.
- A. Brown and D. A. Patterson. Towards avilability benchmarks: A case study of software RAID systems. In Proceedings of the USENIX Technical Conference, pages 263--276, 2000. Google ScholarDigital Library
- S. Chaudhuri and V. Narasayya. Automating statistics management for query optimizers. IEEE Transactions on Knowledge and Data Engineering, 13:7--20, 2001. Google ScholarDigital Library
- C.-M. Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. In R. T. Snodgrass and M. Winslett, editors, Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, May 24--27, 1994, pages 161--172. ACM Press, 1994. Google ScholarDigital Library
- F. Chu, J. Halpern, and J. Gehrke. Least expected cost query optimization: What can we expect? In Proceedings of the ACM Symposium on Principles of Database Systems, pages 293--302, 2002. Google ScholarDigital Library
- F. Chu, J. Halpern, and P. Seshadri. Least expected cost query optimization: An exercise in utility. In Proceedings of the ACM Symposium on Principles of Database Systems, pages 138--147, 1999. Google ScholarDigital Library
- R. L. Cole and G. Graefe. Optimization of dynamic query evaluation plans. In Proceedings of the ACM SIGMOD Conference, pages 150--160, 1994. Google ScholarDigital Library
- I. B. M. Corporation. IBM DB2 Universal Database Command Reference. International Business Machines Corporation, 2001. http://www.elink.ibmlink.ibm.com/public/applications/publications/cgibin/pbi.cgi.Google Scholar
- I. B. M. Corporation. Autonomic computing: IBM's perspective on the state of information technology. Technical report, 2002. http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf.Google Scholar
- I. B. M. Corporation. DB2 universal database version 8.1 for Linux, 2002.Google Scholar
- I. B. M. Corporation. TPC benchmark H full disclosure report: IBM eServer xSeries 350 using IBM DB2 Universal Database 7.2. Technical report, Transaction Processing Performance Council, 2002. http://www.tpc.org/results/ FDR/tpch/x350_100GB_16proc_FDR.pdf.Google Scholar
- T. P. P. Council. dbgen. http://www.tpc.org/tpch/spec/20000511.tar.z.Google Scholar
- T. P. P. Council. TPC Benchmark H. Technical report, Transaction Processing Performance Council, 2002. http://www.tpc.org/tpch/spec/tpch150.pdf.Google Scholar
- A. Deshpande and J. Hellerstein. Decoupled query optimization for federated database systems. In ICDE, 2002.Google ScholarCross Ref
- S. Ganguly. Design and analysis of parametric query optimization algorithms. In Proceedings of the Very Large Database Conference, pages 228--238, 1998. Google ScholarDigital Library
- S. Ganguly, W. Hasan, and R. Krishnamurthy. Query optimization for parallel execution. In Proceedings of the ACM SIGMOD Conference, pages 9--18, 1994. Google ScholarDigital Library
- P. Gassner, G. M. Lohman, K. B. Schiefer, and Y. Wang. Query optimization in the IBM DB2 family. Data Engineering Bulletin, 16:4--18, 1993.Google Scholar
- G. Golub and C. van Loan. Matrix Computations. Johns Hopkins Press, 1983.Google Scholar
- G. Graefe. The cascades framework for query optimization. Data Engineering Bulletin, 18:19--29, 1995.Google Scholar
- A. Hulgeri and S. Sudarshan. Parameteric query optimization. In Proceedings of the Very Large Database Conference, pages 167--178, 2002. Google ScholarDigital Library
- Y. E. Ioannidis. Query optimization. ACM Computing Surveys, 28:121--123, 1996. Google ScholarDigital Library
- Y. E. Ioannidis, R. T. Ng, K. Shim, and T. K. Sellis. Parametric query processing. In Proceedings of the Very Large Database Conference, pages 103--114, 1992. Google ScholarDigital Library
- M. Jarke and J. Koch. Query optimization in database systems. ACM Computing Surveys, 16:111--152, 1984. Google ScholarDigital Library
- H. Lu, K.-L. Tan, and S. Dao. The fittest survives: An adaptive approach to query optimization. In Proceedings of the Very Large Database Conference, pages 251--262, 1995. Google ScholarDigital Library
- D. A. Patterson et al. Recovery-Oriented Computing (ROC): Motivation, definition, techniques, and case studies. Technical Report UCB/CSD-02-1175, UC Berkeley Computer Science Department, 2002. Google ScholarDigital Library
- V. G. V. Prasad. Parametric query optimization: A geometric approach. Master's thesis, Indian Institute of Technology, Kanpur, India, 1999.Google Scholar
- F. Preparata and M. Shamos. Computational Geometry: an Introduction. Springer-Verlag, 1988. Google ScholarDigital Library
- W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C. Cambridge University Press, New York, NY, 1988. Google ScholarDigital Library
- S. V. U. M. Rao. Parametric query optimizationi: A non-geometric approach. Master's thesis, Indian Institute of Technology, Kanpur, India, 1999.Google Scholar
- C. Ruemmler and J. Wilkes. An introduction to disk drive modeling. IEEE Computer, 27:17--28, 1994. Google ScholarDigital Library
- P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the ACM SIGMOD Conference, pages 23--34. ACM, 1979. Google ScholarDigital Library
- T. Shirai, L. Dilworth, R. Reutlinger, S. Kumar, D. Bernabe, B. Wilkins, and B. Cassells. DB2 UDB V7.1 Performance Tuning Guide. IBM International Technical Support Organization, December 2000.Google Scholar
- M. Steinbrunn, G. Moerkotte, and A. Kemper. Heuristic and randomized optimization for the join ordering problem. VLDB Journal, 6:191--208, 1997. Google ScholarDigital Library
- M. Stillger and J. C. Freytag. Testing the quality of a query optimizer. Data Engineering Bulletin, 18:41--48, 1995.Google Scholar
- M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO --- DB2's learning optimizer. In Proceedings of the Very Large Database Conference, pages 19--28, 2001. Google ScholarDigital Library
- G. Strang. Linear Algebra and its Applications. Harcourt, Brace and Jovanovich, Inc., Orlando, FL, 1988.Google Scholar
- G. Weikum, A. Moenkeberg, C. Hasse, and P. Zabback. Self-tuning database technology and information services: from wishful thinking to viable engineering. In Proceedings of the Very Large Database Conference, pages 20--31, 2002. Google ScholarDigital Library
- B. L. Worthington, G. R. Ganger, Y. N. Patt, and J. Wilkes. On-line extraction of scsi disk drive parameters. In ACM SIGMETRICS Performance Evaluation Review, pages 146--156, 1995. Google ScholarDigital Library
Index Terms
- A characterization of the sensitivity of query optimization to storage access cost parameters
Recommendations
Query optimization in multidatabase systems
CASCON '92: Proceedings of the 1992 conference of the Centre for Advanced Studies on Collaborative research - Volume 2A multidatabase system (MDBS) integrates information from autonomous local databases managed by heterogeneous database management systems (DBMS) in a distributed environment. For a query involving more than one database, global query optimization should ...
Query containment under bag and bag-set semantics
Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI, or database systems. The majority of existing work assumes set semantics but often in real applications the manipulation of ...
Multi-objective parametric query optimization
Classical query optimization compares query plans according to one cost metric and associates each plan with a constant cost value. In this paper, we introduce the multi-objective parametric query optimization (MPQO) problem where query plans are ...
Comments