ABSTRACT
Query optimizers play a critical role in the success of every relational database system. However, regression testing for optimizers remains an ad hoc, tedious, and time consuming process. Typically, a large number of SQL query suites are employed for regression testing. These suites are manually designed at great cost by development and QA groups or are collected from various customers or benchmarks such as TPC-H or TPC-DS. While these suites are useful in capturing regressions, optimizers continue to be plagued by regressions and bug fixing requiring expensive human intervention. This may be because these ad-hoc regression queries are redundant in the sense that they are not covering different parts of the optimizer plan space. This paper introduces a novel way in which the optimizer itself is used to generate an economical regression suite. Our approach eliminates the tedium in manually designing a regression suite and removes redundancy in the suite. As a first step towards solving this very difficult problem, we shall focus on the join plan space in this paper with a small number of tables. We show that our generated queries exhibit 50% more distinct join plans than TPC-H and TPC-DS combined. The generated queries have also been very useful for validating the optimizer's cost functions and hence can be used as a test suite as well. Since this is a new approach, we will highlight some of the areas that need a closer look by the research community.
- Bati, H., Giakoumakis, L., Herbert, S., and Surna, A. A genetic approach for random testing of database systems. VLDB 2007: 1243--1251 Google ScholarDigital Library
- DeWitt D. J., Naughton, J. F., Schneider, D. A., and Seshadri, S. Practical Skew Handling in Parallel Joins. VLDB 1992: 27--40 Google ScholarDigital Library
- Graefe G. The Cascades Framework for Query Optimization. IEEE Data Eng. Bull. 18(3): 19--29 (1995)Google Scholar
- Haftmann, F., Kossmann D., and Lo, E. A framework for efficient regression tests on database applications. VLDB J. 16(1): 145--164 (2007) Google ScholarDigital Library
- http://h71028.www7.hp.com/enterprise/us/en/software/bi-product-neoview.htmlGoogle Scholar
- Muralikrishna, M. Improved Unnesting Algorithms for Join Aggregate SQL Queries. VLDB 1992: 91--102 Google ScholarDigital Library
- Othayoth R. and Poess, M. The Making of TPC-DS, VLDB 2006: 1049--1058 Google ScholarDigital Library
- Poess M. and Stephens, M. J. Generating Thousand Benchmark Queries in Seconds. VLDB 2004: 1045--1053 Google ScholarDigital Library
- Poosala V., Ioannidis Y. E., Haas P. J., and Shekita E. J. Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294--305 Google ScholarDigital Library
- Pöss M., http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/n/Nambiar:Raghunath_Othayoth.html Othayoth R. N., and Walrath D. Why You Should Run TPC-DS: A Workload Analysis. VLDB 2007: 1138--1149 Google ScholarDigital Library
- Reddy N. and Haritsa, J. R. Analyzing Plan Diagrams of Database Query Optimizers. VLDB 2005: 1228--1240 Google ScholarDigital Library
- Schneider D. A. and DeWitt, D. J. Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines. VLDB 1990: 469--480 Google ScholarDigital Library
- Slutz, D. R. Massive Stochastic Testing of SQL. VLDB 1998: 618--622 Google ScholarDigital Library
- http://tpc.org/tpch/spec/tpch2.7.0.pdfGoogle Scholar
- Waas F. and Galindo-Legaria, C. A. Counting, Enumerating, and Sampling of Execution Plans in a Cost-Based Query Optimizer. SIGMOD Conference 2000: 499--509 Google ScholarDigital Library
- Willmor D. and Embury, S. M. A Safe Regression Test Selection Technique for Database-Driven Apps. ICSM 2005: 421--434 Google ScholarDigital Library
Index Terms
- Using the optimizer to generate an effective regression suite: a first step
Recommendations
Performance Improvement for Collection Operations Using Join Query Optimization
COMPSAC '11: Proceedings of the 2011 IEEE 35th Annual Computer Software and Applications ConferenceProgramming languages with explicit support for queries over collections allow programmers to express operations on collections more abstractly than relying on their realization in loops or through provided libraries. Join optimization techniques from ...
Exploring optimization and caching for efficient collection operations
Many large programs operate on collection types. Extensive libraries are available in many programming languages, such as the C++ Standard Template Library, which make programming with collections convenient. Extending programming languages to provide ...
Exploring Query Optimization in Programming Codes by Reducing Run-Time Execution
COMPSAC '10: Proceedings of the 2010 IEEE 34th Annual Computer Software and Applications ConferenceObject querying is an abstraction of operations over collections, whereas manual implementations are performed at low level which forces the developers to specify how a task must be done. Some object-oriented languages allow the programmers to express ...
Comments