ABSTRACT
Modern query optimizers select an efficient join ordering for a physical execution plan based essentially on the average join selectivity factors among the referenced tables. In this paper, we argue that this "monolithic" approach can miss important opportunities for the effective optimization of relational queries. We propose selectivity-based partitioning, a novel optimization paradigm that takes into account the join correlations among relation fragments in order to enable multiple (and more effective) join orders for the evaluation of a single query. In a nutshell, the basic idea is to carefully partition a relation according to the selectivities of the join operations, and subsequently rewrite the query as a union of constituent queries over the computed partitions. We provide a formal definition of the related optimization problem and derive properties that characterize the set of optimal solutions. Based on our analysis, we develop a heuristic algorithm that efficiently computes an effective partitioning for the input query. Results from a preliminary experimental study verify the effectiveness of the proposed approach and demonstrate its potential as a query optimization technique.
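The divide-and-union idea can be illustrated with a minimal Python sketch. It is not the paper's heuristic algorithm: the relations `R`, `S`, and `T`, the helper names `fanout` and `evaluate`, the hand-picked split predicate `a < 100`, and the cost measure (number of intermediate tuples) are all assumptions made for illustration. The sketch evaluates a three-way join once with a single join order chosen from average selectivities, and once as a union of two constituent queries, each with its own join order.

```python
from collections import Counter

def fanout(fragment, col, counts):
    """Average number of join partners per tuple of `fragment` on column `col`."""
    if not fragment:
        return 0.0
    return sum(counts[row[col]] for row in fragment) / len(fragment)

def evaluate(fragment, s_counts, t_counts):
    """Join `fragment` with S (on column 0) and T (on column 1).

    The join with the lower average fanout runs first. Returns the result as
    a Counter of R-tuples with multiplicities, plus the number of
    intermediate tuples produced by the first join.
    """
    joins = [(fanout(fragment, 0, s_counts), 0, s_counts),
             (fanout(fragment, 1, t_counts), 1, t_counts)]
    joins.sort(key=lambda j: j[0])  # most selective join first
    (_, col1, first), (_, col2, second) = joins

    intermediate = 0
    result = Counter()
    for row in fragment:
        m1 = first[row[col1]]    # partners found by the first join
        intermediate += m1
        m2 = second[row[col2]]   # partners found by the second join
        if m1 and m2:
            result[row] += m1 * m2
    return result, intermediate

# Hypothetical data: R(a, b), with S and T given as multisets of join keys.
R = [(i, i) for i in range(200)]
S = Counter(list(range(0, 10)) + list(range(100, 200)))  # selective for a < 100
T = Counter(range(0, 110))                               # selective for b >= 100

# Monolithic plan: one join order for all of R.
mono_result, mono_inter = evaluate(R, S, T)

# Partitioned plan: split R (assumed predicate), optimize each fragment, union.
R1 = [row for row in R if row[0] < 100]
R2 = [row for row in R if row[0] >= 100]
res1, inter1 = evaluate(R1, S, T)
res2, inter2 = evaluate(R2, S, T)
part_result = res1 + res2
part_inter = inter1 + inter2

print("same result:", mono_result == part_result)
print("intermediate tuples (monolithic): ", mono_inter)
print("intermediate tuples (partitioned):", part_inter)
```

On this toy data the two fragments have opposite selectivity profiles, so the averages over all of `R` hide the difference: the single-order plan produces 110 intermediate tuples, while the union of the two per-fragment plans produces 20 and returns the same result.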
Index Terms
- Selectivity-based partitioning: a divide-and-union paradigm for effective query optimization