Abstract
The join operation, which combines tuples from multiple relations, is the most fundamental and, typically, the most expensive operation in database queries. The standard approach to join-query optimization is cost based, which requires developing a cost model, assigning an estimated cost to each query-processing plan, and searching in the space of all plans for a plan of minimal cost. Two other approaches can be found in the database-theory literature. The first approach, initially proposed by Chandra and Merlin, focused on minimizing the number of joins rather than on selecting an optimal join order. Unfortunately, this approach requires a homomorphism test, which itself is NP-complete, and has not been pursued in practical query processing. The second, more recent, approach focuses on structural properties of the query in order to find a project-join order that will minimize the size of intermediate results during query evaluation. For example, it is known that for Boolean project-join queries a project-join order can be found such that the arity of intermediate results is at most the treewidth of the join graph plus one.
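To make the treewidth bound concrete, here is a minimal Python sketch (our own illustration, not the authors' code) that tracks only the attribute sets of intermediate results. It assumes a Boolean project-join query is given as a list of relation scopes and that variables are projected out one at a time in a chosen order: every relation mentioning the variable is joined, and the variable is then projected away. The helper names `max_intermediate_arity` and `best_order_arity` are hypothetical. The brute-force search over orders is exponential and meant only for tiny queries, since computing treewidth is NP-hard in general (Arnborg, Corneil, and Proskurowski).

```python
from itertools import permutations


def max_intermediate_arity(scopes, order):
    """Largest arity of any bucket join when variables are projected out in `order`.

    `scopes` lists the attribute sets of the query's relations (hypothetical input format).
    """
    pending = [frozenset(s) for s in scopes]
    worst = 0
    for v in order:
        bucket = [s for s in pending if v in s]
        if not bucket:
            continue
        pending = [s for s in pending if v not in s]
        joined = frozenset().union(*bucket)   # scope of the join of the bucket
        worst = max(worst, len(joined))
        pending.append(joined - {v})          # early projection of v
    return worst


def best_order_arity(scopes):
    """Minimum over all orders of the largest intermediate arity (= treewidth + 1)."""
    variables = sorted(set().union(*map(set, scopes)))
    return min(max_intermediate_arity(scopes, p) for p in permutations(variables))


# Query R(a,b), S(b,c), T(c,d), U(d,a): its join graph is a 4-cycle of treewidth 2.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(best_order_arity(cycle))  # 3, i.e. treewidth 2 plus one
```

On the chain R(a,b), S(b,c), T(c,d), the order a, b, c, d keeps every intermediate at arity 2, whereas projecting out b first forces a join of arity 3, which is why the choice of project-join order matters.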
In this paper we pursue the structural-optimization approach, motivated by its success in the context of constraint satisfaction. We chose a setup in which the cost-based approach is rather ineffective; we generate project-join queries with a large number of relations over databases with small relations. We show that a standard SQL planner (we use PostgreSQL) spends an exponential amount of time on generating plans for such queries, with rather dismal results in terms of performance. We then show how structural techniques, including projection pushing and join reordering, can yield exponential improvements in query execution time. Finally, we combine early projection and join reordering in an implementation of the bucket-elimination method from constraint satisfaction to obtain another exponential improvement.
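The combination of early projection and join reordering that bucket elimination performs can be sketched in memory as follows. This is a minimal Python sketch under our own assumptions, not the implementation evaluated in the paper: a relation is a pair of an attribute tuple and a set of value tuples, joins are simple hash joins, and `join`, `project_out`, `bucket_eliminate`, and `coloring_query` are hypothetical names. As a hypothetical test query we encode graph 3-colorability: each edge (u, v) becomes a copy of the inequality relation over three colors, and the Boolean query is true iff the join of all copies is nonempty.

```python
def join(r, s):
    """Natural join of r and s using a hash index on s's shared attributes."""
    (ra, rt), (sa, st) = r, s
    shared = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    r_key = [ra.index(a) for a in shared]
    s_key = [sa.index(a) for a in shared]
    s_extra = [sa.index(a) for a in extra]
    index = {}
    for t in st:
        key = tuple(t[i] for i in s_key)
        index.setdefault(key, []).append(tuple(t[i] for i in s_extra))
    out = set()
    for t in rt:
        for ext in index.get(tuple(t[i] for i in r_key), []):
            out.add(t + ext)
    return tuple(ra) + tuple(extra), out


def project_out(r, var):
    """Project attribute `var` away (existential quantification of var)."""
    attrs, tuples = r
    i = attrs.index(var)
    return attrs[:i] + attrs[i + 1:], {t[:i] + t[i + 1:] for t in tuples}


def bucket_eliminate(relations, order):
    """Decide a Boolean project-join query; return (answer, largest bucket-join arity)."""
    variables = {a for attrs, _ in relations for a in attrs}
    if not variables <= set(order):
        raise ValueError("order must list every variable of the query")
    pending = list(relations)
    widest = 0
    for v in order:
        bucket = [r for r in pending if v in r[0]]
        if not bucket:
            continue
        pending = [r for r in pending if v not in r[0]]
        acc = bucket[0]
        for r in bucket[1:]:
            acc = join(acc, r)
        widest = max(widest, len(acc[0]))
        acc = project_out(acc, v)  # early projection
        if not acc[1]:
            return False, widest   # an empty intermediate makes the whole join empty
        pending.append(acc)
    # Every remaining relation has no attributes: it is {()} (true) or empty (false).
    return all(tuples for _, tuples in pending), widest


COLORS = range(3)
NEQ = {(x, y) for x in COLORS for y in COLORS if x != y}


def coloring_query(edges):
    """One inequality relation per edge (hypothetical benchmark encoding)."""
    return [((u, v), NEQ) for u, v in edges]


triangle = coloring_query([("a", "b"), ("b", "c"), ("a", "c")])
k4 = coloring_query([(u, v) for i, u in enumerate("abcd") for v in "abcd"[i + 1:]])
print(bucket_eliminate(triangle, "abc"))  # (True, 3)
print(bucket_eliminate(k4, "abcd"))       # (False, 4)
```

A triangle is 3-colorable, so the first call returns True, and with the order a, b, c the largest intermediate result has arity 3. The complete graph on four vertices is not 3-colorable, so the second call returns False. Projecting out each variable as soon as its bucket has been joined is what ties the intermediate arity to the elimination order rather than to the total number of relations.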
Work supported in part by NSF grants CCR-9988322, CCR-0124077, CCR-0311326, IIS-9908435, IIS-9978135, and EIA-0086264.
References
Abiteboul, S., Hull, R., Vianu, V.: Foundations of databases. Addison-Wesley, Reading (1995)
Aho, A., Sagiv, Y., Ullman, J.D.: Efficient optimization of a class of relational expressions. ACM Trans. on Database Systems 4, 435–454 (1979)
Aho, A., Sagiv, Y., Ullman, J.D.: Equivalence of relational expressions. SIAM Journal on Computing 8, 218–246 (1979)
Apers, P., Hevner, A., Yao, S.: Optimization algorithms for distributed queries. IEEE Trans. Software Engineering 9(1), 57–68 (1983)
Arnborg, S., Corneil, D.G., Proskurowski, A.: Complexity of finding embeddings in a k-tree. SIAM Journal on Algebraic and Discrete Methods 8(2), 277–284 (1987)
Bodlaender, H.L.: A tourist guide through treewidth. Acta Cybernetica 11, 1–21 (1993)
Bouquet, F.: Gestion de la dynamicité et énumération d’implicants premiers: une approche fondée sur les Diagrammes de Décision Binaire. PhD thesis, Université de Provence, France (1999)
Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational databases. In: Proc. 9th ACM Symp. on Theory of Computing, pp. 77–90 (1977)
Chauhan, P., Clarke, E.M., Jha, S., Kukula, J.H., Veith, H., Wang, D.: Using combinatorial optimization methods for quantification scheduling. In: Proc. 11th Conf. on Correct Hardware Design and Verification Methods, pp. 293–309 (2001)
Chekuri, C., Rajaraman, A.: Conjunctive query containment revisited. Technical report, Stanford University (November 1998)
Dalmau, V., Kolaitis, P.G., Vardi, M.Y.: Constraint satisfaction, bounded treewidth, and finite-variable logics. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, pp. 311–326. Springer, Heidelberg (2002)
Dechter, R.: Mini-buckets: A general scheme for generating approximations in automated reasoning. In: International Joint Conference on Artificial Intelligence, pp. 1297–1303 (1997)
Dechter, R.: Bucket elimination: a unifying framework for reasoning. Artificial Intelligence 113(1-2), 41–85 (1999)
Dechter, R.: Constraint Processing. Morgan Kaufmann, San Francisco (2003)
Dechter, R., Pearl, J.: Network-based heuristics for constraint-satisfaction problems. Artificial Intelligence 34, 1–38 (1987)
Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer, Heidelberg (1999)
Freuder, E.C.: Complexity of k-tree structured constraint satisfaction problems. In: Proc. AAAI 1990, pp. 4–9 (1990)
Freytag, J.C.: A rule-based view of query optimization. In: Proceedings of the 1987 ACM SIGMOD international conference on Management of data, pp. 173–180 (1987)
Garcia-Molina, H., Ullman, J.D., Widom, J.: Database System Implementation. Prentice-Hall, Englewood Cliffs (2000)
Garey, M.R., Johnson, D.S.: Computers and Intractability, A Guide to the Theory of NP-Completeness. W. H. Freeman, New York (1979)
Gottlob, G., Leone, N., Scarcello, F.: Hypertree decompositions and tractable queries. In: Proc. 18th ACM Symp. on Principles of Database Systems, pp. 21–32 (1999)
Griffiths, P.P., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: ACM SIGMOD International Conference on Management of Data, pp. 23–34 (1979)
Halevy, A.: Answering queries using views: A survey. VLDB Journal, 270–294 (2001)
Hojati, R., Krishnan, S.C., Brayton, R.K.: Early quantification and partitioned transition relations. In: Proc. 1996 Int’l Conf. on Computer Design, pp. 12–19 (1996)
Ioannidis, Y., Wong, E.: Query optimization by simulated annealing. In: ACM SIGMOD International Conference on Management of Data, pp. 9–22 (1987)
Kolaitis, P.G., Vardi, M.Y.: Conjunctive-query containment and constraint satisfaction. Journal of Computer and System Sciences, 302–332 (2000); Earlier version in: Proc. 17th ACM Symp. on Principles of Database Systems (PODS 1998) (1998)
Kunen, I.K., Suciu, D.: A scalable algorithm for query minimization. Technical report, University of Washington (2002)
Ramakrishnan, R., Beeri, C., Krishnamurthy, R.: Optimizing existential datalog queries. In: Proceedings of the ACM Symposium on Principles of Database Systems, pp. 89–102 (1988)
Rish, I., Dechter, R.: Resolution versus search: Two strategies for SAT. Journal of Automated Reasoning 24(1/2), 225–275 (2000)
San Miguel Aguirre, A., Vardi, M.Y.: Random 3-SAT and BDDs – the plot thickens further. In: Walsh, T. (ed.) CP 2001. LNCS, vol. 2239, pp. 121–136. Springer, Heidelberg (2001)
Tarjan, R.E., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing 13(3), 566–579 (1984)
Ullman, J.D.: Database and Knowledge-Base Systems, vol. I and II. Computer Science Press, Rockville (1989)
Vardi, M.Y.: On the complexity of bounded-variable queries. In: Proc. 14th ACM Symp. on Principles of Database Systems, pp. 266–276 (1995)
Wong, E., Youssefi, K.: Decomposition - a strategy for query processing. ACM Trans. on Database Systems 1(3), 223–241 (1976)
Yannakakis, M.: Algorithms for acyclic database schemes. In: Proc. 7th Int’l Conf. on Very Large Data Bases, pp. 82–94 (1981)
Yerneni, R., Li, C., Ullman, J.D., Garcia-Molina, H.: Optimizing large join queries in mediation systems. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 348–364. Springer, Heidelberg (1998)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McMahan, B.J., Pan, G., Porter, P., Vardi, M.Y. (2004). Projection Pushing Revisited. In: Bertino, E., et al. Advances in Database Technology - EDBT 2004. EDBT 2004. Lecture Notes in Computer Science, vol 2992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24741-8_26
DOI: https://doi.org/10.1007/978-3-540-24741-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21200-3
Online ISBN: 978-3-540-24741-8
eBook Packages: Springer Book Archive